Data augmentation transforms

This page lists the data augmentation transforms used in DAN.

Individual augmentation transforms

Elastic Transform

Description

This transformation applies local distortions that rotate individual characters.

Comments

The effect of this transformation is mostly visible on document images and much less on line images. Results are comparable to the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line elastic; document elastic

CPU time (seconds/10 images)

0.44 (3013x128 pixels) / 0.86 (1116x581 pixels)
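
The CPU time figures on this page are given in seconds per 10 images. The benchmark script itself is not shown here, so the following is only a minimal sketch of how such a figure could be measured. The helper name time_transform and its repeats parameter are illustrative assumptions, and the sketch measures wall-clock time with the standard library.

```python
import time


def time_transform(transform, images, repeats=1):
    """Return the average wall-clock seconds needed to run transform on all images."""
    start = time.perf_counter()
    for _ in range(repeats):
        for image in images:
            transform(image)
    # Average over repeats so the result is the time for one pass over images.
    return (time.perf_counter() - start) / repeats
```

Calling it with a list of 10 images would give a number comparable in unit to the figures reported here, although absolute values depend on the machine.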

PieceWise Affine

This transform is temporarily removed from the pipeline until this issue is fixed.

Description

This transformation also applies local distortions but with a larger grid than ElasticTransform.

Comments

This transformation is very slow. It is a new transform that was not in the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line piecewise; document piecewise

CPU time (seconds/10 images)

2.92 (3013x128 pixels) / 3.76 (1116x581 pixels)

Dilation Erosion

Description

This transformation makes the pen stroke thicker or thinner.

Comments

The RandomDilationErosion class randomly selects a kernel size and applies either a dilation or an erosion to the image. It relies on OpenCV and is similar to the original DAN implementation.

Documentation

See the OpenCV documentation

Examples

Images: line erosion dilation; document erosion dilation

CPU time (seconds/10 images)

0.02 (3013x128 pixels) / 0.03 (1116x581 pixels)
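
The idea described above (pick a random kernel size, then dilate or erode) can be sketched as follows. This is not the RandomDilationErosion class: the function names morph and random_dilation_erosion and the min_kernel and max_kernel parameters are illustrative assumptions, and a plain NumPy min/max filter stands in for OpenCV's cv2.erode and cv2.dilate. It assumes a 2D grayscale image.

```python
import random

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def morph(image, kernel_size, op):
    """Apply a square max filter (dilation) or min filter (erosion) to a 2D image."""
    before = kernel_size // 2
    after = kernel_size - 1 - before
    # Replicate edge pixels so the output keeps the input shape.
    padded = np.pad(image, ((before, after), (before, after)), mode="edge")
    windows = sliding_window_view(padded, (kernel_size, kernel_size))
    reduce = np.max if op == "dilation" else np.min
    return reduce(windows, axis=(-2, -1))


def random_dilation_erosion(image, min_kernel=1, max_kernel=3, rng=random):
    """Pick a random kernel size, then dilate or erode with equal probability."""
    kernel_size = rng.randint(min_kernel, max_kernel)  # bounds are assumptions
    op = rng.choice(["dilation", "erosion"])
    return morph(image, kernel_size, op)
```

Note that for dark ink on a light background, the min filter (erosion) makes strokes thicker and the max filter (dilation) makes them thinner.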

Sharpen

Description

This transformation makes the image sharper.

Comments

Similar to the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line sharpen; document sharpen

CPU time (seconds/10 images)

0.02 (3013x128 pixels) / 0.04 (1116x581 pixels)

Color Jittering

Description

This transformation alters the colors of the image.

Comments

Similar to the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line color jitter; document color jitter

CPU time (seconds/10 images)

0.03 (3013x128 pixels) / 0.04 (1116x581 pixels)

Gaussian Noise

Description

This transformation adds Gaussian noise to the image.

Comments

The noise from the original DAN implementation is more uniform.

Documentation

See the albumentations documentation

Examples

Images: line gaussian noise; document gaussian noise

CPU time (seconds/10 images)

0.29 (3013x128 pixels) / 0.53 (1116x581 pixels)
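
To illustrate what adding Gaussian noise means in practice, here is a minimal NumPy sketch. It is not the albumentations transform used in DAN: the function name add_gaussian_noise and the std parameter are assumptions made for illustration, and the image is assumed to be a uint8 array.

```python
import numpy as np


def add_gaussian_noise(image, std=10.0, rng=None):
    """Add zero-mean Gaussian noise to a uint8 image and clip back to [0, 255]."""
    rng = rng if rng is not None else np.random.default_rng()
    # One independent noise sample per pixel (and per channel, if any).
    noise = rng.normal(0.0, std, size=image.shape)
    noisy = image.astype(np.float32) + noise
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```

Passing a seeded generator, for example np.random.default_rng(0), makes the noise reproducible.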

Gaussian Blur

Description

This transformation blurs the image.

Comments

Similar to the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line gaussian blur; document gaussian blur

CPU time (seconds/10 images)

0.01 (3013x128 pixels) / 0.02 (1116x581 pixels)

Random Perspective

Description

This transformation changes the perspective from which the photo is taken.

Comments

Similar to the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line perspective; document perspective

CPU time (seconds/10 images)

0.05 (3013x128 pixels) / 0.05 (1116x581 pixels)

Shearing (x-axis)

Description

This transformation changes the slant of the text on the image.

Comments

It is a new transform that was not in the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line shearx; document shearx

CPU time (seconds/10 images)

0.05 (3013x128 pixels) / 0.04 (1116x581 pixels)

Coarse Dropout

Description

This transformation applies dropout to the image, turning small patches into black pixels.

Comments

It is a new transform that was not in the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line dropout; document dropout

CPU time (seconds/10 images)

0.02 (3013x128 pixels) / 0.02 (1116x581 pixels)

Random Scale

RandomScale

Description

This transformation downscales the image by a random factor.

Comments

The original DAN implementation reimplemented it as DPIAdjusting.

Documentation

See the albumentations documentation

Examples

Images: line random scale; document random scale

To Gray

ToGray

Description

This transformation converts an RGB image to grayscale.

Comments

It is a new transform that was not in the original DAN implementation.

Documentation

See the albumentations documentation

Examples

Images: line grayscale; document grayscale

CPU time (seconds/10 images)

0.02 (3013x128 pixels) / 0.02 (1116x581 pixels)

Full augmentation pipeline

  • Data augmentation is applied with a probability of 0.9.

  • When augmentation is applied, two transformations are randomly selected and applied.

  • Results can be made reproducible by setting random.seed and np.random.seed (already done in dan/ocr/document/train.py).

  • Examples with the new pipeline:

Images: line full pipeline; document full pipeline; document full pipeline 2
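
The selection logic described in this section can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in dan/ocr/document/train.py: the function name apply_pipeline and its parameters are hypothetical, transforms are modelled as plain callables, and the two transforms are assumed to be drawn without replacement.

```python
import random


def apply_pipeline(image, transforms, p=0.9, n_selected=2, rng=random):
    """With probability p, apply n_selected randomly chosen transforms in turn."""
    if rng.random() >= p:
        # With p=0.9, roughly 10% of samples are left untouched.
        return image
    # Assumption: the selected transforms are drawn without replacement.
    for transform in rng.sample(transforms, n_selected):
        image = transform(image)
    return image


# Toy transforms on strings, standing in for real image transforms.
toy_transforms = [str.upper, str.strip, lambda s: s[::-1]]

rng = random.Random(0)  # a fixed seed makes the selection reproducible
result = apply_pipeline(" abc ", toy_transforms, rng=rng)
```

Using a seeded generator plays the same role as the random.seed call mentioned above: two runs with the same seed select the same transforms in the same order.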