How Much Data Are Augmentations Worth?
This post is a short review of the publication "How Much Data Are Augmentations Worth?".
TLDR
The publication presents research on how data augmentation influences model invariance and robustness during training, conducted by researchers at the University of Maryland, College Park, and New York University.
Key Points
- Augmentation can improve model performance by more than adding new real data would, even when the augmentation is inconsistent with the data distribution and therefore generates out-of-domain samples.
- Different augmentations are beneficial in different data regimes: with little data, aggressive augmentations such as vertical flipping are preferable, whereas with more data cautious, distribution-consistent augmentations such as horizontal flipping are favored (see the sketch after this list).
- At smaller data scales, augmentations are preferred, while invariant neural network architectures overtake them in the large-sample regime. Augmentations can be advantageous even for invariances that appear unrelated to one another.
- Across neural network widths and topologies, the relative gains from augmentations as sample size increases are generally stable, although the absolute benefits depend on the architecture.
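To make the cautious-versus-aggressive distinction above concrete, here is a minimal sketch of two torchvision transform pipelines for CIFAR-style 32x32 natural images. The specific transforms and parameters are illustrative assumptions, not the exact configuration used in the paper.

```python
# A minimal sketch (assumes torchvision and 32x32 natural images);
# the paper's exact augmentation settings may differ.
from torchvision import transforms

# Cautious, distribution-consistent augmentation: a mirrored natural
# image is still a plausible natural image.
cautious_augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Aggressive, out-of-domain augmentation: upside-down images rarely occur
# in the original distribution, but add diversity in low-data regimes.
aggressive_augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ToTensor(),
])
```

The only difference between the two pipelines is the flip direction, which is what makes one roughly in-distribution and the other out-of-domain for natural photos.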