Data-efficient foundation models

Foundation models are notoriously data hungry. Segment Anything was trained on over 1 billion masks, and CLIP on 400 million image-text pairs.

However, it is possible to train useful foundation models with far fewer labels by applying a number of pragmatic tricks that make the most of preexisting models and already-available data. These tricks are powering a new generation of data-efficient foundation models that bring recent innovations from social platforms into the real world.
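As a rough illustration of the "reuse a preexisting model" idea, here is a minimal sketch (assuming torchvision and scikit-learn are available; the dataset path and label budget are hypothetical placeholders) that fits a linear probe on frozen features from a pretrained backbone, so only a few hundred labels are needed for the new task rather than millions.

```python
# A minimal sketch of data-efficient adaptation: freeze a pretrained backbone
# and fit a lightweight classifier on a small labeled set.
import torch
from torchvision import models
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained backbone with the classification head removed (pure feature extractor).
weights = models.ResNet50_Weights.DEFAULT
backbone = models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()
backbone.eval().to(device)

# A small labeled dataset -- e.g. a few hundred images, not millions.
# "data/my_small_labeled_set" is a hypothetical path.
preprocess = weights.transforms()
dataset = ImageFolder("data/my_small_labeled_set", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, num_workers=4)

# Extract frozen features once; no backbone fine-tuning needed.
feats, labels = [], []
with torch.no_grad():
    for x, y in loader:
        feats.append(backbone(x.to(device)).cpu())
        labels.append(y)
feats = torch.cat(feats).numpy()
labels = torch.cat(labels).numpy()

# A linear probe is often enough when the pretrained backbone is strong.
clf = LogisticRegression(max_iter=1000).fit(feats, labels)
print("train accuracy:", clf.score(feats, labels))
```

The design choice here is to spend the label budget only on the lightweight head; the expensive representation is inherited for free from the preexisting model.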

Key ideas include:

Examples include:

Clever bootstrapping and applications

Anomaly detection models

Negative data labeling

In-the-wild labeling