Reproducible scaling laws for contrastive language-image learning
Investigates scaling laws for CLIP with the public LAION dataset and the open-source OpenCLIP repository.
This research provides insights into scaling laws for contrastive language-image pre-training (CLIP) that can inform large-scale experiments. It also releases the evaluation workflow and all models for reproducibility, making scaling-laws research more accessible. CLIP has potential applications in areas such as image retrieval and classification.
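To make the scaling-law analysis concrete, below is a minimal sketch of fitting a power law of the form error ≈ a·C^b to (compute, error) measurements. The data points and constants are invented purely for illustration and are not results from the paper; the general log-log fitting approach is a standard way such laws are estimated.

```python
import numpy as np

# Hypothetical (compute, zero-shot error) pairs -- illustrative only,
# not measurements from the paper.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # training FLOPs
error = np.array([0.55, 0.43, 0.34, 0.27, 0.21])    # zero-shot top-1 error

# A power law error = a * C^b is linear in log-log space:
# log(error) = log(a) + b * log(C), so ordinary least squares suffices.
slope, intercept = np.polyfit(np.log(compute), np.log(error), deg=1)
a, b = np.exp(intercept), slope
print(f"fitted scaling law: error ≈ {a:.3g} * C^{b:.3f}")

# Extrapolate to a larger compute budget (again, purely illustrative).
print(f"predicted error at 1e23 FLOPs: {a * 1e23 ** b:.3f}")
```

Once fitted, curves like this let practitioners predict how much additional compute, data, or model size is needed to reach a target error before committing to an expensive training run.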
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Finds that object masking during training improves text-image alignment; in human evaluations, Imagen Editor is preferred over DALL-E 2 and Stable Diffusion.
This study can be useful for businesses that need text-guided image editing, such as those in the creative industry. Object masking during training leads to better text-image alignment and thus edits that are more faithful to the input text prompt. The EditBench benchmark also supports systematic qualitative and quantitative evaluation, which can aid the development of new image-editing models.
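To illustrate the training-time masking idea, here is a minimal sketch contrasting a random box mask with an object mask derived from detector boxes. The helper names and detection boxes are hypothetical, and the paper's exact masking procedure may differ; the point is that random boxes often cover only background or partial objects, giving weak signal for text conditioning, whereas object masks force the model to generate content the caption actually describes.

```python
import numpy as np

def random_box_mask(h, w, rng):
    """Baseline: mask a random rectangle, which may miss objects entirely."""
    y0, x0 = rng.integers(0, h // 2), rng.integers(0, w // 2)
    y1, x1 = rng.integers(y0 + 1, h), rng.integers(x0 + 1, w)
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def object_mask(h, w, boxes, rng):
    """Object masking: mask the region of a detected object so the model
    must reconstruct a full object described by the text prompt.
    `boxes` are (y0, x0, y1, x1) detections assumed to come from an
    off-the-shelf object detector -- an illustrative sketch, not the
    paper's exact procedure."""
    y0, x0, y1, x1 = boxes[rng.integers(len(boxes))]
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

rng = np.random.default_rng(0)
detections = [(40, 60, 120, 180), (10, 200, 90, 260)]  # hypothetical boxes
m = object_mask(256, 256, detections, rng)
print(f"masked fraction of image: {m.mean():.2%}")
```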