PaLI: A Jointly-Scaled Multilingual Language-Image Model
Trains a 17B vision-language model (jointly scaling a ViT image encoder with an mT5 text backbone) on a multilingual image-text dataset of 10B images paired with text in over 100 languages, achieving SotA performance on multiple vision-and-language tasks.
This joint scaling approach provides a simple, modular, and scalable design for improving vision, language, and multimodal tasks in multiple languages. It can benefit businesses with multilingual operations that require image-text analysis and processing.
Revisiting Neural Scaling Laws in Language and Vision
Presents a recipe for reliably estimating scaling-law parameters from learning curves and shows that it extrapolates to larger scales more accurately than previous methods across a range of settings.
This methodology for estimating scaling law parameters can help businesses determine the appropriate size of their neural models for specific tasks, resulting in more efficient and effective training.
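To make the idea concrete, here is a minimal sketch of fitting scaling-law parameters to learning-curve points. This is a hypothetical illustration, not the paper's estimator: it assumes a common saturating power-law form L(n) = c + a·n^(-b), grid-searches the irreducible loss c, and fits a and b by least squares in log space.

```python
import math

def fit_scaling_law(ns, losses, c_grid):
    """Fit L(n) = c + a * n**(-b) to (dataset size, loss) pairs.

    For each candidate irreducible loss c, log(L - c) is linear in
    log(n), so a and b come from ordinary least squares in log space.
    Returns the (a, b, c) with the smallest squared error.
    """
    best = None
    for c in c_grid:
        if any(l <= c for l in losses):
            continue  # log(L - c) undefined; skip this candidate
        xs = [math.log(n) for n in ns]
        ys = [math.log(l - c) for l in losses]
        k = len(xs)
        mx, my = sum(xs) / k, sum(ys) / k
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        intercept = my - slope * mx
        a, b = math.exp(intercept), -slope
        err = sum((c + a * n ** (-b) - l) ** 2 for n, l in zip(ns, losses))
        if best is None or err < best[0]:
            best = (err, a, b, c)
    return best[1], best[2], best[3]

# Synthetic learning curve generated from known parameters,
# so we can check that the fit recovers them.
true_a, true_b, true_c = 5.0, 0.5, 1.2
ns = [10 ** k for k in range(3, 8)]
losses = [true_c + true_a * n ** (-true_b) for n in ns]

a, b, c = fit_scaling_law(ns, losses, c_grid=[i / 100 for i in range(200)])
```

Once fitted, extrapolating `c + a * n**(-b)` to a larger n gives a predicted loss, which is the kind of estimate a team could use to size a model or training run before committing compute.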