TorchScale: Transformers at Scale
Presents an open-source toolkit for scaling up Transformers, improving modeling generality and capability as well as training stability and efficiency. Demonstrates successful scaling to different model sizes in language modeling and neural machine translation.
Self-Supervised Learning based on Heat Equation
Proposes QB-Heat, a self-supervised learning method based on extending the heat equation into a high-dimensional feature space. QB-Heat enables simple masked image modeling for CNNs and works well for pre-training lightweight networks suited to image classification and object detection.
Retrieval-Augmented Multimodal Language Modeling
Introduces RA-CM3, a retrieval-augmented multimodal model in which a base multimodal model refers to relevant knowledge fetched by a retriever from external memory. RA-CM3 significantly outperforms baseline models on image and caption generation tasks while requiring less compute for training.
Masked Autoencoding for Scalable and Generalizable Decision Making
Presents MaskDP, a self-supervised pretraining method for scalable and generalizable decision making that outperforms GPT-like approaches, offering zero-shot transfer to new tasks and promising scaling behavior in offline RL.
Inversion-Based Creativity Transfer with Diffusion Models
Learns artistic creativity directly from a single painting and uses it to guide synthesis without complex textual descriptions, improving on arbitrary example-guided artistic image generation methods.