The case for 4-bit precision: k-bit Inference Scaling Laws
Shows that 4-bit precision is almost universally optimal for the trade-off between total model bits and zero-shot accuracy.
Implementing quantization methods to reduce memory footprint and inference latency while maintaining zero-shot accuracy.
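As a minimal illustration of the kind of low-bit quantization the paper studies, the sketch below implements generic blockwise absmax quantization to 4-bit integers in NumPy. This is an assumption-laden toy, not the paper's exact method: block size, the symmetric int4 range [-7, 7], and the float scale per block are all choices made here for clarity.

```python
import numpy as np

def quantize_4bit(x, block_size=64):
    """Blockwise absmax quantization to signed 4-bit values.
    A generic sketch, not the paper's exact scheme: each block of
    `block_size` values stores one float scale plus int4 codes."""
    blocks = x.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # int4 range [-7, 7]
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    """Recover an approximation of the original tensor."""
    return (q.astype(np.float32) * scale).reshape(-1)

# Round-trip a random weight vector and measure the worst-case error.
x = np.random.randn(256).astype(np.float32)
q, s = quantize_4bit(x)
x_hat = dequantize_4bit(q, s)
err = np.abs(x - x_hat).max()
```

The per-block scale bounds the reconstruction error at half a quantization step per block, which is why absmax blocking is a common baseline for sub-8-bit inference.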
Natural Language to Code Generation in Interactive Data Science Notebooks
Builds a benchmark and a language model for automatic code generation in data science notebooks.
Improving the accuracy and efficiency of AI pair programmers that synthesize code from natural-language intents.
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Demonstrates that training on a large dataset of automatically generated instructions yields a model that outperforms models trained on manually curated datasets.
Using model-generated data as a cost-effective alternative to crowdsourcing for dataset expansion and diversification.
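The model-generated-data loop can be sketched in a few lines: seed with a small set of human-written examples, prompt a model with random in-context demonstrations, and keep novel generations. Everything here is a hypothetical stand-in, in particular `generate_fn`, which abstracts the actual language-model call; the paper's own filtering and prompting are more involved.

```python
import random

def expand_dataset(seed_examples, generate_fn, n_new=100):
    """Grow a small seed set of instruction examples with model generations.
    `generate_fn(demos)` is a hypothetical stand-in for a language-model call
    that takes a few demonstrations and returns a new example dict."""
    seen = {ex["instruction"] for ex in seed_examples}
    dataset = list(seed_examples)
    while len(dataset) - len(seed_examples) < n_new:
        demos = random.sample(seed_examples, k=min(3, len(seed_examples)))
        candidate = generate_fn(demos)
        if candidate["instruction"] not in seen:  # simple exact-match dedup
            seen.add(candidate["instruction"])
            dataset.append(candidate)
    return dataset

# Demo with a fake generator that just counts up (no real model involved).
counter = {"i": 0}
def fake_gen(demos):
    counter["i"] += 1
    return {"instruction": f"task {counter['i']}", "output": "done"}

seeds = [{"instruction": "seed task", "output": "x"}]
expanded = expand_dataset(seeds, fake_gen, n_new=5)
```

Deduplication is the minimal quality filter; in practice, generated examples also need validity and diversity checks before training on them.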
DSI++: Updating Transformer Memory with New Documents
Presents DSI++, a continual learning challenge for Differentiable Search Indices: incrementally index new documents while still answering queries over both previously and newly indexed documents.
Provides a solution for deploying differentiable search indices when the corpus changes over time, mitigating forgetting by a significant margin and improving average Hits@10 over competitive baselines.
Scalable Diffusion Models with Transformers
Explores a new class of diffusion models based on the transformer architecture, achieving a state-of-the-art FID of 2.27 on class-conditional ImageNet 256x256 benchmarks.
Offers a scalable and efficient solution for training latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches.
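The "transformer that operates on latent patches" idea amounts to splitting a latent feature map into non-overlapping patches and flattening each into a token. The sketch below shows only this patchify step in NumPy; the channel count, 2x2 patch size, and the absence of a learned linear embedding are simplifications made here, not details from the paper.

```python
import numpy as np

def patchify(latent, patch_size=2):
    """Split a latent feature map of shape (C, H, W) into a sequence of
    flattened patch tokens of shape (num_patches, C * p * p).
    A minimal sketch of tokenizing latents for a transformer backbone;
    real models follow this with a learned linear projection."""
    c, h, w = latent.shape
    assert h % patch_size == 0 and w % patch_size == 0
    ph, pw = h // patch_size, w // patch_size
    x = latent.reshape(c, ph, patch_size, pw, patch_size)
    x = x.transpose(1, 3, 0, 2, 4)  # -> (ph, pw, c, p, p)
    return x.reshape(ph * pw, c * patch_size * patch_size)

# A 4-channel 32x32 latent with 2x2 patches yields 256 tokens of dim 16.
latent = np.arange(4 * 32 * 32, dtype=np.float32).reshape(4, 32, 32)
tokens = patchify(latent)
```

Smaller patches give longer token sequences and more compute per image, which is one of the scaling axes such architectures expose.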