Mon Dec 19 2022

The case for 4-bit precision: k-bit Inference Scaling Laws

Deep learning
Machine learning
Natural Language Processing
Model compression
Quantization methods
Language models

Shows that 4-bit precision is almost universally optimal when trading off total model bits against zero-shot accuracy.

Implementing quantization methods that reduce memory footprint and inference latency while maintaining zero-shot accuracy.
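
A minimal sketch of the general idea behind low-bit weight quantization, using a simple blockwise absmax scheme in NumPy; the block size, the symmetric [-7, 7] integer range, and the function names are illustrative assumptions rather than the exact data types and quantization methods evaluated in the paper.

import numpy as np

# Blockwise absmax quantization: store one float scale per block of weights
# plus a 4-bit signed code per weight (held in an int8 array here for simplicity).
def quantize_4bit(weights, block_size=64):                 # block_size is an assumption
    w = weights.reshape(-1, block_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0    # symmetric int4 range [-7, 7]
    scales[scales == 0] = 1.0                              # avoid division by zero
    q = np.clip(np.round(w / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize_4bit(q, scales):
    # Recover approximate float weights from the 4-bit codes and per-block scales.
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print("mean absolute quantization error:", np.abs(w - w_hat).mean())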

Natural Language to Code Generation in Interactive Data Science Notebooks

Natural Language Processing
Artificial Intelligence
Programming
Automatic code generation
Data science notebooks
Pandas data analysis framework

Builds a benchmark and a language model for automatic code generation in data science notebooks.

Improving the accuracy and efficiency of AI pair programmers that synthesize code from natural-language intents.
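
A minimal sketch of how an AI pair programmer might assemble a prompt from earlier notebook cells plus a natural-language intent before querying a code language model; the cell contents, the prompt format, and the absence of a real model call are illustrative assumptions, not the benchmark or model built in the paper.

notebook_cells = [
    "import pandas as pd",
    "df = pd.read_csv('sales.csv')",
    "df['revenue'] = df['units'] * df['price']",
]
intent = "Get the top 5 products by total revenue."

def build_prompt(cells, intent):
    # Concatenate the prior notebook cells and the user's intent into one prompt.
    context = "\n\n".join(f"# In[{i}]:\n{cell}" for i, cell in enumerate(cells, 1))
    return f"{context}\n\n# Intent: {intent}\n# Next cell:\n"

print(build_prompt(notebook_cells, intent))
# In practice this prompt would be sent to a code language model, and its
# completion (e.g. a pandas groupby/sort expression) becomes the next cell.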

Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

Deep learning
Natural Language Processing
Machine learning
Pretrained language models
Instruction tuning
Automatically generated datasets

Demonstrates that training on a large dataset of automatically generated instructions leads to a model that outperforms models trained on manually curated datasets.

Using model-generated data as a cost-effective alternative to crowdsourcing for dataset expansion and diversification.
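
A minimal sketch of the core recipe behind automatically generated instruction data: show a language model a few seed demonstrations and ask it to emit a new (instruction, input, output) example; the seed examples and the sample_model stub below are hypothetical stand-ins for a real LM API call and for the paper's actual prompting pipeline.

import json
import random

seed_examples = [
    {"instruction": "Translate the sentence to French.",
     "input": "Good morning.", "output": "Bonjour."},
    {"instruction": "Classify the sentiment of the review as positive or negative.",
     "input": "The food was cold and bland.", "output": "negative"},
]

def build_generation_prompt(seeds, num_demos=2):
    # Show a few existing examples and ask the model to continue the pattern.
    demos = random.sample(seeds, k=min(num_demos, len(seeds)))
    shown = "\n".join(json.dumps(d) for d in demos)
    return shown + "\n# Write one more example in the same JSON format:\n"

def sample_model(prompt):
    # Placeholder for a call to a large language model.
    return ('{"instruction": "List three synonyms for the word.", '
            '"input": "happy", "output": "joyful, cheerful, content"}')

new_example = json.loads(sample_model(build_generation_prompt(seed_examples)))
print(new_example["instruction"])
# Repeating this loop, filtering near-duplicates, and fine-tuning on the result is
# the general pattern for growing an instruction-tuning dataset without additional
# human annotation.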

DSI++: Updating Transformer Memory with New Documents

Natural Language Processing
Artificial Intelligence for Information Retrieval
Continual learning
Incremental indexing of new documents
Answering queries related to both previously and newly indexed documents
Continual indexing benchmarks based on Natural Questions and MS MARCO

Presents DSI++, a continual learning challenge for differentiable search indices to incrementally index new documents while remaining able to answer queries related to both previously and newly indexed documents.

Provides a solution for deploying differentiable search indices when the corpus changes over time, mitigating forgetting by a significant margin and improving average Hits@10 over competitive baselines.
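
A minimal sketch of the training pairs behind a differentiable search index: a single seq2seq model is trained to map document text to its docid (indexing) and queries to the relevant docid (retrieval), and continual indexing appends pairs for newly arrived documents; the corpus, queries, and docids below are hypothetical and no model is actually trained.

initial_corpus = {
    "d1": "The Eiffel Tower is a landmark in Paris.",
    "d2": "Python is a widely used programming language.",
}
new_documents = {"d3": "Transformer models rely on self-attention."}

def indexing_pairs(corpus):
    # Memorization task: input is the document text, target is its docid string.
    return [(text, docid) for docid, text in corpus.items()]

def retrieval_pairs(query_to_docid):
    # Retrieval task: input is a query, target is the relevant docid string.
    return list(query_to_docid.items())

round_1 = indexing_pairs(initial_corpus) + retrieval_pairs({"where is the eiffel tower": "d1"})
round_2 = indexing_pairs(new_documents)   # continual step: only the new documents are indexed
print(len(round_1), "training pairs initially,", len(round_2), "new pair(s) to index")
# The continual-learning difficulty is that fine-tuning on round_2 alone tends to make
# the model forget docids learned in round_1, which is the forgetting DSI++ sets out
# to mitigate.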

Scalable Diffusion Models with Transformers

Generative models
Computer Vision
Deep Learning
Training latent diffusion models of images
Class-conditional ImageNet benchmarks

Explores a new class of diffusion models based on the transformer architecture, achieving a state-of-the-art FID of 2.27 on the class-conditional ImageNet 256x256 benchmark.

Offers a scalable and efficient solution for training latent diffusion models of images, replacing the commonly used U-Net backbone with a transformer that operates on latent patches.
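
A minimal sketch of the latent-patch tokenization that lets a transformer stand in for the U-Net: a spatial latent (for example, the 32x32x4 output of a VAE encoder for a 256x256 image) is split into patches and flattened into a token sequence; the shapes and patch size below are illustrative, not the paper's exact configuration.

import numpy as np

def patchify(latent, patch_size=2):
    # (H, W, C) latent -> (num_patches, patch_size*patch_size*C) token sequence.
    H, W, C = latent.shape
    assert H % patch_size == 0 and W % patch_size == 0
    x = latent.reshape(H // patch_size, patch_size, W // patch_size, patch_size, C)
    x = x.transpose(0, 2, 1, 3, 4)                      # group the two patch axes together
    return x.reshape(-1, patch_size * patch_size * C)   # one row per patch token

latent = np.random.randn(32, 32, 4).astype(np.float32)
tokens = patchify(latent)
print(tokens.shape)  # (256, 16): 256 tokens of dimension 16 for the transformer
# Each token would then be linearly embedded, combined with positional and
# conditioning information, and processed by standard transformer blocks.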
