Erasing Concepts from Diffusion Models
Proposes a fine-tuning method that permanently removes specific visual concepts from a pre-trained diffusion model, benchmarks it against previous approaches, and demonstrates its effectiveness.
Can improve the safety and ethical use of diffusion models by preventing the generation of explicit content or imitations of real artwork.
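A minimal sketch of the fine-tuning idea, assuming a negative-guidance objective: a frozen copy of the model supplies targets that steer the conditional noise prediction away from the erased concept. The toy denoiser and all names below are illustrative stand-ins, not the paper's code.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    """Stand-in for a diffusion U-Net: predicts noise from (x_t, t, cond)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2 + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, cond, t], dim=-1))

def erase_loss(unet, frozen, x_t, t, concept_emb, null_emb, eta=1.0):
    # The frozen copy supplies the original predictions; the trainable model
    # is regressed onto the negatively guided target
    #   eps(x_t) - eta * (eps(x_t, c) - eps(x_t)),
    # which pushes generation conditioned on the concept away from it.
    with torch.no_grad():
        eps_uncond = frozen(x_t, t, null_emb)
        eps_concept = frozen(x_t, t, concept_emb)
        target = eps_uncond - eta * (eps_concept - eps_uncond)
    return F.mse_loss(unet(x_t, t, concept_emb), target)

unet = ToyDenoiser()
frozen = copy.deepcopy(unet).eval().requires_grad_(False)  # original weights, kept fixed
x_t, t = torch.randn(4, 16), torch.rand(4, 1)
concept_emb, null_emb = torch.randn(4, 16), torch.zeros(4, 16)
erase_loss(unet, frozen, x_t, t, concept_emb, null_emb).backward()
```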
Resurrecting Recurrent Neural Networks for Long Sequences
Demonstrates that carefully designed deep RNNs can perform on par with SSMs on long-range reasoning tasks at comparable speed.
Can improve the performance and speed of RNNs on long-sequence modeling tasks.
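A minimal sketch of the kind of careful design the paper argues for: a linear recurrence with a diagonal, complex-valued transition whose eigenvalues are parameterized through exponentials so they stay inside the unit disk. Initialization constants and the class name are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class LinearRecurrentUnit(nn.Module):
    def __init__(self, d_in: int, d_state: int, d_out: int):
        super().__init__()
        # lambda = exp(-exp(log_nu) + i*theta): exp(log_nu) > 0 guarantees
        # |lambda| < 1, so the recurrence is stable by construction.
        self.log_nu = nn.Parameter(torch.full((d_state,), -1.0))
        self.theta = nn.Parameter(2 * torch.pi * torch.rand(d_state))
        self.B_re = nn.Parameter(torch.randn(d_state, d_in) / d_in**0.5)
        self.B_im = nn.Parameter(torch.randn(d_state, d_in) / d_in**0.5)
        self.C_re = nn.Parameter(torch.randn(d_out, d_state) / d_state**0.5)
        self.C_im = nn.Parameter(torch.randn(d_out, d_state) / d_state**0.5)

    def forward(self, x):  # x: (batch, seq, d_in), real-valued
        lam = torch.exp(-torch.exp(self.log_nu) + 1j * self.theta)  # (d_state,)
        gamma = torch.sqrt(1 - lam.abs() ** 2)  # keeps the state magnitude bounded
        B = torch.complex(self.B_re, self.B_im)
        C = torch.complex(self.C_re, self.C_im)
        u = torch.einsum('btd,nd->btn', x.to(torch.complex64), B)
        h = torch.zeros(x.shape[0], lam.shape[0], dtype=torch.complex64)
        states = []
        for t in range(x.shape[1]):  # linearity allows a parallel scan in practice
            h = lam * h + gamma * u[:, t]
            states.append(h)
        H = torch.stack(states, dim=1)  # (batch, seq, d_state)
        return torch.einsum('btn,on->bto', H, C).real

layer = LinearRecurrentUnit(d_in=8, d_state=32, d_out=8)
y = layer(torch.randn(2, 100, 8))  # -> (2, 100, 8)
```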
Self-planning Code Generation with Large Language Models
Introduces a planning phase into code generation with large language models to reduce the difficulty of problem-solving and improve performance.
Can improve performance on code generation tasks by planning solution steps from a complex intent before generating code.
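A minimal sketch of the two-phase idea, assuming a generic text-completion backend: the model first drafts numbered solution steps from the intent, then generates code conditioned on both. The prompt templates and the `complete` callable are hypothetical, not the paper's exact prompts.

```python
from typing import Callable

PLAN_PROMPT = (
    "Write concise, numbered steps that solve the following task.\n"
    "Task: {intent}\nSteps:\n1."
)
CODE_PROMPT = (
    '"""\n{intent}\nFollow these steps:\n{plan}\n"""\ndef solution():\n'
)

def self_planning_codegen(intent: str, complete: Callable[[str], str]) -> str:
    # Phase 1 (planning): abstract the intent into solution steps, which
    # lowers the difficulty of direct one-shot generation.
    plan = "1." + complete(PLAN_PROMPT.format(intent=intent))
    # Phase 2 (implementation): generate code guided by the plan.
    body = complete(CODE_PROMPT.format(intent=intent, plan=plan))
    return "def solution():\n" + body

# Usage with any completion backend:
# code = self_planning_codegen("Return the n-th Fibonacci number", my_llm_call)
```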
Transformer-based World Models Are Happy With 100k Interactions
Outperforms previous model-free and model-based RL algorithms on the Atari 100k benchmark.
Provides a new approach for building a sample-efficient world model using a transformer that can attend to previous states and learn long-term dependencies while staying computationally efficient. Can be used to train policies that outperform previous RL algorithms on the Atari 100k benchmark.
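A minimal sketch of the architecture idea: a causal transformer consumes a history of latent states and actions and predicts the next latent and reward at each step. The actual model (TWM) uses a Transformer-XL backbone over discrete autoencoder latents; the sizes, heads, and continuous latents below are simplifications.

```python
import torch
import torch.nn as nn

class TransformerWorldModel(nn.Module):
    def __init__(self, d_latent=32, n_actions=6, d_model=64, max_len=128):
        super().__init__()
        self.embed_z = nn.Linear(d_latent, d_model)
        self.embed_a = nn.Embedding(n_actions, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.next_z = nn.Linear(d_model, d_latent)  # next-latent head
        self.reward = nn.Linear(d_model, 1)         # reward head

    def forward(self, z, a):
        # z: (B, T, d_latent) latent states, a: (B, T) discrete actions.
        T = z.shape[1]
        h = self.embed_z(z) + self.embed_a(a) + self.pos(torch.arange(T))
        causal = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)
        h = self.backbone(h, mask=causal)  # each step attends only to the past
        return self.next_z(h), self.reward(h)

model = TransformerWorldModel()
z, a = torch.randn(2, 16, 32), torch.randint(0, 6, (2, 16))
z_pred, r_pred = model(z, a)  # train against z[:, 1:] and observed rewards
```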
High-throughput Generative Inference of Large Language Models with a Single GPU
Presents FlexGen, a high-throughput generation engine for running LLMs with limited GPU memory.
Introduces FlexGen, a high-throughput generation engine that runs large language models (LLMs) with limited resources, such as a single GPU. FlexGen can be configured under various hardware constraints to store and access tensors efficiently, and it compresses the weights and attention cache to 4 bits with negligible accuracy loss. This allows FlexGen to achieve significantly higher throughput than state-of-the-art offloading systems when running OPT-175B on a single 16GB GPU, reaching a generation throughput of 1 token/s with an effective batch size of 144. Can be beneficial for back-of-house tasks such as benchmarking, information extraction, data wrangling, and form processing.
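A minimal sketch of the 4-bit compression ingredient: group-wise min-max quantization of a tensor, the kind of lossy compression FlexGen applies to weights and the attention KV cache. The group size is illustrative, and a real system would pack two 4-bit codes per byte rather than keep one code per uint8.

```python
import torch

def quantize_4bit(x: torch.Tensor, group_size: int = 64):
    """Group-wise 4-bit min-max quantization (assumes numel % group_size == 0)."""
    flat = x.reshape(-1, group_size)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / 15.0  # 4 bits -> 16 quantization levels
    codes = ((flat - lo) / scale).round().clamp(0, 15).to(torch.uint8)
    return codes, lo, scale

def dequantize_4bit(codes, lo, scale, shape):
    # Reconstruct an approximation of the original tensor from the codes.
    return (codes.float() * scale + lo).reshape(shape)

w = torch.randn(128, 128)
codes, lo, scale = quantize_4bit(w)
w_hat = dequantize_4bit(codes, lo, scale, w.shape)
print((w - w_hat).abs().max())  # small per-group reconstruction error
```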