Erasing Concepts from Diffusion Models
Proposes a fine-tuning method that permanently removes specific visual concepts from a pre-trained diffusion model, benchmarks it against previous approaches, and demonstrates its effectiveness.
Can improve the safety and ethical use of diffusion models by preventing the generation of explicit content or imitations of real artwork.
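A minimal sketch of the fine-tuning idea, assuming a negative-guidance objective: a frozen copy of the model supplies targets that steer the conditional noise prediction away from the erased concept. The toy denoiser and all names below are illustrative stand-ins, not the paper's code.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    """Stand-in for a diffusion U-Net: predicts noise from (x_t, t, cond)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2 + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, cond, t], dim=-1))

def erase_loss(unet, frozen, x_t, t, concept_emb, null_emb, eta=1.0):
    # The frozen copy supplies the original predictions; the trainable model
    # is regressed onto the negatively guided target
    #   eps(x_t) - eta * (eps(x_t, c) - eps(x_t)),
    # which pushes generation conditioned on the concept away from it.
    with torch.no_grad():
        eps_uncond = frozen(x_t, t, null_emb)
        eps_concept = frozen(x_t, t, concept_emb)
        target = eps_uncond - eta * (eps_concept - eps_uncond)
    return F.mse_loss(unet(x_t, t, concept_emb), target)

unet = ToyDenoiser()
frozen = copy.deepcopy(unet).eval().requires_grad_(False)  # original weights, kept fixed
x_t, t = torch.randn(4, 16), torch.rand(4, 1)
concept_emb, null_emb = torch.randn(4, 16), torch.zeros(4, 16)
erase_loss(unet, frozen, x_t, t, concept_emb, null_emb).backward()
```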
Resurrecting Recurrent Neural Networks for Long Sequences
Demonstrates that carefully designed deep RNNs can perform on par with SSMs on long-range reasoning tasks at comparable speed.
Can improve the performance and speed of RNNs on long-sequence modeling tasks.
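A minimal sketch of the kind of careful design the paper argues for: a linear recurrence with a diagonal, complex-valued transition whose eigenvalues are parameterized through exponentials so they stay inside the unit disk. Initialization constants and the class name are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class LinearRecurrentUnit(nn.Module):
    def __init__(self, d_in: int, d_state: int, d_out: int):
        super().__init__()
        # lambda = exp(-exp(log_nu) + i*theta): exp(log_nu) > 0 guarantees
        # |lambda| < 1, so the recurrence is stable by construction.
        self.log_nu = nn.Parameter(torch.full((d_state,), -1.0))
        self.theta = nn.Parameter(2 * torch.pi * torch.rand(d_state))
        self.B_re = nn.Parameter(torch.randn(d_state, d_in) / d_in**0.5)
        self.B_im = nn.Parameter(torch.randn(d_state, d_in) / d_in**0.5)
        self.C_re = nn.Parameter(torch.randn(d_out, d_state) / d_state**0.5)
        self.C_im = nn.Parameter(torch.randn(d_out, d_state) / d_state**0.5)

    def forward(self, x):  # x: (batch, seq, d_in), real-valued
        lam = torch.exp(-torch.exp(self.log_nu) + 1j * self.theta)  # (d_state,)
        gamma = torch.sqrt(1 - lam.abs() ** 2)  # keeps the state magnitude bounded
        B = torch.complex(self.B_re, self.B_im)
        C = torch.complex(self.C_re, self.C_im)
        u = torch.einsum('btd,nd->btn', x.to(torch.complex64), B)
        h = torch.zeros(x.shape[0], lam.shape[0], dtype=torch.complex64)
        states = []
        for t in range(x.shape[1]):  # linearity allows a parallel scan in practice
            h = lam * h + gamma * u[:, t]
            states.append(h)
        H = torch.stack(states, dim=1)  # (batch, seq, d_state)
        return torch.einsum('btn,on->bto', H, C).real

layer = LinearRecurrentUnit(d_in=8, d_state=32, d_out=8)
y = layer(torch.randn(2, 100, 8))  # -> (2, 100, 8)
```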
Self-planning Code Generation with Large Language Models
Introduces a planning phase into code generation with large language models to reduce the difficulty of problem-solving and improve performance.
Can improve performance on code generation tasks by planning solution steps from a complex intent before generating code.
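A minimal sketch of the two-phase idea, assuming a generic text-completion backend: the model first drafts numbered solution steps from the intent, then generates code conditioned on both. The prompt templates and the `complete` callable are hypothetical, not the paper's exact prompts.

```python
from typing import Callable

PLAN_PROMPT = (
    "Write concise, numbered steps that solve the following task.\n"
    "Task: {intent}\nSteps:\n1."
)
CODE_PROMPT = (
    '"""\n{intent}\nFollow these steps:\n{plan}\n"""\ndef solution():\n'
)

def self_planning_codegen(intent: str, complete: Callable[[str], str]) -> str:
    # Phase 1 (planning): abstract the intent into solution steps, which
    # lowers the difficulty of direct one-shot generation.
    plan = "1." + complete(PLAN_PROMPT.format(intent=intent))
    # Phase 2 (implementation): generate code guided by the plan.
    body = complete(CODE_PROMPT.format(intent=intent, plan=plan))
    return "def solution():\n" + body

# Usage with any completion backend:
# code = self_planning_codegen("Return the n-th Fibonacci number", my_llm_call)
```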
Transformer-based World Models Are Happy With 100k Interactions
Outperforms previous model-free and model-based RL algorithms on the Atari 100k benchmark.
Provides a new approach for building a sample-efficient world model using a transformer that can attend to previous states and learn long-term dependencies while staying computationally efficient. Can be used to train policies that outperform previous RL algorithms on the Atari 100k benchmark.
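A minimal sketch of the architecture idea: a causal transformer consumes a history of latent states and actions and predicts the next latent and reward at each step. The actual model (TWM) uses a Transformer-XL backbone over discrete autoencoder latents; the sizes, heads, and continuous latents below are simplifications.

```python
import torch
import torch.nn as nn

class TransformerWorldModel(nn.Module):
    def __init__(self, d_latent=32, n_actions=6, d_model=64, max_len=128):
        super().__init__()
        self.embed_z = nn.Linear(d_latent, d_model)
        self.embed_a = nn.Embedding(n_actions, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.next_z = nn.Linear(d_model, d_latent)  # next-latent head
        self.reward = nn.Linear(d_model, 1)         # reward head

    def forward(self, z, a):
        # z: (B, T, d_latent) latent states, a: (B, T) discrete actions.
        T = z.shape[1]
        h = self.embed_z(z) + self.embed_a(a) + self.pos(torch.arange(T))
        causal = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)
        h = self.backbone(h, mask=causal)  # each step attends only to the past
        return self.next_z(h), self.reward(h)

model = TransformerWorldModel()
z, a = torch.randn(2, 16, 32), torch.randint(0, 6, (2, 16))
z_pred, r_pred = model(z, a)  # train against z[:, 1:] and observed rewards
```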
High-throughput Generative Inference of Large Language Models with a Single GPU
Presents FlexGen, a high-throughput generation engine for running LLMs with limited GPU memory.
Introduces FlexGen, a high-throughput generation engine that runs large language models (LLMs) with limited resources, such as a single GPU. FlexGen can be configured under various hardware constraints to store and access tensors efficiently, and it compresses the weights and attention cache to 4 bits with negligible accuracy loss. This allows FlexGen to achieve significantly higher throughput than state-of-the-art offloading systems when running OPT-175B on a single 16GB GPU, reaching a generation throughput of 1 token/s with an effective batch size of 144. Can be beneficial for back-of-house tasks such as benchmarking, information extraction, data wrangling, and form processing.
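A minimal sketch of the 4-bit compression ingredient: group-wise min-max quantization of a tensor, the kind of lossy compression FlexGen applies to weights and the attention KV cache. The group size is illustrative, and a real system would pack two 4-bit codes per byte rather than keep one code per uint8.

```python
import torch

def quantize_4bit(x: torch.Tensor, group_size: int = 64):
    """Group-wise 4-bit min-max quantization (assumes numel % group_size == 0)."""
    flat = x.reshape(-1, group_size)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / 15.0  # 4 bits -> 16 quantization levels
    codes = ((flat - lo) / scale).round().clamp(0, 15).to(torch.uint8)
    return codes, lo, scale

def dequantize_4bit(codes, lo, scale, shape):
    # Reconstruct an approximation of the original tensor from the codes.
    return (codes.float() * scale + lo).reshape(shape)

w = torch.randn(128, 128)
codes, lo, scale = quantize_4bit(w)
w_hat = dequantize_4bit(codes, lo, scale, w.shape)
print((w - w_hat).abs().max())  # small per-group reconstruction error
```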