TorchScale: Transformers at Scale
Presents an open-source toolkit for scaling up Transformers, improving modeling generality and capability as well as training stability and efficiency. Demonstrates successful scaling to different model sizes in language modeling and neural machine translation.
Self-Supervised Learning based on Heat Equation
Proposes QB-Heat, a self-supervised learning method based on extending the heat equation into a high-dimensional feature space. QB-Heat enables simple masked image modeling for CNNs and works well for pre-training lightweight networks suited to image classification and object detection.
Retrieval-Augmented Multimodal Language Modeling
Introduces RA-CM3, a retrieval-augmented multimodal model in which a base multimodal model refers to relevant knowledge fetched by a retriever from external memory. RA-CM3 significantly outperforms baseline models on image and caption generation tasks while requiring less compute for training.
Masked Autoencoding for Scalable and Generalizable Decision Making
Presents MaskDP, a self-supervised pretraining method for scalable and generalizable decision making that outperforms GPT-like approaches, offering zero-shot transfer to new tasks and promising scaling behavior in offline RL.
Inversion-Based Creativity Transfer with Diffusion Models
Learns artistic creativity directly from a single painting and uses it to guide synthesis without complex textual descriptions, improving on arbitrary example-guided artistic image generation methods.