MineDojo
Introduces MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base.
MineDojo is an open-source framework that provides a simulation suite and knowledge base for building generalist agents that can solve a variety of open-ended tasks. It promotes research towards the goal of generally capable embodied agents.
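To make the framework concrete, here is a minimal sketch of driving a MineDojo task through its Gym-style interface. It assumes the pip-installable minedojo package and its make() entry point; the task_id and image_size values shown are illustrative and may differ from the released API.

```python
# Minimal sketch: interact with one MineDojo benchmark task (assumed API).
import minedojo

env = minedojo.make(
    task_id="harvest_milk",   # one of the thousands of benchmark tasks (assumed id)
    image_size=(160, 256),    # RGB observation resolution
)

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random policy as a placeholder
    obs, reward, done, info = env.step(action)  # standard Gym step signature
    if done:
        obs = env.reset()
env.close()
```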
Unified-IO
Performs a large variety of AI tasks, spanning classical computer vision, vision-and-language, and NLP tasks such as QA and paraphrasing, by casting every input and output into a sequence of discrete vocabulary tokens.
Unified-IO is a single unified model that can perform a large variety of AI tasks without task-specific fine-tuning. It achieves this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. Code and demos for Unified-IO are available for researchers to use.
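The following is a conceptual sketch (not the released Unified-IO code) of how heterogeneous inputs and outputs can be homogenized into one discrete token sequence. The tokenizer stand-ins, vocabulary sizes, and code offsets are illustrative assumptions.

```python
# Conceptual sketch: map text and image content into one shared discrete vocabulary.
TEXT_VOCAB_SIZE = 32_000      # e.g., a subword text vocabulary (assumed size)
IMAGE_CODEBOOK_SIZE = 16_384  # e.g., a learned image-tokenizer codebook (assumed size)

def encode_text(text: str) -> list[int]:
    """Map text to ids in [0, TEXT_VOCAB_SIZE); stand-in for a real tokenizer."""
    return [hash(tok) % TEXT_VOCAB_SIZE for tok in text.split()]

def encode_image(image_codes: list[int]) -> list[int]:
    """Shift discrete image codes past the text vocabulary so both
    modalities share a single token space."""
    return [TEXT_VOCAB_SIZE + c for c in image_codes]

# A VQA-style example: the prompt and the image become one token sequence,
# and the answer would be decoded from the same shared vocabulary.
prompt_tokens = encode_text("what color is the car ?")
image_tokens = encode_image([101, 7, 4092])    # codes from an image tokenizer (illustrative)
model_input = prompt_tokens + image_tokens     # single sequence for a seq2seq model
```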
Evolution through Large Models
Pursues the insight that LLMs trained to generate code can vastly improve the effectiveness of mutation operators applied to programs in genetic programming.
Evolution through Large Models (ELM) shows that large language models (LLMs) trained to generate code can serve as intelligent mutation operators in genetic programming. Because LLMs approximate the changes a human programmer would likely make, they can help bootstrap new models that output appropriate artifacts for a given domain even when no training data previously existed. This carries implications for open-endedness, deep learning, and reinforcement learning.
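Below is a minimal sketch of the core ELM idea: an LLM acting as the mutation operator inside a genetic-programming loop. The llm_complete and fitness functions are hypothetical placeholders to be supplied by the user; this is not the authors' implementation.

```python
# Sketch: genetic programming with an LLM as the mutation operator.
import random

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a code-generating LLM (must be supplied)."""
    raise NotImplementedError

def fitness(program: str) -> float:
    """Placeholder for a domain-specific evaluation of a candidate program."""
    raise NotImplementedError

def mutate(program: str) -> str:
    # Instead of random token edits, ask the LLM for a plausible, human-like change.
    prompt = f"# Original program:\n{program}\n# A slightly improved variant:\n"
    return llm_complete(prompt)

def evolve(seed_programs: list[str], generations: int = 10, pop_size: int = 32) -> str:
    population = list(seed_programs)
    for _ in range(generations):
        parents = random.choices(population, k=pop_size)
        children = [mutate(p) for p in parents]
        # Keep the fittest individuals from parents and children combined.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return population[0]
```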
Bootstrapped Transformer for Offline Reinforcement Learning
Proposes a novel algorithm, Bootstrapped Transformer, which leverages the learned sequence model to self-generate additional offline data and further boost training. Significantly outperforms Trajectory Transformer on the D4RL benchmark.
Can improve business operations and workflows that involve reinforcement learning by addressing the limited-dataset problem and improving the performance of sequence generation models.
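The bootstrapping loop can be summarized in a short sketch. It assumes generic SequenceModel and Trajectory interfaces for illustration; the released Bootstrapped Transformer differs in its model and data-generation details.

```python
# Sketch: alternate model training with self-generated offline trajectories.
from typing import List, Protocol

Trajectory = list  # placeholder: a trajectory is a list of (state, action, reward) steps

class SequenceModel(Protocol):
    def fit(self, trajectories: List[Trajectory]) -> None: ...
    def generate(self, num: int) -> List[Trajectory]: ...

def bootstrap_train(model: SequenceModel,
                    offline_data: List[Trajectory],
                    rounds: int = 5,
                    num_generated: int = 1000) -> SequenceModel:
    """Alternate between fitting the sequence model and letting it
    self-generate trajectories that augment the limited offline dataset."""
    data = list(offline_data)
    for _ in range(rounds):
        model.fit(data)                            # train on the current dataset
        synthetic = model.generate(num_generated)  # self-generate trajectories
        data.extend(synthetic)                     # grow the training set
    return model
```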
Bridge-Tower: Building Bridges Between Encoders in Vision-Language Representation Learning
Introduces multiple bridge layers that connect the top layers of the uni-modal encoders to each layer of the cross-modal encoder. Achieves SotA performance on various downstream vision-language tasks after pre-training on only 4M images.
Can help businesses improve their natural language processing and computer vision operations through better vision-language representation learning.
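A simplified PyTorch-style sketch of the bridge-layer idea follows: representations from the top uni-modal encoder layers are fused into every layer of the cross-modal encoder. The module names, additive fusion, and dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Sketch: per-layer bridges from uni-modal encoders into a cross-modal encoder.
import torch
import torch.nn as nn

class BridgeLayer(nn.Module):
    """Fuses a uni-modal representation into one cross-modal encoder layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)

    def forward(self, cross_modal: torch.Tensor, uni_modal: torch.Tensor) -> torch.Tensor:
        # Simple additive bridge; the paper's design is more elaborate.
        return self.norm(cross_modal + uni_modal)

class CrossModalEncoder(nn.Module):
    def __init__(self, dim: int = 768, num_layers: int = 6, num_heads: int = 12):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
             for _ in range(num_layers)]
        )
        self.text_bridges = nn.ModuleList([BridgeLayer(dim) for _ in range(num_layers)])
        self.vision_bridges = nn.ModuleList([BridgeLayer(dim) for _ in range(num_layers)])

    def forward(self, text_layer_feats, vision_layer_feats):
        # Each argument is a list with one tensor per top uni-modal encoder layer,
        # aligned one-to-one with the cross-modal layers.
        t_len = text_layer_feats[0].size(1)
        x = torch.cat([text_layer_feats[0], vision_layer_feats[0]], dim=1)
        for layer, t_bridge, v_bridge, t_feat, v_feat in zip(
            self.layers, self.text_bridges, self.vision_bridges,
            text_layer_feats, vision_layer_feats,
        ):
            # Re-inject the corresponding uni-modal features before each cross-modal layer.
            x = torch.cat(
                [t_bridge(x[:, :t_len], t_feat), v_bridge(x[:, t_len:], v_feat)],
                dim=1,
            )
            x = layer(x)
        return x
```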