MineDojo
Introduces MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base.
MineDojo is an open-source framework that provides a simulation suite and knowledge base for building generalist agents that can solve a variety of open-ended tasks. It promotes research towards the goal of generally capable embodied agents.
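To make the framework concrete, here is a minimal sketch of driving a MineDojo task through its Gym-style interface. It assumes the pip-installable minedojo package and its make() entry point; the task_id and image_size values shown are illustrative and may differ from the released API.

```python
# Minimal sketch: interact with one MineDojo benchmark task (assumed API).
import minedojo

env = minedojo.make(
    task_id="harvest_milk",   # one of the thousands of benchmark tasks (assumed id)
    image_size=(160, 256),    # RGB observation resolution
)

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random policy as a placeholder
    obs, reward, done, info = env.step(action)  # standard Gym step signature
    if done:
        obs = env.reset()
env.close()
```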
Unified-IO
Performs a large variety of AI tasks, spanning classical computer vision, vision-and-language, and NLP tasks such as QA and paraphrasing, by casting every input and output into a sequence of discrete vocabulary tokens.
Unified-IO is a single unified model that can perform a large variety of AI tasks without task-specific fine-tuning. It achieves this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. Code and demos for Unified-IO are available for researchers to use.
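The following is a conceptual sketch (not the released Unified-IO code) of how heterogeneous inputs and outputs can be homogenized into one discrete token sequence. The tokenizer stand-ins, vocabulary sizes, and code offsets are illustrative assumptions.

```python
# Conceptual sketch: map text and image content into one shared discrete vocabulary.
TEXT_VOCAB_SIZE = 32_000      # e.g., a subword text vocabulary (assumed size)
IMAGE_CODEBOOK_SIZE = 16_384  # e.g., a learned image-tokenizer codebook (assumed size)

def encode_text(text: str) -> list[int]:
    """Map text to ids in [0, TEXT_VOCAB_SIZE); stand-in for a real tokenizer."""
    return [hash(tok) % TEXT_VOCAB_SIZE for tok in text.split()]

def encode_image(image_codes: list[int]) -> list[int]:
    """Shift discrete image codes past the text vocabulary so both
    modalities share a single token space."""
    return [TEXT_VOCAB_SIZE + c for c in image_codes]

# A VQA-style example: the prompt and the image become one token sequence,
# and the answer would be decoded from the same shared vocabulary.
prompt_tokens = encode_text("what color is the car ?")
image_tokens = encode_image([101, 7, 4092])    # codes from an image tokenizer (illustrative)
model_input = prompt_tokens + image_tokens     # single sequence for a seq2seq model
```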
Evolution through Large Models
Pursues the insight that LLMs trained to generate code can vastly improve the effectiveness of mutation operators applied to programs in genetic programming.
Evolution through Large Models (ELM) shows that large language models (LLMs) trained to generate code can serve as intelligent mutation operators in genetic programming. Because LLMs approximate the changes a human programmer would likely make, they can help bootstrap new models that output appropriate artifacts for a given domain even when no training data previously existed. This carries implications for open-endedness, deep learning, and reinforcement learning.
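Below is a minimal sketch of the core ELM idea: an LLM acting as the mutation operator inside a genetic-programming loop. The llm_complete and fitness functions are hypothetical placeholders to be supplied by the user; this is not the authors' implementation.

```python
# Sketch: genetic programming with an LLM as the mutation operator.
import random

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a code-generating LLM (must be supplied)."""
    raise NotImplementedError

def fitness(program: str) -> float:
    """Placeholder for a domain-specific evaluation of a candidate program."""
    raise NotImplementedError

def mutate(program: str) -> str:
    # Instead of random token edits, ask the LLM for a plausible, human-like change.
    prompt = f"# Original program:\n{program}\n# A slightly improved variant:\n"
    return llm_complete(prompt)

def evolve(seed_programs: list[str], generations: int = 10, pop_size: int = 32) -> str:
    population = list(seed_programs)
    for _ in range(generations):
        parents = random.choices(population, k=pop_size)
        children = [mutate(p) for p in parents]
        # Keep the fittest individuals from parents and children combined.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return population[0]
```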
Bootstrapped Transformer for Offline Reinforcement Learning
Proposes a novel algorithm, Bootstrapped Transformer, which leverages the learned sequence model to self-generate additional offline data and further boost training. Significantly outperforms Trajectory Transformer on the D4RL benchmark.
Can improve business operations and workflows that involve reinforcement learning by addressing the limited-dataset problem and improving the performance of sequence generation models.
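The bootstrapping loop can be summarized in a short sketch. It assumes generic SequenceModel and Trajectory interfaces for illustration; the released Bootstrapped Transformer differs in its model and data-generation details.

```python
# Sketch: alternate model training with self-generated offline trajectories.
from typing import List, Protocol

Trajectory = list  # placeholder: a trajectory is a list of (state, action, reward) steps

class SequenceModel(Protocol):
    def fit(self, trajectories: List[Trajectory]) -> None: ...
    def generate(self, num: int) -> List[Trajectory]: ...

def bootstrap_train(model: SequenceModel,
                    offline_data: List[Trajectory],
                    rounds: int = 5,
                    num_generated: int = 1000) -> SequenceModel:
    """Alternate between fitting the sequence model and letting it
    self-generate trajectories that augment the limited offline dataset."""
    data = list(offline_data)
    for _ in range(rounds):
        model.fit(data)                            # train on the current dataset
        synthetic = model.generate(num_generated)  # self-generate trajectories
        data.extend(synthetic)                     # grow the training set
    return model
```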
Bridge-Tower: Building Bridges Between Encoders in Vision-Language Representation Learning
Introduces multiple bridge layers that connect the top layers of the uni-modal encoders to each layer of the cross-modal encoder. Achieves SotA performance on various downstream vision-language tasks after pre-training on only 4M images.
Can help businesses improve their natural language processing and computer vision operations through better vision-language representation learning.
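A simplified PyTorch-style sketch of the bridge-layer idea follows: representations from the top uni-modal encoder layers are fused into every layer of the cross-modal encoder. The module names, additive fusion, and dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Sketch: per-layer bridges from uni-modal encoders into a cross-modal encoder.
import torch
import torch.nn as nn

class BridgeLayer(nn.Module):
    """Fuses a uni-modal representation into one cross-modal encoder layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)

    def forward(self, cross_modal: torch.Tensor, uni_modal: torch.Tensor) -> torch.Tensor:
        # Simple additive bridge; the paper's design is more elaborate.
        return self.norm(cross_modal + uni_modal)

class CrossModalEncoder(nn.Module):
    def __init__(self, dim: int = 768, num_layers: int = 6, num_heads: int = 12):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
             for _ in range(num_layers)]
        )
        self.text_bridges = nn.ModuleList([BridgeLayer(dim) for _ in range(num_layers)])
        self.vision_bridges = nn.ModuleList([BridgeLayer(dim) for _ in range(num_layers)])

    def forward(self, text_layer_feats, vision_layer_feats):
        # Each argument is a list with one tensor per top uni-modal encoder layer,
        # aligned one-to-one with the cross-modal layers.
        t_len = text_layer_feats[0].size(1)
        x = torch.cat([text_layer_feats[0], vision_layer_feats[0]], dim=1)
        for layer, t_bridge, v_bridge, t_feat, v_feat in zip(
            self.layers, self.text_bridges, self.vision_bridges,
            text_layer_feats, vision_layer_feats,
        ):
            # Re-inject the corresponding uni-modal features before each cross-modal layer.
            x = torch.cat(
                [t_bridge(x[:, :t_len], t_feat), v_bridge(x[:, t_len:], v_feat)],
                dim=1,
            )
            x = layer(x)
        return x
```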