Large Language Models Can Be Easily Distracted by Irrelevant Context
Large language models have achieved impressive performance on various natural language processing tasks. However, so far they have been evaluated primarily on benchmarks where all information in the input context is relevant for solving the task. In this work, we investigate the distractibility of large language models, i.e., how a model's problem-solving accuracy can be influenced by irrelevant context.
This research highlights the importance of considering irrelevant context in evaluating the performance of large language models, and proposes approaches for mitigating this deficiency, such as decoding with self-consistency and adding to the prompt an instruction that tells the language model to ignore the irrelevant information.
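To make the two mitigations concrete, here is a minimal sketch of how they are typically combined: the prompt carries an explicit instruction to ignore irrelevant information, several reasoning paths are sampled, and the final answer is chosen by majority vote (self-consistency). The `generate` stub and the exact prompt wording are illustrative assumptions, not the paper's setup.

```python
import random
from collections import Counter

IGNORE_INSTRUCTION = (
    "Feel free to ignore irrelevant information given in the question."
)

def generate(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for an LLM call; returns one sampled chain-of-thought
    completion that ends in a short answer. A real implementation would
    call a model API here."""
    return f"... therefore the answer is {random.choice(['8', '8', '12'])}"

def extract_answer(completion: str) -> str:
    """Pull the final answer token out of a completion."""
    return completion.rsplit(" ", 1)[-1]

def self_consistent_answer(question: str, n_samples: int = 8) -> str:
    """Sample several reasoning paths and return the majority answer."""
    prompt = f"{IGNORE_INSTRUCTION}\n\nQ: {question}\nA:"
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    q = ("Maria has 5 apples and buys 3 more. Her neighbor's dog is 4 years "
         "old. How many apples does Maria have?")
    print(self_consistent_answer(q))
```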
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
Few-shot learning involves learning an effective model from only a few labeled datapoints. The use of a small training set makes it difficult to avoid overfitting but also makes few-shot learning applicable to many important real-world settings. In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization.
This research proposes automated sampling strategies for few-shot learning with auxiliary data and compares them with prior methods that only explore or only exploit, finding that combining exploration and exploitation is crucial. The proposed algorithms yield significant improvements over existing pre-trained models across 11 datasets.
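To illustrate the explore/exploit framing, here is a minimal sketch of a UCB1-style bandit that decides which auxiliary dataset to draw the next training batch from: datasets whose batches have helped so far are exploited, while under-sampled ones are still explored. The reward signal (`reward_fn`) and the dataset names are illustrative assumptions; the paper's actual algorithms and reward definition may differ.

```python
import math
import random

class UCB1AuxSampler:
    """Treat each auxiliary dataset as a bandit arm and pick the arm with
    the highest upper confidence bound on its running mean reward."""

    def __init__(self, dataset_names):
        self.names = list(dataset_names)
        self.counts = {n: 0 for n in self.names}
        self.values = {n: 0.0 for n in self.names}  # running mean reward
        self.total = 0

    def select(self) -> str:
        # Play every arm once before applying the UCB1 rule.
        for n in self.names:
            if self.counts[n] == 0:
                return n
        def ucb(n):
            bonus = math.sqrt(2.0 * math.log(self.total) / self.counts[n])
            return self.values[n] + bonus
        return max(self.names, key=ucb)

    def update(self, name: str, reward: float) -> None:
        self.counts[name] += 1
        self.total += 1
        self.values[name] += (reward - self.values[name]) / self.counts[name]

def reward_fn(dataset_name: str) -> float:
    """Hypothetical reward, e.g., how much a batch from this auxiliary
    dataset improved the few-shot target loss (random stand-in here)."""
    return random.random()

if __name__ == "__main__":
    sampler = UCB1AuxSampler(["aux_task_a", "aux_task_b", "aux_task_c"])
    for _ in range(100):
        chosen = sampler.select()      # explore/exploit choice
        reward = reward_fn(chosen)     # train on a batch, measure benefit
        sampler.update(chosen, reward)
    print(sampler.counts)
```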
CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
Code generation models based on the pre-training and fine-tuning paradigm have been increasingly attempted by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To validate the performance of these models, multiple benchmarks (e.g., AiXBench and HumanEval) have been proposed, but they include only cases of generating a standalone function, i.e., a function that invokes or accesses only built-in functions and standard libraries.
This research proposes CoderEval, a benchmark of pragmatic code generation with generative pre-trained models, which can assess model performance on pragmatic code generation beyond standalone functions. The evaluation of three publicly available models on CoderEval provides insights into the progress and future directions of pragmatic code generation with a generative pre-trained model.
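Benchmarks in this family typically report the unbiased pass@k estimator (introduced with HumanEval): generate n samples per task, count the c samples that pass the task's tests, and compute pass@k = 1 - C(n-c, k)/C(n, k). A short sketch follows; the per-category breakdown in the example is an illustrative assumption, not CoderEval's exact reporting format.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n generations of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations per task, varying numbers of passing samples.
results = {"standalone_function": 6, "uses_project_code": 2}
for category, correct in results.items():
    print(category, round(pass_at_k(n=10, c=correct, k=1), 3))
```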
Learning Universal Policies via Text-Guided Video Generation
This work casts the sequential decision-making problem as a text-conditioned video generation problem, in which a planner synthesizes a set of future frames from which actions are extracted.
This research offers a new approach to constructing more general-purpose AI agents that can solve a wide variety of tasks. By leveraging text as the underlying goal specification, the proposed policy-as-video formulation offers combinatorial generalization to novel goals and can represent environments with different state and action spaces in a unified space of images. The approach enables knowledge transfer through predicting highly realistic video plans for real robots.
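A minimal sketch of the policy-as-video control loop this formulation implies: condition a video model on the text goal and the current observation, synthesize a plan of future frames, then recover the action between consecutive frames with an inverse-dynamics model and execute it. All classes and signatures below are placeholders, not the paper's actual interfaces.

```python
import numpy as np

class VideoPlanner:
    """Placeholder text-conditioned video generator: given a goal string
    and the current frame, returns a sequence of predicted future frames."""
    def plan(self, goal: str, frame: np.ndarray, horizon: int = 8):
        # Hypothetical stub; a real model would synthesize frames here.
        return [frame.copy() for _ in range(horizon)]

class InverseDynamics:
    """Placeholder inverse-dynamics model: infers the action that takes
    the agent from one frame to the next."""
    def action(self, frame_t: np.ndarray, frame_t1: np.ndarray) -> np.ndarray:
        return np.zeros(7)  # e.g., a 7-DoF arm command

def rollout(goal: str, env_step, first_frame: np.ndarray, horizon: int = 8):
    """Generate a video plan for the text goal, extract actions between
    consecutive frames, and execute them in the environment."""
    planner, idm = VideoPlanner(), InverseDynamics()
    frames = [first_frame] + planner.plan(goal, first_frame, horizon)
    observation = first_frame
    for f_t, f_t1 in zip(frames[:-1], frames[1:]):
        action = idm.action(f_t, f_t1)
        observation = env_step(action)  # advance the real environment
    return observation

if __name__ == "__main__":
    dummy_env_step = lambda a: np.zeros((64, 64, 3))
    rollout("pick up the red block", dummy_env_step, np.zeros((64, 64, 3)))
```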