Human-level Atari 200x faster
Agent57 was the first agent to surpass the human benchmark on all 57 Atari games, but it did so at the cost of poor data efficiency, requiring nearly 80 billion frames of experience. Taking Agent57 as a starting point, the research employs a diverse set of strategies to achieve a 200-fold reduction in the experience needed to outperform the human baseline.
This research proposes a more efficient method to build general agents that can perform well over a wide range of tasks. The proposed method needs 200 times less experience than Agent57 to outperform the human baseline, making it more data-efficient and robust. This can be valuable for businesses that use reinforcement learning to optimize their processes and workflows.
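To make the scale of the claim concrete, here is a rough back-of-the-envelope calculation. It uses only the rounded figures quoted above ("nearly 80 billion frames" and "200-fold"), so the result is an approximate upper bound rather than a number reported by the paper; the function and variable names are illustrative.

```python
# Rough arithmetic only: both inputs are the rounded figures quoted in the summary.
AGENT57_FRAMES = 80e9    # "nearly 80 billion frames" (treated as an upper bound)
REDUCTION_FACTOR = 200   # "200-fold reduction"


def frames_after_reduction(baseline_frames: float, factor: float) -> float:
    """Return the experience budget implied by dividing the baseline by `factor`."""
    return baseline_frames / factor


implied_frames = frames_after_reduction(AGENT57_FRAMES, REDUCTION_FACTOR)
print(f"{implied_frames:.2e} frames")  # prints "4.00e+08 frames"
```

In other words, the new agent would need on the order of 400 million frames, instead of tens of billions, to outperform the human baseline.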
Text and Patterns: For Effective Chain of Thought It Takes Two to Tango
This work uses counterfactual prompting to develop a deeper understanding of CoT-based few-shot prompting mechanisms in large language models. The empirical and qualitative analysis reveals that a symbiotic relationship between text and patterns explains the success of few-shot prompting: text helps extract commonsense from the question to help patterns, and patterns enforce task understanding and direct text generation.
The research identifies the key components of a CoT prompt and devises a method called CCOT (Concise CoT), which delivers similar or slightly higher task solve rates. This can be useful for businesses that rely on natural language processing and few-shot techniques to augment their communication and decision-making processes.
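As a minimal sketch of what counterfactual prompting could look like in practice, the snippet below builds a few-shot CoT prompt and a variant in which one component of the rationale is altered. The summary does not specify the paper's actual prompts or edits, so the example question, the `Example` type, and the `mask_numbers` helper are all hypothetical; masking numbers is just one possible edit chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    thought: str  # the chain-of-thought rationale shown to the model
    answer: str


def build_prompt(examples, query):
    """Format few-shot CoT examples, followed by the unanswered query."""
    shots = [
        f"Q: {e.question}\nA: {e.thought} The answer is {e.answer}."
        for e in examples
    ]
    return "\n\n".join(shots + [f"Q: {query}\nA:"])


def mask_numbers(example, placeholder="X"):
    """Hypothetical counterfactual edit: replace numbers in the rationale only."""
    masked = re.sub(r"\d+", placeholder, example.thought)
    return Example(example.question, masked, example.answer)


# Illustrative data, not taken from the paper.
examples = [
    Example(
        question="Tom has 3 apples and buys 2 more. How many apples does he have?",
        thought="Tom starts with 3 apples. 3 + 2 = 5.",
        answer="5",
    )
]
query = "Ann has 4 pens and loses 1. How many pens does she have?"

original = build_prompt(examples, query)
counterfactual = build_prompt([mask_numbers(e) for e in examples], query)
```

Sending both prompts to the same model and comparing task solve rates would indicate how much the altered component contributes, which is the kind of comparison the analysis above relies on.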