How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
Finds that without any input-dependent attention, all models achieve competitive performance, with a relative drop of only 8% from the probing baseline.
Simpler alternatives to input-dependent attention could be explored to improve the efficiency and effectiveness of pretrained language models.
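For intuition, here is a minimal sketch (not the paper's code) of what removing input-dependence from attention means: the usual softmax over query-key scores is swapped for a fixed, precomputed mixing matrix that is the same for every input. All class names and shapes below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StandardSelfAttention(nn.Module):
    """Attention weights computed from the input (queries and keys)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
        return attn @ v  # weights depend on the input


class ConstantAttention(nn.Module):
    """Input-independent variant: a fixed attention matrix mixes positions."""

    def __init__(self, dim: int, max_len: int):
        super().__init__()
        self.v = nn.Linear(dim, dim)
        # A constant matrix, e.g. averaged over a corpus and then frozen;
        # uniform weights are used here only as a placeholder.
        self.register_buffer("attn", torch.full((max_len, max_len), 1.0 / max_len))

    def forward(self, x):  # x: (batch, seq, dim)
        seq = x.size(1)
        return self.attn[:seq, :seq] @ self.v(x)  # same mixing for every input
```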
Measuring Progress on Scalable Oversight for Large Language Models
Shows that present-day LLMs can help humans complete difficult tasks in settings relevant to scalable oversight.
Large language models can be used to assist humans in difficult tasks, paving the way for better oversight and development of AI systems.
Learning Visual Locomotion with Cross-Modal Supervision
Shows that the learned visual walking policy can adapt to shifts in the visual field with less than 30 minutes of real-world data.
Cross-modal supervision can improve the adaptability and performance of visual walking policies with limited real-world data.
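To illustrate the idea, here is a minimal sketch (not the paper's implementation) of cross-modal supervision: labels estimated from proprioception during real-world walking supervise a vision network, so no manual annotation is needed. All module names, shapes, and hyperparameters below are hypothetical.

```python
import torch
import torch.nn as nn

# Vision network to be adapted: predicts a terrain/gait code from a depth image.
vision_net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU(),
                           nn.Linear(128, 16))
# Frozen proprioception-based estimator: produces the supervision target "for free".
proprio_estimator = nn.Sequential(nn.Linear(48, 64), nn.ReLU(), nn.Linear(64, 16))
proprio_estimator.requires_grad_(False)

opt = torch.optim.Adam(vision_net.parameters(), lr=1e-4)


def adaptation_step(depth_image, proprio_history):
    """One self-supervised update from data collected while walking."""
    with torch.no_grad():
        target = proprio_estimator(proprio_history)  # label from proprioception
    pred = vision_net(depth_image)                   # prediction from vision alone
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# A short buffer of (image, proprioception) pairs, e.g. from ~30 minutes of walking.
loss = adaptation_step(torch.rand(8, 1, 64, 64), torch.rand(8, 48))
```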