PyGlove: Efficiently Exchanging ML Ideas as Code
PyGlove represents ideas as symbolic rule-based patches, enabling researchers to write down the rules for models they have not seen.
The PyGlove library enables efficient collaboration among machine learning teams by representing ideas as symbolic rule-based patches. Because researchers write less code, and write it faster, the cost of adopting PyGlove is quickly recouped, with an 80% reduction in lines of code in some cases.
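As a rough illustration of what a symbolic rule-based patch might look like in code, here is a minimal sketch using the open-source pyglove package. The class, field names, and the specific rule are made up for illustration, and the API calls (pg.symbolize, rebind) reflect our reading of the library rather than an example from the paper.

```python
# Minimal sketch, not the paper's example: a symbolic class whose fields can
# be patched by rule, assuming pyglove's `pg.symbolize` decorator and `rebind`.
import pyglove as pg


@pg.symbolize
class Trainer:
  """Toy training config; the fields are illustrative placeholders."""

  def __init__(self, optimizer='sgd', learning_rate=0.1, num_layers=12):
    self.optimizer = optimizer
    self.learning_rate = learning_rate
    self.num_layers = num_layers


trainer = Trainer()

# A rule-based patch: halve every field named `learning_rate`, wherever it
# occurs in the symbolic tree. The author of the rule never needs to see the
# concrete model definition it will later be applied to.
trainer.rebind(
    lambda key_path, value, parent: value * 0.5
    if key_path.key == 'learning_rate' else value)

print(trainer)  # learning_rate should now be 0.05
```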
The unreasonable effectiveness of few-shot learning for machine translation
Outperforms the best-performing system on the WMT’21 English–Chinese news translation task while using only five examples of English–Chinese parallel data at inference.
Few-shot translation systems, trained with unpaired language data, show potential for both high- and low-resource language pairs. With only a few examples of high-quality translation data shown at inference, they can match specialized supervised state-of-the-art models as well as commercial translation systems. The approach requires neither joint multilingual training nor back-translation, and it shows potential for extension to the multilingual setting.
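To make the "examples shown at inference" concrete, here is a minimal sketch of how such a few-shot translation prompt could be assembled; the template, the example pairs, and the model call are placeholders, not the paper's actual format.

```python
# Illustrative few-shot prompt construction for English->Chinese translation.
# The demonstration pairs and the prompt template are hypothetical; the
# paper's exact format and decoding setup are not reproduced here.

few_shot_pairs = [
    ("The weather is nice today.", "今天天气很好。"),
    ("Where is the nearest train station?", "最近的火车站在哪里？"),
    # ... up to five high-quality English-Chinese pairs in total.
]


def build_prompt(source_sentence: str) -> str:
    """Concatenates the demonstration pairs followed by the new source sentence."""
    lines = []
    for en, zh in few_shot_pairs:
        lines.append(f"English: {en}\nChinese: {zh}\n")
    lines.append(f"English: {source_sentence}\nChinese:")
    return "\n".join(lines)


prompt = build_prompt("The conference has been postponed until next year.")
# completion = language_model.generate(prompt)  # hypothetical model call
```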
Has anyone tried next token prediction with truncated BPTT + label prediction at the end for a Long Range Arena like task?
This paper proposes a simple method that improves the ability of RNNs to capture long-term dependencies by adding an unsupervised auxiliary loss to the original objective.
Adding the unsupervised auxiliary loss to the original objective makes truncated backpropagation through time (BPTT) feasible for long sequences and also improves full BPTT. The method shows good performance and resource efficiency compared with competitive baselines, including other recurrent models and a comparably sized Transformer.
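A rough PyTorch sketch of the idea is below: truncated BPTT over fixed-length segments, an unsupervised next-token auxiliary loss on each segment, and a supervised label loss at the end of the sequence. Segment length, loss weighting, and architecture are assumptions, not the paper's recipe.

```python
# Sketch: truncated BPTT with an unsupervised next-token auxiliary loss plus
# a supervised label loss at the end. Shapes and hyperparameters are illustrative.
import torch
import torch.nn as nn

vocab_size, hidden, num_classes, segment_len = 256, 128, 10, 64

embed = nn.Embedding(vocab_size, hidden)
rnn = nn.LSTM(hidden, hidden, batch_first=True)
aux_head = nn.Linear(hidden, vocab_size)   # unsupervised next-token head
cls_head = nn.Linear(hidden, num_classes)  # supervised label head
params = (list(embed.parameters()) + list(rnn.parameters())
          + list(aux_head.parameters()) + list(cls_head.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)


def train_step(tokens, label, aux_weight=0.5):
    """tokens: (1, T) long tensor of token ids; label: (1,) class index."""
    optimizer.zero_grad()
    state, total_aux = None, 0.0
    for start in range(0, tokens.size(1) - 1, segment_len):
        seg = tokens[:, start:start + segment_len]
        target = tokens[:, start + 1:start + segment_len + 1]
        out, state = rnn(embed(seg), state)
        out = out[:, :target.size(1)]
        # Unsupervised auxiliary loss: predict the next token in the segment.
        total_aux = total_aux + nn.functional.cross_entropy(
            aux_head(out).reshape(-1, vocab_size), target.reshape(-1))
        # Truncated BPTT: detach the recurrent state between segments so
        # gradients do not flow across segment boundaries. (A memory-optimal
        # variant would also call backward once per segment.)
        state = tuple(s.detach() for s in state)
    # Supervised loss on the sequence-level label from the final hidden state.
    cls_loss = nn.functional.cross_entropy(cls_head(state[0][-1]), label)
    (aux_weight * total_aux + cls_loss).backward()
    optimizer.step()
```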
Creating a Large Language Model of a Philosopher
Fine-tunes GPT-3 on the works of philosopher Daniel C. Dennett. While experts could distinguish the generated statements from the real ones better than chance, ordinary participants could not.
Can be used to generate text in the style of a specific philosopher or intellectual, with possible applications in marketing or branding for businesses.
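For readers curious what the fine-tuning step might look like in practice, here is a minimal sketch of preparing prompt/completion pairs in the JSONL format that legacy GPT-3 fine-tuning expects; the file name, prompt template, and example pairs are hypothetical, not the study's actual pipeline.

```python
# Hypothetical data preparation for fine-tuning GPT-3 on a philosopher's
# writings; the chunking into (question, answer) pairs and the prompt
# template are assumptions, not the authors' method.
import json


def write_finetune_records(pairs, out_path="dennett_finetune.jsonl"):
    """Writes one {"prompt": ..., "completion": ...} JSONL record per pair."""
    with open(out_path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {
                "prompt": f"Interviewer: {question}\nDennett:",
                "completion": " " + answer.strip(),
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder pair; the real corpus would be drawn from Dennett's writings.
write_finetune_records([
    ("What is the relation between consciousness and the brain?",
     "<passage from the corpus would go here>"),
])
```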
Efficient Domain Adaptation for Speech Foundation Models
Achieves the same quality with only 21.6M supervised in-domain data and 130.8M finetuned parameters as a 731.1M-parameter model trained from scratch on an additional 300M supervised in-domain data.
Can be used to improve the accuracy of speech recognition systems, which can be valuable for businesses that rely on voice assistants and speech technology.
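The headline numbers point to adapting a large pretrained speech model while finetuning only a fraction of its parameters. Below is a generic PyTorch sketch of that pattern (freeze the foundation model, train a small adapter and output head on in-domain data), offered as an illustration of parameter-efficient adaptation in general rather than the paper's architecture.

```python
# Generic sketch of parameter-efficient domain adaptation: freeze a pretrained
# speech encoder and train only a small adapter + output head on in-domain data.
# Module names and sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class AdapterHead(nn.Module):
    """Small trainable module stacked on top of a frozen encoder."""

    def __init__(self, dim, bottleneck, num_tokens):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
        self.out = nn.Linear(dim, num_tokens)

    def forward(self, features):
        # Residual adapter over frozen encoder features, then output projection.
        return self.out(features + self.adapter(features))


def adapt(pretrained_encoder, dim=1024, bottleneck=256, num_tokens=512):
    # Freeze the foundation model: its parameters receive no gradients.
    for p in pretrained_encoder.parameters():
        p.requires_grad = False
    head = AdapterHead(dim, bottleneck, num_tokens)
    # Only the adapter/head parameters are optimized on in-domain data.
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
    return head, optimizer

# Usage (hypothetical): head, opt = adapt(my_pretrained_speech_encoder)
```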