PyGlove: Efficiently Exchanging ML Ideas as Code
PyGlove represents ideas as symbolic rule-based patches, enabling researchers to write down the rules for models they have not seen.
The PyGlove library enables efficient collaboration among machine learning teams by representing ideas as symbolic rule-based patches. Because researchers write less code, and write it faster, the cost of adopting PyGlove is quickly recouped, with an 80% reduction in lines of code in some cases.
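As a rough illustration of what a symbolic rule-based patch might look like in code, here is a minimal sketch using the open-source pyglove package. The class, field names, and the specific rule are made up for illustration, and the API calls (pg.symbolize, rebind) reflect our reading of the library rather than an example from the paper.

```python
# Minimal sketch, not the paper's example: a symbolic class whose fields can
# be patched by rule, assuming pyglove's `pg.symbolize` decorator and `rebind`.
import pyglove as pg


@pg.symbolize
class Trainer:
  """Toy training config; the fields are illustrative placeholders."""

  def __init__(self, optimizer='sgd', learning_rate=0.1, num_layers=12):
    self.optimizer = optimizer
    self.learning_rate = learning_rate
    self.num_layers = num_layers


trainer = Trainer()

# A rule-based patch: halve every field named `learning_rate`, wherever it
# occurs in the symbolic tree. The author of the rule never needs to see the
# concrete model definition it will later be applied to.
trainer.rebind(
    lambda key_path, value, parent: value * 0.5
    if key_path.key == 'learning_rate' else value)

print(trainer)  # learning_rate should now be 0.05
```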
The unreasonable effectiveness of few-shot learning for machine translation
Outperforms the best-performing system on the WMT’21 English–Chinese news translation task while using only five examples of English–Chinese parallel data at inference.
Few-shot translation systems, trained with unpaired language data, show potential for both high- and low-resource language pairs. With only a few examples of high-quality translation data shown at inference, they can match specialized supervised state-of-the-art models as well as commercial translation systems. The approach requires neither joint multilingual training nor back-translation, and it shows potential for extension to the multilingual setting.
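To make the "examples shown at inference" concrete, here is a minimal sketch of how such a few-shot translation prompt could be assembled; the template, the example pairs, and the model call are placeholders, not the paper's actual format.

```python
# Illustrative few-shot prompt construction for English->Chinese translation.
# The demonstration pairs and the prompt template are hypothetical; the
# paper's exact format and decoding setup are not reproduced here.

few_shot_pairs = [
    ("The weather is nice today.", "今天天气很好。"),
    ("Where is the nearest train station?", "最近的火车站在哪里？"),
    # ... up to five high-quality English-Chinese pairs in total.
]


def build_prompt(source_sentence: str) -> str:
    """Concatenates the demonstration pairs followed by the new source sentence."""
    lines = []
    for en, zh in few_shot_pairs:
        lines.append(f"English: {en}\nChinese: {zh}\n")
    lines.append(f"English: {source_sentence}\nChinese:")
    return "\n".join(lines)


prompt = build_prompt("The conference has been postponed until next year.")
# completion = language_model.generate(prompt)  # hypothetical model call
```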
Has anyone tried next token prediction with truncated BPTT + label prediction at the end for a Long Range Arena like task?
This paper proposes a simple method that improves the ability of RNNs to capture long-term dependencies by adding an unsupervised auxiliary loss to the original objective.
Adding the unsupervised auxiliary loss to the original objective makes truncated backpropagation through time (BPTT) feasible for long sequences and also improves full BPTT. The method shows good performance and resource efficiency compared with competitive baselines, including other recurrent models and a comparably sized Transformer.
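A rough PyTorch sketch of the idea is below: truncated BPTT over fixed-length segments, an unsupervised next-token auxiliary loss on each segment, and a supervised label loss at the end of the sequence. Segment length, loss weighting, and architecture are assumptions, not the paper's recipe.

```python
# Sketch: truncated BPTT with an unsupervised next-token auxiliary loss plus
# a supervised label loss at the end. Shapes and hyperparameters are illustrative.
import torch
import torch.nn as nn

vocab_size, hidden, num_classes, segment_len = 256, 128, 10, 64

embed = nn.Embedding(vocab_size, hidden)
rnn = nn.LSTM(hidden, hidden, batch_first=True)
aux_head = nn.Linear(hidden, vocab_size)   # unsupervised next-token head
cls_head = nn.Linear(hidden, num_classes)  # supervised label head
params = (list(embed.parameters()) + list(rnn.parameters())
          + list(aux_head.parameters()) + list(cls_head.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)


def train_step(tokens, label, aux_weight=0.5):
    """tokens: (1, T) long tensor of token ids; label: (1,) class index."""
    optimizer.zero_grad()
    state, total_aux = None, 0.0
    for start in range(0, tokens.size(1) - 1, segment_len):
        seg = tokens[:, start:start + segment_len]
        target = tokens[:, start + 1:start + segment_len + 1]
        out, state = rnn(embed(seg), state)
        out = out[:, :target.size(1)]
        # Unsupervised auxiliary loss: predict the next token in the segment.
        total_aux = total_aux + nn.functional.cross_entropy(
            aux_head(out).reshape(-1, vocab_size), target.reshape(-1))
        # Truncated BPTT: detach the recurrent state between segments so
        # gradients do not flow across segment boundaries. (A memory-optimal
        # variant would also call backward once per segment.)
        state = tuple(s.detach() for s in state)
    # Supervised loss on the sequence-level label from the final hidden state.
    cls_loss = nn.functional.cross_entropy(cls_head(state[0][-1]), label)
    (aux_weight * total_aux + cls_loss).backward()
    optimizer.step()
```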
Creating a Large Language Model of a Philosopher
Fine-tunes GPT-3 on the works of philosopher Daniel C. Dennett. While experts could distinguish the generated statements from the real ones better than chance, ordinary participants could not.
Can be used to generate text in the style of a specific philosopher or intellectual, with possible applications in marketing or branding for businesses.
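For readers curious what the fine-tuning step might look like in practice, here is a minimal sketch of preparing prompt/completion pairs in the JSONL format that legacy GPT-3 fine-tuning expects; the file name, prompt template, and example pairs are hypothetical, not the study's actual pipeline.

```python
# Hypothetical data preparation for fine-tuning GPT-3 on a philosopher's
# writings; the chunking into (question, answer) pairs and the prompt
# template are assumptions, not the authors' method.
import json


def write_finetune_records(pairs, out_path="dennett_finetune.jsonl"):
    """Writes one {"prompt": ..., "completion": ...} JSONL record per pair."""
    with open(out_path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {
                "prompt": f"Interviewer: {question}\nDennett:",
                "completion": " " + answer.strip(),
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder pair; the real corpus would be drawn from Dennett's writings.
write_finetune_records([
    ("What is the relation between consciousness and the brain?",
     "<passage from the corpus would go here>"),
])
```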
Efficient Domain Adaptation for Speech Foundation Models
Achieves the same quality with only 21.6M supervised in-domain data and 130.8M finetuned parameters as a 731.1M-parameter model trained from scratch on an additional 300M supervised in-domain data.
Can be used to improve the accuracy of speech recognition systems, which can be valuable for businesses that rely on voice assistants and speech technology.
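The headline numbers point to adapting a large pretrained speech model while finetuning only a fraction of its parameters. Below is a generic PyTorch sketch of that pattern (freeze the foundation model, train a small adapter and output head on in-domain data), offered as an illustration of parameter-efficient adaptation in general rather than the paper's architecture.

```python
# Generic sketch of parameter-efficient domain adaptation: freeze a pretrained
# speech encoder and train only a small adapter + output head on in-domain data.
# Module names and sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class AdapterHead(nn.Module):
    """Small trainable module stacked on top of a frozen encoder."""

    def __init__(self, dim, bottleneck, num_tokens):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
        self.out = nn.Linear(dim, num_tokens)

    def forward(self, features):
        # Residual adapter over frozen encoder features, then output projection.
        return self.out(features + self.adapter(features))


def adapt(pretrained_encoder, dim=1024, bottleneck=256, num_tokens=512):
    # Freeze the foundation model: its parameters receive no gradients.
    for p in pretrained_encoder.parameters():
        p.requires_grad = False
    head = AdapterHead(dim, bottleneck, num_tokens)
    # Only the adapter/head parameters are optimized on in-domain data.
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
    return head, optimizer

# Usage (hypothetical): head, opt = adapt(my_pretrained_speech_encoder)
```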