Sun Sep 25 2022
Thu Sep 22 2022

Mega: Moving Average Equipped Gated Attention

Natural language processing
Attention mechanism
Sequence modeling

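Introduces Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with an (exponential) moving average, which incorporates an inductive bias toward position-aware local dependencies into the otherwise position-agnostic attention mechanism. Achieves significant improvements over other sequence models, including Transformer variants and recent state space models, on a wide range of sequence modeling benchmarks such as Long Range Arena, neural machine translation, auto-regressive language modeling, and image and speech classification.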
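To make the "moving average" part concrete, here is a minimal sketch of a per-dimension damped EMA, y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}. This is a simplified assumption of the recurrence Mega builds on: the real layer learns alpha and delta, expands each input dimension before averaging and projects back afterwards, and then feeds the result into its gated attention, none of which is shown here. The helper names (`damped_ema`, `ema_via_convolution`) are hypothetical. The second function shows why the EMA adds a locality bias: unrolled, it is a causal convolution whose weights decay geometrically with distance.

```python
from typing import List, Sequence


def damped_ema(xs: Sequence[Sequence[float]],
               alpha: Sequence[float],
               delta: Sequence[float]) -> List[List[float]]:
    """Per-dimension damped EMA (simplified, hypothetical helper).

    y_t[d] = alpha[d] * x_t[d] + (1 - alpha[d] * delta[d]) * y_{t-1}[d],
    with the state starting at zero. delta = 1 gives the classic EMA.
    """
    dim = len(alpha)
    prev = [0.0] * dim
    out: List[List[float]] = []
    for x in xs:
        cur = [alpha[d] * x[d] + (1.0 - alpha[d] * delta[d]) * prev[d]
               for d in range(dim)]
        out.append(cur)
        prev = cur
    return out


def ema_via_convolution(xs: Sequence[Sequence[float]],
                        alpha: Sequence[float],
                        delta: Sequence[float]) -> List[List[float]]:
    """Same output as damped_ema, computed as a causal convolution.

    Kernel weight at lag k is alpha * (1 - alpha * delta) ** k, so recent
    tokens weigh more than distant ones: a position-aware local bias.
    """
    dim = len(alpha)
    n = len(xs)
    kernel = [[alpha[d] * (1.0 - alpha[d] * delta[d]) ** k for d in range(dim)]
              for k in range(n)]
    return [[sum(kernel[k][d] * xs[t - k][d] for k in range(t + 1))
             for d in range(dim)]
            for t in range(n)]
```

For example, `damped_ema([[1.0], [0.0], [0.0]], [0.5], [1.0])` yields `[[0.5], [0.25], [0.125]]`: a single impulse fades geometrically over the following positions.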

Implement Mega in your sequence modeling tasks to improve performance and incorporate an inductive bias toward position-aware local dependencies.

A Generalist Neural Algorithmic Learner

Artificial intelligence
Graph neural network
Algorithmic reasoning
Multi-task learning

Constructs a generalist neural algorithmic learner: a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. Improves average single-task performance on the CLRS benchmark by over 20% relative to prior art.
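Why can one graph neural network processor execute algorithms at all? The sketch below illustrates a common intuition from neural algorithmic reasoning, not the paper's learned processor. A single message-passing step with a hand-set message function (sender state plus edge weight) and min aggregation is exactly one Bellman-Ford relaxation, so iterating the same shared step computes shortest paths. In the actual learner these functions are learned and shared across many algorithms. All names here (`message_passing_step`, `run_processor`) are hypothetical.

```python
import math
from typing import List, Tuple

# (sender, receiver, weight) for a directed, weighted edge.
Edge = Tuple[int, int, float]


def message_passing_step(h: List[float], edges: List[Edge]) -> List[float]:
    """One synchronous processor step with hand-set (not learned) functions.

    Message u -> v is h[u] + w; each node aggregates incoming messages
    with min, also keeping its own current state.
    """
    new_h = list(h)
    for u, v, w in edges:
        msg = h[u] + w  # read only the previous step's states
        if msg < new_h[v]:
            new_h[v] = msg
    return new_h


def run_processor(num_nodes: int, source: int, edges: List[Edge]) -> List[float]:
    """Apply the same step num_nodes - 1 times, as in Bellman-Ford."""
    h = [math.inf] * num_nodes
    h[source] = 0.0
    for _ in range(max(num_nodes - 1, 0)):
        h = message_passing_step(h, edges)
    return h
```

With edges `[(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]` and source 0, `run_processor(4, 0, edges)` returns `[0.0, 3.0, 1.0, 4.0]`; unreachable nodes stay at `math.inf`.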

Implement a generalist neural algorithmic learner to execute a wide range of algorithms and improve performance in algorithmic tasks.

Sun Sep 18 2022
Wed Sep 14 2022
Sun Sep 11 2022
Thu Sep 01 2022