Mega: Moving Average Equipped Gated Attention
Introduces Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with an (exponential) moving average that injects the inductive bias of position-aware local dependencies into the otherwise position-agnostic attention mechanism. Achieves significant improvements over other sequence models across a wide range of sequence modeling benchmarks.
Implement Mega in your sequence modeling tasks to improve performance by incorporating the inductive bias of position-aware local dependencies.
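The core mechanism can be sketched as follows: an exponential moving average smooths the input along the time axis before a single-head attention step, and a learned sigmoid gate mixes the attention output back with the raw input. This is a minimal NumPy sketch under assumed shapes, with randomly initialized projections (the names `Wq`, `Wk`, `Wv`, `Wg` and the scalar EMA are illustrative assumptions); the paper's full architecture uses a multi-dimensional damped EMA and additional components.

```python
import numpy as np

def ema_smooth(x, alpha=0.5):
    # Exponential moving average over the time axis:
    # y_t = alpha * x_t + (1 - alpha) * y_{t-1}
    # This is the position-aware local bias injected before attention.
    y = np.zeros_like(x)
    prev = np.zeros(x.shape[-1])
    for t in range(x.shape[0]):
        prev = alpha * x[t] + (1 - alpha) * prev
        y[t] = prev
    return y

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mega_like_attention(x, Wq, Wk, Wv, Wg, alpha=0.5):
    # x: (seq_len, dim). Queries/keys come from the EMA-smoothed input,
    # so the attention scores see positional locality.
    xe = ema_smooth(x, alpha)
    q, k, v = xe @ Wq, xe @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    out = attn @ v
    # Single-head sigmoid gate mixes attention output with the raw input.
    gate = 1.0 / (1.0 + np.exp(-(xe @ Wg)))
    return gate * out + (1.0 - gate) * x
```

With `alpha=1.0` the EMA reduces to the identity, so the smoothing strength interpolates between purely local averaging and no positional bias at all.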
A Generalist Neural Algorithmic Learner
Constructs a generalist neural algorithmic learner - a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding, and geometry. Improves average single-task performance by more than 20% over prior art.
Implement a generalist neural algorithmic learner to execute a wide range of algorithms with a single shared processor and improve performance on algorithmic tasks.
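The idea that one message-passing processor can execute classical algorithms can be illustrated with a hand-specified instance: a single shared update rule that, given additive messages and min-aggregation, performs the relaxation step of Bellman-Ford shortest paths (one of the path-finding tasks such learners are trained on). This is a didactic pure-Python sketch of the algorithmic alignment, not the learned GNN itself; the names `mpnn_step` and `bellman_ford` are hypothetical.

```python
def mpnn_step(h, adj, msg_fn, agg=min):
    # One synchronous round of message passing with a shared processor:
    # each node aggregates messages from its in-neighbors. With agg=min
    # and additive messages this matches shortest-path relaxation.
    # adj[v] is a list of (u, weight) incoming edges for node v.
    new_h = list(h)
    for v in range(len(h)):
        msgs = [msg_fn(h[u], w) for u, w in adj[v]]
        if msgs:
            new_h[v] = min(new_h[v], agg(msgs))
    return new_h

def bellman_ford(adj, source, n, steps):
    # Repeatedly applying the same processor step executes the algorithm.
    h = [float('inf')] * n
    h[source] = 0.0
    for _ in range(steps):
        h = mpnn_step(h, adj, lambda hu, w: hu + w)
    return h
```

In the learned setting, the message and aggregation functions above are replaced by neural networks trained with intermediate algorithm states as supervision, and the same processor weights are shared across all tasks.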