Language Model Cascades
Formalizes the paradigm of language model cascades, which encompasses scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use.
Provides insights into how compositions of multiple models can expand capabilities, especially in cases involving control flow and dynamic structure, and into the probabilistic programming techniques this requires. Offers recommendations for implementing disparate model structures and inference strategies in a unified language.
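For intuition, here is a minimal sketch of a cascade as an ordinary program with control flow over model calls: chain of thought composed with a verifier, resampling until acceptance. The `toy_lm` stub and all function names are hypothetical stand-ins, not the paper's implementation.

```python
import random
from typing import Callable

LM = Callable[[str], str]  # an LM is just a string-to-string sampler

def toy_lm(prompt: str) -> str:
    """Toy stand-in for a real LM call; replace with an actual model API."""
    if "Is this correct?" in prompt:
        return random.choice(["yes", "no"])
    if "step by step" in prompt:
        return "some intermediate reasoning"
    return "some answer"

def chain_of_thought(lm: LM, question: str) -> tuple[str, str]:
    # Scratchpad / chain of thought: sample a latent reasoning string,
    # then sample an answer conditioned on it (two chained LM calls).
    thought = lm(f"Q: {question}\nLet's think step by step:")
    answer = lm(f"Q: {question}\nReasoning: {thought}\nA:")
    return thought, answer

def verify(lm: LM, question: str, thought: str, answer: str) -> bool:
    # Verifier: a second LM call scores the (reasoning, answer) pair.
    verdict = lm(
        f"Q: {question}\nReasoning: {thought}\nA: {answer}\n"
        "Is this correct? Answer yes or no:"
    )
    return verdict.strip().lower().startswith("yes")

def cascade(lm: LM, question: str, max_tries: int = 5) -> str | None:
    # Dynamic control flow over model calls: resample until the verifier
    # accepts -- effectively rejection sampling from the distribution over
    # answers conditioned on verifier acceptance.
    for _ in range(max_tries):
        thought, answer = chain_of_thought(lm, question)
        if verify(lm, question, thought, answer):
            return answer
    return None

print(cascade(toy_lm, "What is 17 * 3?"))
```

Viewing the verifier loop as rejection sampling is what makes the probabilistic-programming framing natural: each cascade component is a conditional distribution, and the program's control flow defines the joint model.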
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Conducts a systematic study of the scaling behaviour of ten diverse model architectures, including Transformers, Switch Transformers, Universal Transformers, dynamic convolutions, Performers, and the recently proposed MLP-Mixers.
Highlights the importance of accounting for model architecture when scaling, and shows how inductive bias affects scaling behaviour, with significant implications for how model architectures are evaluated in the community. Offers insights into both upstream (pretraining) and downstream (transfer) effects.
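As a toy illustration of what comparing scaling behaviour across architectures means, the sketch below fits a per-architecture power law, loss ≈ c · N^(−α), by linear regression in log-log space. The (size, loss) points and architecture labels are invented for illustration and are not the paper's data.

```python
import numpy as np

# Synthetic (parameter count, pretraining loss) points for two
# hypothetical architectures; a scaling study fits curves like these
# to losses observed across a sweep of model sizes.
params = np.array([1e7, 1e8, 1e9, 1e10])
loss_a = np.array([4.2, 3.4, 2.8, 2.3])  # e.g. a vanilla Transformer
loss_b = np.array([4.0, 3.5, 3.1, 2.8])  # e.g. an alternative architecture

def fit_power_law(n: np.ndarray, loss: np.ndarray) -> tuple[float, float]:
    # log(loss) = log(c) - alpha * log(n), so a degree-1 polyfit in
    # log-log space recovers the exponent alpha and prefactor c.
    slope, log_c = np.polyfit(np.log(n), np.log(loss), 1)
    return -slope, float(np.exp(log_c))

for name, loss in [("arch A", loss_a), ("arch B", loss_b)]:
    alpha, c = fit_power_law(params, loss)
    print(f"{name}: loss ~ {c:.2f} * N^(-{alpha:.3f})")
```

Differing fitted exponents between architectures capture the paper's core point: inductive bias changes not just absolute performance but how performance improves with scale.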