Scaling Transformer to 1M tokens and beyond with RMT
By leveraging the Recurrent Memory Transformer (RMT) architecture, the authors increase the model's effective context length to an unprecedented two million tokens.
This research presents a method to enhance long-term dependency handling in natural language understanding and generation tasks and to enable large-scale context processing for memory-intensive applications.
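To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of segment-level recurrence with memory tokens, the mechanism RMT builds on: a long input is split into segments, and a small set of memory embeddings is passed from one segment to the next. The module names, sizes, and memory placement are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of segment-level recurrence with memory tokens (RMT-style).
# All names and hyperparameters are illustrative, not the authors' code.
import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    def __init__(self, d_model=256, n_mem=16, n_heads=4, n_layers=2):
        super().__init__()
        # Learned initial memory embeddings, carried across segments.
        self.memory = nn.Parameter(torch.randn(n_mem, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.n_mem = n_mem

    def forward(self, segments):
        # segments: list of already-embedded chunks, each (batch, seg_len, d_model).
        batch = segments[0].size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # One simple variant: prepend memory tokens so the backbone attends
            # over memory and the current segment jointly.
            y = self.backbone(torch.cat([mem, seg], dim=1))
            # The updated memory slots are handed to the next segment (the
            # recurrence), so information can flow across a very long input.
            mem = y[:, :self.n_mem]
            outputs.append(y[:, self.n_mem:])
        return torch.cat(outputs, dim=1)
```

Because only the fixed-size memory crosses segment boundaries, each forward pass stays quadratic in the segment length rather than in the full input length.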
Inducing anxiety in large language models increases exploration and bias
Large language models are transforming machine learning research while galvanizing public debate.
This research shows how the behavior of large language models changes when prompted with anxiety-inducing text, suggesting that prompt engineering can influence their behavior in applied settings. It also demonstrates the usefulness of methods taken from computational psychiatry for studying algorithms to which we increasingly delegate authority and autonomy.
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
The authors introduce CLaMP (Contrastive Language-Music Pre-training), which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss.
This research presents a pre-training method that integrates textual information to enable semantic search and zero-shot classification for symbolic music, surpassing the capabilities of previous models. It also provides a dataset of 1010 lead sheets in ABC notation, each accompanied by a title, artist, genre, and description, and demonstrates comparable or superior performance on score-oriented datasets.
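The contrastive part is essentially a CLIP-style objective: embeddings of matching text-music pairs are pulled together while mismatched pairs are pushed apart. A hedged sketch, assuming generic encoders that produce fixed-size embeddings (the function name and temperature value are illustrative choices, not taken from the paper):

```python
# CLIP-style symmetric contrastive loss between paired text and symbolic-music
# embeddings; encoder internals are omitted and all names are illustrative.
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, music_emb, temperature=0.07):
    """text_emb, music_emb: (batch, dim) embeddings of paired examples."""
    text_emb = F.normalize(text_emb, dim=-1)
    music_emb = F.normalize(music_emb, dim=-1)
    logits = text_emb @ music_emb.t() / temperature               # pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # matches lie on the diagonal
    # Symmetric cross-entropy: text-to-music and music-to-text retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

At inference time, this shared embedding space is what enables semantic search (rank music by similarity to a text query) and zero-shot classification (compare a piece against embedded label descriptions).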
Fundamental Limitations of Alignment in Large Language Models
This paper highlights the limitations of alignment in large language models and proposes a theoretical framework called Behavior Expectation Bounds (BEB) to investigate them. The authors prove that any alignment process that attenuates undesired behavior but does not remove it altogether is not safe against adversarial prompting attacks. They also find that behaviors that are generally unlikely to be exhibited by the model can be brought to the fore by triggering the model to behave as a specific persona. This theoretical result is demonstrated experimentally at large scale by contemporary so-called 'ChatGPT jailbreaks', in which adversarial users trick the LLM into breaking its alignment guardrails by triggering it into acting as a malicious persona.
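As a rough formalization of the quantity BEB reasons about (the notation below is simplified and illustrative, not copied verbatim from the paper): given a behavior scoring function $B:\Sigma^* \to [-1,1]$ over output strings and a model distribution $P$, the prompted behavior expectation is

$$B_P(s_0) \;=\; \mathbb{E}_{s \sim P(\cdot \mid s_0)}\big[B(s)\big],$$

and the negative result says, informally, that whenever the undesired behavior retains any nonzero probability mass, an adversary can construct a prompt $s_0$ that drives $B_P(s_0)$ below any fixed threshold $-\gamma$.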
It is important for businesses to consider the limitations of alignment in large language models, especially when implementing them in customer-facing applications. BEB could be a useful theoretical framework to develop and test alignment mechanisms that are safe against adversarial prompting attacks. Businesses could also consider the use of personas to prompt the model towards behavior that aligns with their values and goals.
Factored Neural Representation for Scene Understanding
This paper introduces a factored neural scene representation that can be learned directly from a monocular RGB-D video to produce object-level neural representations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformation (e.g., nonrigid movement). The authors evaluate this approach against a set of neural baselines on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable (e.g., changing an object's trajectory).
This factored neural scene representation could be useful for businesses that require scene understanding in their operations, such as robotics, autonomous vehicles, and security monitoring. It could enable efficient and accurate tracking in dynamic scenes with multiple moving and/or deforming objects, and its interpretability and editability could also support quality control and error correction.
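To illustrate what a factored, object-level representation with explicit motion might look like, here is a minimal, hypothetical PyTorch sketch: each object owns a small neural field plus an explicit per-frame rigid pose that can be edited independently. The structure, names, and parameterization are assumptions for illustration only, not the paper's architecture (which is trained from monocular RGB-D video and also models nonrigid deformation).

```python
# Illustrative sketch: one neural field per object plus an explicit, editable
# rigid trajectory. Not the authors' implementation.
import torch
import torch.nn as nn

class ObjectFieldSketch(nn.Module):
    def __init__(self, n_frames=100, d_hidden=128):
        super().__init__()
        # Object appearance/geometry in a canonical frame: RGB + density per query point.
        self.field = nn.Sequential(
            nn.Linear(3, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, 4),
        )
        # Explicit per-frame pose; a real system would use a proper rotation
        # parameterization (e.g., quaternions) rather than free 3x3 matrices.
        self.rotations = nn.Parameter(torch.eye(3).repeat(n_frames, 1, 1))
        self.translations = nn.Parameter(torch.zeros(n_frames, 3))

    def forward(self, points_world, t):
        # Map world-space query points into the object's canonical frame at time t;
        # pose parameters are optimized jointly with the field.
        R, trans = self.rotations[t], self.translations[t]
        points_canonical = (points_world - trans) @ R
        return self.field(points_canonical)
```

Because motion lives in the explicit pose parameters rather than inside the network weights, editing an object's trajectory amounts to changing those parameters without retraining its geometry.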