Scaling Transformer to 1M tokens and beyond with RMT
By leveraging the Recurrent Memory Transformer (RMT) architecture, the authors increase the model's effective context length to an unprecedented two million tokens.
This research presents a method to enhance long-term dependency handling in natural language understanding and generation tasks and to enable large-scale context processing for memory-intensive applications.
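To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of segment-level recurrence with memory tokens, the mechanism RMT builds on: a long input is split into segments, and a small set of memory embeddings is passed from one segment to the next. The module names, sizes, and memory placement are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of segment-level recurrence with memory tokens (RMT-style).
# All names and hyperparameters are illustrative, not the authors' code.
import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    def __init__(self, d_model=256, n_mem=16, n_heads=4, n_layers=2):
        super().__init__()
        # Learned initial memory embeddings, carried across segments.
        self.memory = nn.Parameter(torch.randn(n_mem, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.n_mem = n_mem

    def forward(self, segments):
        # segments: list of already-embedded chunks, each (batch, seg_len, d_model).
        batch = segments[0].size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # One simple variant: prepend memory tokens so the backbone attends
            # over memory and the current segment jointly.
            y = self.backbone(torch.cat([mem, seg], dim=1))
            # The updated memory slots are handed to the next segment (the
            # recurrence), so information can flow across a very long input.
            mem = y[:, :self.n_mem]
            outputs.append(y[:, self.n_mem:])
        return torch.cat(outputs, dim=1)
```

Because only the fixed-size memory crosses segment boundaries, each forward pass stays quadratic in the segment length rather than in the full input length.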
Inducing anxiety in large language models increases exploration and bias
Large language models are transforming machine learning research while galvanizing public debate.
This research shows how the behavior of large language models changes when prompted with anxiety-inducing text, suggesting that prompt engineering can influence their behavior in applied settings. It also demonstrates the usefulness of methods taken from computational psychiatry for studying algorithms to which we increasingly delegate authority and autonomy.
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
The authors introduce CLaMP (Contrastive Language-Music Pre-training), which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss.
This research presents a pre-training method that integrates textual information to enable semantic search and zero-shot classification for symbolic music, surpassing the capabilities of previous models. It also provides a dataset of 1010 lead sheets in ABC notation, each accompanied by a title, artist, genre, and description, and demonstrates comparable or superior performance on score-oriented datasets.
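The contrastive part is essentially a CLIP-style objective: embeddings of matching text-music pairs are pulled together while mismatched pairs are pushed apart. A hedged sketch, assuming generic encoders that produce fixed-size embeddings (the function name and temperature value are illustrative choices, not taken from the paper):

```python
# CLIP-style symmetric contrastive loss between paired text and symbolic-music
# embeddings; encoder internals are omitted and all names are illustrative.
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, music_emb, temperature=0.07):
    """text_emb, music_emb: (batch, dim) embeddings of paired examples."""
    text_emb = F.normalize(text_emb, dim=-1)
    music_emb = F.normalize(music_emb, dim=-1)
    logits = text_emb @ music_emb.t() / temperature               # pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # matches lie on the diagonal
    # Symmetric cross-entropy: text-to-music and music-to-text retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

At inference time, this shared embedding space is what enables semantic search (rank music by similarity to a text query) and zero-shot classification (compare a piece against embedded label descriptions).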
Fundamental Limitations of Alignment in Large Language Models
This paper highlights the limitations of alignment in large language models and proposes a theoretical framework called Behavior Expectation Bounds (BEB) to investigate them. The authors prove that any alignment process that attenuates undesired behavior but does not remove it altogether is not safe against adversarial prompting attacks. They also find that behaviors that are generally unlikely to be exhibited by the model can be brought to the fore by triggering the model to behave as a specific persona. This theoretical result is demonstrated experimentally at large scale by contemporary so-called 'ChatGPT jailbreaks', in which adversarial users trick the LLM into breaking its alignment guardrails by triggering it into acting as a malicious persona.
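As a rough formalization of the quantity BEB reasons about (the notation below is simplified and illustrative, not copied verbatim from the paper): given a behavior scoring function $B:\Sigma^* \to [-1,1]$ over output strings and a model distribution $P$, the prompted behavior expectation is

$$B_P(s_0) \;=\; \mathbb{E}_{s \sim P(\cdot \mid s_0)}\big[B(s)\big],$$

and the negative result says, informally, that whenever the undesired behavior retains any nonzero probability mass, an adversary can construct a prompt $s_0$ that drives $B_P(s_0)$ below any fixed threshold $-\gamma$.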
It is important for businesses to consider the limitations of alignment in large language models, especially when implementing them in customer-facing applications. BEB could be a useful theoretical framework to develop and test alignment mechanisms that are safe against adversarial prompting attacks. Businesses could also consider the use of personas to prompt the model towards behavior that aligns with their values and goals.
Factored Neural Representation for Scene Understanding
This paper introduces a factored neural scene representation that can be learned directly from a monocular RGB-D video to produce object-level neural representations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformation (e.g., nonrigid movement). The authors evaluate this approach against a set of neural baselines on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable (e.g., changing an object's trajectory).
This factored neural scene representation could be useful for businesses that require scene understanding in their operations, such as robotics, autonomous vehicles, and security monitoring. It could enable efficient and accurate tracking in dynamic scenes with multiple moving and/or deforming objects, and its interpretability and editability could also support quality control and error correction.
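To illustrate what a factored, object-level representation with explicit motion might look like, here is a minimal, hypothetical PyTorch sketch: each object owns a small neural field plus an explicit per-frame rigid pose that can be edited independently. The structure, names, and parameterization are assumptions for illustration only, not the paper's architecture (which is trained from monocular RGB-D video and also models nonrigid deformation).

```python
# Illustrative sketch: one neural field per object plus an explicit, editable
# rigid trajectory. Not the authors' implementation.
import torch
import torch.nn as nn

class ObjectFieldSketch(nn.Module):
    def __init__(self, n_frames=100, d_hidden=128):
        super().__init__()
        # Object appearance/geometry in a canonical frame: RGB + density per query point.
        self.field = nn.Sequential(
            nn.Linear(3, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, 4),
        )
        # Explicit per-frame pose; a real system would use a proper rotation
        # parameterization (e.g., quaternions) rather than free 3x3 matrices.
        self.rotations = nn.Parameter(torch.eye(3).repeat(n_frames, 1, 1))
        self.translations = nn.Parameter(torch.zeros(n_frames, 3))

    def forward(self, points_world, t):
        # Map world-space query points into the object's canonical frame at time t;
        # pose parameters are optimized jointly with the field.
        R, trans = self.rotations[t], self.translations[t]
        points_canonical = (points_world - trans) @ R
        return self.field(points_canonical)
```

Because motion lives in the explicit pose parameters rather than inside the network weights, editing an object's trajectory amounts to changing those parameters without retraining its geometry.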