Thu Apr 27 2023 - Top Trending AI Papers

We’re Afraid Language Models Aren’t Modeling Ambiguity

Linguistics

Language Models

Machine Learning

Natural Language Processing

Dialogue Interfaces

Writing Aids

This research examines how well pretrained language models handle ambiguity in natural language and presents an annotated benchmark for evaluating whether they can recognize and disentangle possible meanings. It shows that current language models, including the recent GPT-4, struggle with ambiguity, and it emphasizes the importance of ambiguity-sensitive tools for NLP applications.

The benchmark gives a way to measure how well a language model recognizes and disentangles possible meanings. The authors recommend developing ambiguity-sensitive tools to improve NLP applications such as dialogue interfaces and writing aids.

https://arxiv.org/pdf/2304.14399.pdf

https://arxiv.org/abs/2304.14399

https://twitter.com/_akhaliq/status/1651753105766088704/photo/1

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

Video Analytics

Computer Vision

Deep Learning

Video Understanding

Multimodal Systems

Video Foundation Models

This research introduces a prototype system for multimodal and versatile video understanding built on a tracklet-centric paradigm: tracklets are treated as the basic video unit, and Video Foundation Models (ViFMs) annotate their properties. Extensive case studies on different types of in-the-wild videos demonstrate that the system can address a variety of video-related problems.

The takeaway is a working prototype in which ViFM-annotated tracklets serve as the shared representation for video understanding, with case-study evidence that this design is effective across various video-related problems.
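To make the tracklet-centric idea more concrete, here is a minimal sketch in plain Python of what a store of annotated tracklets and a simple lookup could look like. It is not the ChatVideo implementation: the `Tracklet` class, its field names, the property keys, and the `find_tracklets` helper are all hypothetical, and the annotations are hard-coded rather than produced by any model.

```python
from dataclasses import dataclass, field


@dataclass
class Tracklet:
    """One tracked object instance across a span of frames (hypothetical schema)."""
    track_id: int
    category: str
    start_frame: int
    end_frame: int
    # Free-form annotations; in a real system these would come from models.
    properties: dict = field(default_factory=dict)


def find_tracklets(tracklets, category=None, frame=None):
    """Return tracklets matching an optional category and an optional frame index."""
    results = []
    for t in tracklets:
        if category is not None and t.category != category:
            continue
        if frame is not None and not (t.start_frame <= frame <= t.end_frame):
            continue
        results.append(t)
    return results


# Hand-written example data, purely illustrative.
tracklets = [
    Tracklet(0, "person", 0, 120, {"appearance": "red jacket", "motion": "walking"}),
    Tracklet(1, "dog", 30, 90, {"appearance": "small, brown", "motion": "running"}),
    Tracklet(2, "person", 100, 200, {"appearance": "blue shirt", "motion": "standing"}),
]

# Which people are visible at frame 110?
people_at_110 = find_tracklets(tracklets, category="person", frame=110)
```

A question such as "what is the person in the red jacket doing?" then reduces to filtering tracklets and reading their annotated properties, which is the kind of lookup a tracklet-level representation makes straightforward.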

https://arxiv.org/pdf/2304.14407.pdf

https://arxiv.org/abs/2304.14407

https://www.wangjunke.info/ChatVideo/

https://twitter.com/_akhaliq/status/1651796270682349569/video/1

JaxPruner: A concise library for sparsity research

Neural Architecture

Machine Learning

Deep Learning

Pruning

Sparse Training

Neural Networks

This research introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. It aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. The paper demonstrates how easily JaxPruner integrates with existing JAX-based libraries by providing examples in four different codebases.

The takeaway is a library of concise pruning and sparse training implementations that integrates easily with existing JAX-based libraries. The researchers believe it has the potential to accelerate sparsity research.
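As a rough illustration of what pruning means in this context, the following is a minimal sketch of one-shot magnitude pruning in plain Python. It does not use JaxPruner and does not reflect its API; the function name `magnitude_prune` and the flat list of weights are assumptions made for the example, and magnitude pruning is chosen here only because it is simple to show.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    weights:  flat list of floats (an illustrative stand-in for a parameter tensor)
    sparsity: fraction of weights to remove, between 0.0 and 1.0
    Returns (pruned_weights, mask), where mask holds 1 for kept weights and 0 for pruned ones.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")

    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:n_prune])

    mask = [0 if i in pruned_indices else 1 for i in range(len(weights))]
    pruned_weights = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned_weights, mask


# Remove the half of the weights with the smallest magnitude.
pruned, mask = magnitude_prune([0.5, -0.1, 0.03, -2.0, 0.7, 0.2], sparsity=0.5)
```

Sparse training methods generally go further than this single masking step, for example by updating the mask over the course of training. The sketch only shows the basic idea of keeping large weights and zeroing small ones.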

https://github.com/google-research/jaxpruner

https://arxiv.org/pdf/2304.14082.pdf

https://arxiv.org/abs/2304.14082

https://twitter.com/arankomatsuzaki/status/1651756970095833089/photo/1

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

Scene Understanding

Computer Vision

Image and Video Editing

Augmented Reality

Virtual Reality

Computer Graphics

The paper presents a method for inferring scene affordances and realistically inserting people into scenes. The approach learns to re-pose humans in video clips and trains a large-scale diffusion model on a dataset of 2.4M video clips; the model produces diverse, plausible poses while respecting the scene context.

The proposed method can be useful for businesses working with augmented reality, virtual reality, or computer graphics, allowing them to realistically insert human subjects into scenes to showcase their products or services. It can also automate image or video editing tasks that require inserting people into scenes, saving time and cost.

https://arxiv.org/pdf/2304.14406.pdf

https://arxiv.org/abs/2304.14406

https://sumith1896.github.io/affordance-insertion/

https://twitter.com/_akhaliq/status/1651748957976928257/photo/1

Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models

Multi-Party Conversation Modeling

Natural Language Processing

Dialogue Systems

Chatbots

Conversational Agents

Customer Service

The paper studies multi-party conversations and evaluates the ability of language models to act as one or more characters in them. It introduces a new dataset called MultiLIGHT and compares models trained on this dataset with existing pairwise-trained dialogue models and with large language models using few-shot prompting.

The work can be useful for businesses whose chatbots or conversational agents interact with multiple customers or users at once in group settings. It can improve the quality and coherence of the chatbot's responses, leading to better customer satisfaction and engagement. The MultiLIGHT dataset can also be used to improve existing chatbots or to train new ones for multi-party conversations.

https://arxiv.org/pdf/2304.13835.pdf

https://arxiv.org/abs/2304.13835

https://twitter.com/_akhaliq/status/1651754063971725313/photo/1