Wed Apr 19 2023

Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent

Machine Learning
Language Models
Information Retrieval (IR)

Finds that, when properly instructed, ChatGPT and GPT-4 can deliver results that are competitive with, and even superior to, supervised methods on popular IR benchmarks.

Using generative LLMs such as ChatGPT and GPT-4 for relevance ranking in Information Retrieval (IR) can deliver better results than supervised methods.
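
As a rough illustration of how such instruction-based re-ranking over a first-stage candidate list (e.g. a BM25 top-k) can be wired up, here is a minimal sketch assuming the OpenAI Python client; the prompt wording and output parsing are assumptions for illustration, not the paper's exact instructions.

```python
# Minimal sketch of LLM-based listwise re-ranking over first-stage candidates.
# Prompt format and parsing are assumptions, not the paper's exact method.
import re

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rerank(query: str, passages: list[str], model: str = "gpt-4") -> list[str]:
    # Number the candidates so the model can refer to them by index.
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        f"Query: {query}\n\nPassages:\n{numbered}\n\n"
        "Rank the passages above by relevance to the query. Answer with the "
        "passage numbers only, most relevant first, e.g. 2 > 1 > 3."
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    ).choices[0].message.content
    # Parse the returned ordering; fall back to the original order on failure.
    order = [int(tok) - 1 for tok in re.findall(r"\d+", reply)]
    seen, ranked = set(), []
    for i in order:
        if 0 <= i < len(passages) and i not in seen:
            seen.add(i)
            ranked.append(passages[i])
    return ranked or passages
```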

Evaluating Verifiability in Generative Search Engines

Machine Learning
Information Retrieval (IR)
Generative Search Engines

Finds that responses frequently contain unsupported statements and inaccurate citations.

Users should be aware that responses generated by existing generative search engines may contain unsupported statements and inaccurate citations.
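
To make the finding concrete, here is a rough sketch of citation-verifiability scoring in the spirit of per-statement support and per-citation accuracy; the data shape, field names, and notion of "support" are assumptions for illustration, not the paper's exact annotation protocol.

```python
# Rough sketch of citation-verifiability scoring over annotated statements.
# The data shape and judgments below are assumed, not the paper's protocol.
from dataclasses import dataclass

@dataclass
class Statement:
    text: str
    fully_supported: bool            # do the cited pages jointly support the claim?
    citations_accurate: list[bool]   # per citation: does that page support the claim?

def citation_recall(statements: list[Statement]) -> float:
    """Fraction of statements fully supported by their citations."""
    return sum(s.fully_supported for s in statements) / len(statements)

def citation_precision(statements: list[Statement]) -> float:
    """Fraction of individual citations that actually support their statement."""
    flags = [ok for s in statements for ok in s.citations_accurate]
    return sum(flags) / len(flags) if flags else 0.0
```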

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Artificial Intelligence
Generative Models
Computer Vision
Virtual Reality
Robotics Simulation

Introduces NeuralField-LDM, a generative model that synthesizes complex 3D environments and substantially improves over existing state-of-the-art models.

Use NeuralField-LDM to generate high-quality 3D scenes for applications such as virtual reality and robotics simulation.

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Machine Learning
Compositional reasoning
Plug-and-play framework
natural language processing tasks

Chameleon is a plug-and-play compositional reasoning framework that augments LLMs to address their inherent limitations, such as the inability to access up-to-date information, use external tools, or perform precise mathematical reasoning. It synthesizes programs that compose various tools tailored to user interests and infers the appropriate sequence of tools to execute in order to generate a final response. The framework demonstrates adaptability and effectiveness on two tasks: ScienceQA and TabMWP. Chameleon with GPT-4 achieves 86.54% accuracy on ScienceQA, and on TabMWP it delivers a 17.8% increase over the state-of-the-art model, reaching 98.78% overall accuracy.
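
A toy sketch of the plug-and-play idea follows: an LLM planner synthesizes a "program" (an ordered list of tool names) and each tool transforms a shared context. The tool names, planner prompt, and placeholder implementations are illustrative assumptions, not Chameleon's actual module inventory.

```python
# Toy sketch of plug-and-play tool composition (illustrative tools and planner
# prompt; not Chameleon's actual module inventory or prompts).
from typing import Callable

def knowledge_retrieval(ctx: dict) -> dict:
    # Placeholder: a real system would query a search engine or knowledge base.
    ctx["evidence"] = f"(retrieved evidence for: {ctx['question']})"
    return ctx

def calculator(ctx: dict) -> dict:
    # Placeholder arithmetic step over a simple expression string.
    ctx["result"] = eval(ctx.get("expression", "0"), {"__builtins__": {}})
    return ctx

def solution_generator(ctx: dict) -> dict:
    # Placeholder: an LLM call would normally compose the final response here.
    ctx["answer"] = f"Answer based on {ctx.get('evidence', 'no evidence')}"
    return ctx

TOOLS: dict[str, Callable[[dict], dict]] = {
    "knowledge_retrieval": knowledge_retrieval,
    "calculator": calculator,
    "solution_generator": solution_generator,
}

def plan_tools(question: str, llm: Callable[[str], str]) -> list[str]:
    # The planner LLM synthesizes a program: an ordered list of tool names.
    prompt = (
        f"Question: {question}\n"
        f"Available tools: {', '.join(TOOLS)}\n"
        "Reply with the tools to run, in order, comma-separated."
    )
    return [t.strip() for t in llm(prompt).split(",") if t.strip() in TOOLS]

def run(question: str, llm: Callable[[str], str]) -> dict:
    context = {"question": question}
    for name in plan_tools(question, llm):
        context = TOOLS[name](context)  # execute the composed program step by step
    return context

# Example (with any callable LLM): run("Which planet has the most moons?", llm=my_llm)
```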

Chameleon can help businesses improve their natural language processing applications by addressing the inherent limitations of Large Language Models: accessing up-to-date information, using external tools, and performing precise mathematical reasoning. The synthesized programs can be tailored to user interests, and the framework can adapt to new tasks to generate final responses that are more accurate and effective.

Pretrained Language Models as Visual Planners for Human Assistance

Machine Learning
Visual Planning
Sequence Modeling
multi-modal AI assistants

Visual Planning for Assistance (VPA) is the task proposed in this paper: given a goal described in natural language and a video of the user's progress, produce a plan, i.e., a sequence of actions, that achieves the goal. The task requires assessing the user's progress from the untrimmed video, relating it to the requirements of the goal, and handling long video histories and arbitrarily complex action dependencies. The paper presents the Visual Language Model based Planner (VLaMP), which leverages pre-trained LMs as the sequence model and performs significantly better than baselines with respect to all metrics that evaluate the generated plan.
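
One way to picture the "pretrained LM as sequence model" idea is the hedged sketch below, which projects per-step video features into a small LM's embedding space and decodes the next actions as text; the module names, dimensions, and decoding scheme are placeholders (assuming a recent Hugging Face transformers version), not VLaMP's actual architecture.

```python
# Hedged sketch: project video-step features into an LM's embedding space,
# prepend the goal text, and decode future actions. Names, dimensions, and the
# decoding scheme are placeholders, not VLaMP's actual design.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class ToyVisualPlanner(nn.Module):
    def __init__(self, video_feat_dim: int = 512, lm_name: str = "gpt2"):
        super().__init__()
        self.tokenizer = GPT2Tokenizer.from_pretrained(lm_name)
        self.lm = GPT2LMHeadModel.from_pretrained(lm_name)
        # Map per-step video features into the LM's token-embedding space.
        self.video_proj = nn.Linear(video_feat_dim, self.lm.config.n_embd)

    @torch.no_grad()
    def plan(self, goal: str, video_feats: torch.Tensor, max_new_tokens: int = 30) -> str:
        # video_feats: (num_observed_steps, video_feat_dim) from the untrimmed video.
        goal_ids = self.tokenizer(f"Goal: {goal}. Next steps:", return_tensors="pt").input_ids
        goal_emb = self.lm.transformer.wte(goal_ids)            # (1, T, n_embd)
        vid_emb = self.video_proj(video_feats).unsqueeze(0)      # (1, S, n_embd)
        inputs_embeds = torch.cat([goal_emb, vid_emb], dim=1)    # goal + observed history
        out = self.lm.generate(
            inputs_embeds=inputs_embeds,
            max_new_tokens=max_new_tokens,
            pad_token_id=self.tokenizer.eos_token_id,
        )
        return self.tokenizer.decode(out[0], skip_special_tokens=True)
```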

VLaMP can help businesses improve multi-modal AI assistants that guide users toward complex multi-step goals. The framework produces plans for goals described in natural language by assessing the user's progress from the untrimmed video and relating it to the requirements of the goal. Using pre-trained LMs as the sequence model helps handle long video histories and arbitrarily complex action dependencies, and VLaMP performs significantly better than existing baselines.
