Wed Mar 22 2023
Tue Mar 21 2023

MM-ReAct: Prompting ChatGPT for Multimodal Reasoning and Action

Computer vision, natural language processing
AI integration with visual intelligence
Multimodal understanding in various scenarios that require advanced visual understanding.

Proposes MM-REACT, a system that pairs ChatGPT with a pool of vision experts to perform multimodal reasoning and action, and demonstrates its effectiveness on tasks that require advanced visual understanding.

Can improve advanced visual understanding in scenarios that require multimodal information processing.
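
The idea of a language model delegating to a pool of vision experts can be illustrated with a minimal sketch. Everything here is hypothetical and is not taken from the paper: the "CALL <expert> <image>" convention, the two stub experts, and the scripted stand-in for the language model are assumptions chosen only to show the control loop.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical vision experts; real ones would wrap actual vision models.
def caption_expert(image_path: str) -> str:
    return f"[caption for {image_path}]"

def ocr_expert(image_path: str) -> str:
    return f"[text found in {image_path}]"

EXPERTS = {"caption": caption_expert, "ocr": ocr_expert}

def parse_action(reply: str) -> Optional[Tuple[str, str]]:
    """Recognize the assumed 'CALL <expert> <image_path>' request format."""
    parts = reply.strip().split(maxsplit=2)
    if len(parts) == 3 and parts[0] == "CALL" and parts[1] in EXPERTS:
        return parts[1], parts[2]
    return None

def run(llm: Callable[[List[str]], str], user_message: str, max_steps: int = 5) -> str:
    """Alternate between the language model and the experts until it answers."""
    history = [user_message]
    for _ in range(max_steps):
        reply = llm(history)
        action = parse_action(reply)
        if action is None:
            return reply  # no expert requested: treat the reply as the final answer
        name, image_path = action
        observation = EXPERTS[name](image_path)
        history.append(reply)
        history.append(f"OBSERVATION {observation}")
    return "Stopped: step limit reached."

def scripted_llm(history: List[str]) -> str:
    """Stand-in for a chat model: asks for a caption once, then answers."""
    if not any(h.startswith("OBSERVATION") for h in history):
        return "CALL caption photo.jpg"
    return "Final answer based on: " + history[-1]

answer = run(scripted_llm, "What is in photo.jpg?")
```

The loop keeps the language model text-only: each expert's output is fed back as an observation string, so adding an expert means adding one entry to the registry.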

Visual Representation Learning from Unlabeled Video using Contrastive Masked Autoencoders

Computer vision
Visual representation learning
Transfer learning from video to images on ImageNet-1k
Competitive transfer learning performance on the Kinetics-400 video classification benchmark

Proposes ViC-MAE, a method that combines masked autoencoders and contrastive learning for visual representation learning, and demonstrates improved transfer learning from video to images on ImageNet-1k and competitive performance on the Kinetics-400 video classification benchmark.

Can improve transfer learning from video to images and performance on video classification benchmarks.
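
Combining a masked-autoencoder objective with a contrastive one can be sketched as a weighted sum of two losses. This is a minimal illustration under stated assumptions, not the paper's implementation: the InfoNCE form of the contrastive term, the temperature, and the `weight` parameter are choices made here for the example.

```python
import math

def masked_mse(pred, target, mask):
    """Mean squared error over masked patches only (mask[i] is True where hidden).

    Assumes at least one patch is masked.
    """
    errors = [
        sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i in range(len(target))
        if mask[i]
    ]
    return sum(errors) / len(errors)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: row i of `positives` is the positive for row i of `anchors`;
    all other rows serve as negatives."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    b = [normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(a)

def combined_loss(pred, target, mask, z1, z2, weight=1.0):
    """Reconstruction loss plus a weighted contrastive loss (weight is assumed)."""
    return masked_mse(pred, target, mask) + weight * info_nce(z1, z2)

# Toy data: two patches of two values each, and two 2-d embeddings per view.
recon = masked_mse([[0, 0], [1, 1]], [[5, 5], [1, 3]], [False, True])
aligned = info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped = info_nce([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

In the toy data the unmasked first patch does not contribute to the reconstruction term, and the contrastive term is small when matching rows point the same way and large when they are swapped.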

Large Language Models Can Be Used to Estimate the Ideologies of Politicians in a Zero-Shot Learning Setting

Natural language processing
Language models in social sciences
Measuring latent ideology in the social sciences

Demonstrates the potential of large language models for measuring latent ideology in the social sciences: pairwise liberal-conservative comparisons between members of the U.S. Senate, elicited by prompting ChatGPT, are scaled into estimates that correlate strongly with widely used liberal-conservative scales such as DW-NOMINATE.

Can potentially offer new solutions to problems of observability and measurement in the social sciences.
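
One common way to turn pairwise comparisons into a one-dimensional scale is the Bradley-Terry model. The summary above does not say which scaling model is used, so treat the following as a sketch under that assumption; the labels "A", "B", "C" and the comparison counts are made-up placeholders, not data from the paper.

```python
from collections import defaultdict

def bradley_terry(comparisons, n_iter=200, tol=1e-9):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the standard minorization-maximization update; returned strengths
    are positive and sum to 1. Assumes every item wins at least once.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and frozenset((i, j)) in pair_counts
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        updated = {i: v / total for i, v in updated.items()}
        converged = max(abs(updated[i] - strength[i]) for i in items) < tol
        strength = updated
        if converged:
            break
    return strength

# Placeholder data: the "winner" is the one judged more conservative in a prompt.
comparisons = (
    [("C", "B")] * 3 + [("B", "C")] * 1
    + [("B", "A")] * 3 + [("A", "B")] * 1
    + [("C", "A")] * 4 + [("A", "C")] * 1
)
scores = bradley_terry(comparisons)
```

Sorting the items by their fitted strength gives an ordering that could then be compared against an external scale; with the placeholder counts above, "C" ranks highest and "A" lowest.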

Mon Mar 20 2023
Sun Mar 19 2023
Thu Mar 16 2023
Tue Mar 14 2023