MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Proposes MM-REACT, a system that integrates ChatGPT with a pool of vision experts for multimodal reasoning and action, and demonstrates its effectiveness on advanced visual understanding tasks.
Can improve advanced visual understanding in scenarios that require multimodal information processing.
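For intuition, below is a minimal sketch of an MM-REACT-style reasoning-and-action loop, in which the LLM either emits an action that routes to a vision expert or returns a final answer. The `chat` function and the expert pool are hypothetical stubs for illustration, not the paper's implementation.

```python
from typing import Callable, Dict

def chat(prompt: str) -> str:
    """Hypothetical stand-in for a ChatGPT call; returns canned replies for illustration."""
    if "Observation" in prompt:
        return "ANSWER: a dog running on a beach"
    return "ACTION: image_captioning(photo.jpg)"

# Pool of vision experts, keyed by the action name the LLM emits (stubs here).
EXPERTS: Dict[str, Callable[[str], str]] = {
    "image_captioning": lambda path: f"caption of {path}: a dog running on a beach",
    "ocr": lambda path: f"text found in {path}: (none)",
}

def mm_react(question: str, image_path: str, max_steps: int = 5) -> str:
    """Alternate LLM reasoning with vision-expert calls until the LLM answers."""
    prompt = f"Question: {question}\nImage: {image_path}\n"
    for _ in range(max_steps):
        reply = chat(prompt)
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        # Parse "ACTION: name(arg)" and invoke the matching vision expert.
        name, arg = reply.removeprefix("ACTION:").strip().rstrip(")").split("(")
        prompt += f"{reply}\nObservation: {EXPERTS[name](arg)}\n"
    return "no answer within the step budget"

print(mm_react("What is happening in the photo?", "photo.jpg"))
```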
Visual Representation Learning from Unlabeled Video using Contrastive Masked Autoencoders
Proposes ViC-MAE, a method that combines masked autoencoders and contrastive learning for visual representation learning, and demonstrates improved transfer learning from video to images on ImageNet-1K and competitive performance on the Kinetics-400 video classification benchmark.
Can improve transfer learning from video to images as well as performance on video classification benchmarks.
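As a rough illustration of combining the two objectives, the sketch below adds an InfoNCE contrastive term between pooled features of two frames from the same video to a masked-reconstruction term. The tiny encoder and the patch-level simplifications are assumptions for brevity, not the paper's architecture (in a real MAE the encoder sees only the visible patches).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for a ViT encoder over patch tokens of shape (B, N, D)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        return self.proj(patches)

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss: matching frame pairs are positives, all others negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))        # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def vic_mae_loss(encoder, decoder, view1, view2, mask, lam: float = 0.5):
    """Joint loss: reconstruct masked patches + align pooled features of two frames."""
    tok1, tok2 = encoder(view1), encoder(view2)
    recon = decoder(tok1)
    mae = F.mse_loss(recon[mask], view1[mask])        # reconstruct masked patches only
    contrastive = info_nce(tok1.mean(1), tok2.mean(1))  # mean-pooled frame features
    return mae + lam * contrastive

B, N, D = 4, 16, 64
enc, dec = TinyEncoder(D), nn.Linear(D, D)
view1, view2 = torch.randn(B, N, D), torch.randn(B, N, D)  # two frames per clip
mask = torch.rand(B, N) < 0.75                              # mask 75% of patches, as in MAE
print(vic_mae_loss(enc, dec, view1, view2, mask).item())
```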
Large Language Models Can Be Used to Estimate the Ideologies of Politicians in a Zero-Shot Learning Setting
Demonstrates the potential of large language models for measuring latent ideology in the social sciences: pairwise liberal-conservative comparisons of U.S. senators, elicited through prompts to ChatGPT, are scaled into scores that correlate strongly with widely used liberal-conservative measures such as DW-NOMINATE.
Could offer new solutions to problems of observability and measurement in the social sciences.
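For a concrete picture of the scaling step, here is a minimal sketch that fits a Bradley-Terry model to pairwise "who is more liberal?" judgments. The win counts and names are hypothetical stand-ins for ChatGPT responses, and Bradley-Terry is one standard choice for scaling pairwise comparisons rather than necessarily the paper's exact procedure.

```python
import numpy as np

senators = ["A", "B", "C", "D"]
# wins[i, j] = number of comparisons where the model judged i more liberal than j
wins = np.array([[0, 3, 4, 5],
                 [1, 0, 3, 4],
                 [0, 1, 0, 3],
                 [0, 0, 1, 0]], dtype=float)

def bradley_terry(wins: np.ndarray, iters: int = 500, lr: float = 0.05) -> np.ndarray:
    """Fit latent scores by gradient ascent on the Bradley-Terry log-likelihood."""
    theta = np.zeros(wins.shape[0])
    for _ in range(iters):
        # s[i, j] = P(i beats j) = sigmoid(theta_i - theta_j)
        s = 1.0 / (1.0 + np.exp(-(theta[:, None] - theta[None, :])))
        grad = (wins - (wins + wins.T) * s).sum(axis=1)
        theta += lr * grad
        theta -= theta.mean()  # center: only score differences are identified
    return theta

scores = bradley_terry(wins)
for name, score in sorted(zip(senators, scores), key=lambda t: -t[1]):
    print(f"{name}: {score:+.2f}")  # higher = judged more liberal
```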