Thu Dec 22 2022

X-Decoder: Generalized Decoding for Pixel, Image and Language

Decoding models
Artificial Intelligence
Computer Vision
Image segmentation
Vision-language (VL) tasks
Efficient finetuning and novel task composition

This paper presents X-Decoder, a generalized decoding model that can seamlessly predict both pixel-level segmentation masks and language tokens. It achieves state-of-the-art results on open-vocabulary segmentation and referring segmentation across eight datasets, and supports efficient finetuning and flexible composition of novel tasks.

X-Decoder unifies image segmentation and vision-language (VL) tasks, providing a single interface that supports all types of image segmentation and a variety of VL tasks. It exhibits strong transferability to a wide range of downstream tasks in both zero-shot and finetuning settings. Its support for efficient finetuning and novel task composition makes it a valuable tool for businesses that use vision AI in their processes and workflows.
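The core idea of generalized decoding can be sketched in miniature: one decoder consumes two query types (latent queries for segmentation, text queries for language) and emits two output types from shared image features. The names and logic below are purely illustrative toy stand-ins, not the paper's actual architecture or API.

```python
from dataclasses import dataclass

@dataclass
class DecodeResult:
    masks: list    # one pseudo-mask per latent query
    tokens: list   # one pseudo-token per text query

def generalized_decode(image_features, latent_queries, text_queries):
    """Toy stand-in for a generalized decoder: in the real model this is a
    transformer; here each "mask" is just the feature map gated by a latent
    query, and each "token" pairs a text query with a crude relevance score."""
    masks = [[f * q for f in image_features] for q in latent_queries]
    tokens = [(q, sum(image_features)) for q in text_queries]
    return DecodeResult(masks=masks, tokens=tokens)

result = generalized_decode(
    image_features=[0.5, 1.0, 0.25],
    latent_queries=[1.0, 0.0],           # e.g. two candidate segments
    text_queries=["a dog", "grass"],     # open-vocabulary concepts
)
print(len(result.masks), len(result.tokens))  # 2 2
```

The point of the sketch is the single entry point: segmentation-style and language-style outputs come from one decode call over shared features, which is what makes zero-shot transfer and task composition natural.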

MULTIINSTRUCT: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

Language models
Artificial Intelligence
Multi-modal zero-shot learning
Instruction tuning

The paper presents MULTIINSTRUCT, the first multimodal instruction tuning benchmark dataset, consisting of 47 diverse multimodal tasks covering 11 broad categories. It extends instruction tuning, a learning paradigm that fine-tunes pre-trained language models on tasks specified through natural-language instructions, to vision and multimodal tasks, achieving strong zero-shot performance on a variety of unseen multimodal tasks.

Instruction tuning on MULTIINSTRUCT improves multi-modal zero-shot learning, yielding strong performance on unseen multimodal tasks. Businesses can apply this paradigm to fine-tune pre-trained models on tasks specified through instructions, improving generalization to tasks the model was never trained on.
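The data transformation at the heart of instruction tuning is simple: each supervised example is recast as an instruction, an input, and a target, so one sequence-to-sequence model can be finetuned across many tasks in a uniform format. The helper and field names below are hypothetical, chosen only to illustrate the recipe.

```python
def to_instruction_example(instruction, task_input, target):
    """Recast one supervised example as a single prompt/target pair
    (illustrative format; the actual MULTIINSTRUCT templates differ)."""
    prompt = f"Instruction: {instruction}\nInput: {task_input}\nOutput:"
    return {"prompt": prompt, "target": target}

# Two different tasks end up in the same uniform format:
vqa = to_instruction_example(
    instruction="Answer the question about the image.",
    task_input="<image> Question: What color is the car?",
    target="red",
)
caption = to_instruction_example(
    instruction="Describe the image in one sentence.",
    task_input="<image>",
    target="A red car parked on a quiet street.",
)
print(vqa["prompt"].splitlines()[0])  # Instruction: Answer the question about the image.
```

Because every task shares this prompt/target shape, a model finetuned on many such tasks can follow a brand-new instruction at test time, which is the mechanism behind the zero-shot gains the paper reports.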
