Synthetic Data from Diffusion Models Improves ImageNet Classification
This paper explores the use of large-scale text-to-image diffusion models for generative data augmentation to improve ImageNet classification accuracy.
Augmenting real training data with these synthetic images significantly improves classification accuracy over strong ResNet and Vision Transformer baselines, which could translate into better performance in image-based business operations such as object recognition in retail or quality control in manufacturing.
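To make the augmentation recipe concrete, here is a minimal sketch of class-conditional synthetic image generation, assuming the open Stable Diffusion checkpoint via Hugging Face's diffusers library as a stand-in for the paper's fine-tuned Imagen model; the class names, prompt template, and real-to-synthetic mixing ratio are illustrative assumptions, not the paper's settings.

```python
# Sketch: generate class-conditional synthetic images with an open
# text-to-image diffusion model and mix them into a real training set.
# Stable Diffusion is used here only as a freely available stand-in
# for the fine-tuned Imagen model the paper actually uses.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

class_names = ["goldfish", "tabby cat", "sports car"]  # assumed ImageNet subset
synthetic = []
for name in class_names:
    # One prompt template per class; the paper tunes prompts and
    # guidance scales far more carefully than this.
    images = pipe(f"a photo of a {name}", num_images_per_prompt=4).images
    synthetic.extend((img, name) for img in images)

# Downstream, these (image, label) pairs would be appended to the real
# ImageNet training set, e.g., at an assumed real:synthetic ratio of 1:1.
```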
Visual Instruction Tuning
This paper presents a method for using language-only GPT-4 to generate multimodal language-image instruction-following data, and uses that data to train LLaVA (Large Language and Vision Assistant), a large multimodal model that connects a vision encoder to an LLM for general-purpose visual and language understanding.
The LLaVA model demonstrates impressive multimodal chat abilities and yields a new state-of-the-art accuracy on Science QA, which could be useful for businesses in industries such as customer service, e-commerce, or online education.
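As a minimal sketch of the data-generation step, assuming a hypothetical `query_gpt4` helper in place of a real API client and made-up annotations for a single image:

```python
# Sketch: turn symbolic image annotations (captions + boxes) into
# instruction-following data with a language-only LLM, in the spirit of
# LLaVA's data pipeline. `query_gpt4` is a hypothetical stand-in for a
# real chat-completion client.
from typing import List

def query_gpt4(prompt: str) -> str:
    """Hypothetical LLM call; wire this to your client of choice."""
    return "[LLM-generated conversation would appear here]"

def build_prompt(captions: List[str], boxes: List[str]) -> str:
    # Serialize the image symbolically so a text-only model can "see" it.
    context = "\n".join(captions + boxes)
    return (
        "Here is a textual description of an image:\n"
        f"{context}\n"
        "Write a multi-turn conversation in which a user asks about the "
        "image and an assistant answers as if it can see the image."
    )

# Illustrative annotations for one image.
captions = ["A man rides a horse on a beach at sunset."]
boxes = ["person: [0.32, 0.10, 0.55, 0.80]", "horse: [0.25, 0.35, 0.70, 0.95]"]
conversation = query_gpt4(build_prompt(captions, boxes))
# Paired with the original image, `conversation` becomes one
# (image, instruction, response) training example.
print(conversation)
```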
DETRs Beat YOLOs on Real-time Object Detection
This paper proposes RT-DETR, an end-to-end real-time object detector that addresses the high computational cost of DETRs while retaining their NMS-free design, thereby avoiding the inference delay that non-maximum suppression (NMS) post-processing introduces in YOLO-style detectors.
RT-DETR achieves state-of-the-art accuracy and speed in real-time object detection, making it useful for businesses in industries such as surveillance, autonomous vehicles, and robotics.
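To see what the end-to-end design removes, here is a textbook sketch of greedy NMS, the sequential post-processing step whose latency RT-DETR sidesteps; this illustrates the general technique, not RT-DETR itself.

```python
# Sketch: classic greedy non-maximum suppression (NMS), the
# post-processing step YOLO-style detectors need and that end-to-end
# detectors like RT-DETR eliminate. Pure Python for clarity; real
# pipelines use vectorized or CUDA kernels.
def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Cost scales with detections kept x candidates remaining; this
        # sequential loop is the latency end-to-end detectors avoid.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 48, 52), (100, 100, 140, 150)]
scores = [0.9, 0.75, 0.8]
print(nms(boxes, scores))  # -> [0, 2]: the overlapping lower-score box is dropped
```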
Low-code LLM: Visual Programming over LLMs
This paper introduces Low-code LLM, a human-LLM interaction framework that incorporates simple low-code visual programming interactions to achieve more controllable and stable responses. It consists of a Planning LLM that designs a structured planning workflow for complex tasks, which users can review and edit through low-code operations, and an Executing LLM that generates responses following the user-confirmed workflow. The advantages of Low-code LLM include controllable generation results, user-friendly human-LLM interaction, and broad applicability across scenarios. This approach aims to bridge the gap between humans and LLMs, enabling more effective and efficient use of LLMs for complex tasks.
Implement the Low-code LLM framework to improve the task-planning and response-generation process; it allows for more controllable and stable responses while incorporating user ideas and reducing the need for trivial prompt engineering.
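A minimal sketch of the plan-then-execute loop, assuming a hypothetical `call_llm` helper and a one-line user edit standing in for the visual programming interface; the task and step texts are illustrative.

```python
# Sketch of the two-stage Low-code LLM loop: a Planning LLM drafts a
# structured workflow, the user edits it (here, simulated in code), and
# an Executing LLM follows the confirmed plan. `call_llm` is a
# hypothetical stand-in for any chat-completion client.
from dataclasses import dataclass, field
from typing import List

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; wire this to your model of choice."""
    return "[model output]"

@dataclass
class Step:
    description: str
    substeps: List[str] = field(default_factory=list)

def plan(task: str) -> List[Step]:
    # Planning LLM: ask for a numbered workflow, then parse it into steps.
    raw = call_llm(f"Break the task into a numbered step-by-step workflow:\n{task}")
    return [Step(line) for line in raw.splitlines() if line.strip()]

def execute(task: str, workflow: List[Step]) -> str:
    # Executing LLM: generate the final response conditioned on the
    # user-confirmed workflow instead of a free-form prompt.
    outline = "\n".join(s.description for s in workflow)
    return call_llm(f"Task: {task}\nFollow exactly this confirmed workflow:\n{outline}")

task = "Write a project kickoff email for a new analytics dashboard."
workflow = plan(task)
workflow.insert(1, Step("Mention the Q3 launch deadline"))  # user edit via the UI
print(execute(task, workflow))
```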
Tool Learning with Foundation Models
This paper presents a systematic investigation of tool learning with foundation models, combining the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. The authors introduce a general tool learning framework that covers task decomposition, dynamic plan adjustment, and tool selection. They also discuss how to train models for improved tool-use capabilities and how to facilitate generalization in tool learning. Experiments with 17 representative tools demonstrate the potential of current foundation models to use tools skillfully, and the paper closes with several open problems that require further investigation.
Integrate specialized tools with foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving: implement a general tool learning framework covering task decomposition, dynamic plan adjustment, and tool selection, and train models for improved tool-use capabilities that generalize across tools.
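A minimal sketch of the controller idea, assuming toy tool implementations and a digit-based `select_tool` heuristic in place of model-driven selection; a real system would prompt the foundation model with tool descriptions and parse its choice.

```python
# Sketch of a minimal tool-learning controller in the spirit of the
# paper's framework: subtasks are routed to registered tools and the
# controller executes them. Tool names and the selection heuristic are
# illustrative assumptions, not the paper's implementation.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy arithmetic tool
    "search": lambda query: f"[top results for: {query}]",             # stubbed retrieval tool
}

def select_tool(subtask: str) -> str:
    # Stand-in for the foundation model's tool-selection step.
    return "calculator" if any(ch.isdigit() for ch in subtask) else "search"

def run(subtasks):
    results = []
    for sub in subtasks:
        name = select_tool(sub)
        results.append((sub, name, TOOLS[name](sub)))
        # Dynamic plan adjustment would inspect `results` here and
        # re-plan the remaining subtasks if an execution failed.
    return results

for sub, tool, out in run(["2 * (3 + 4)", "latest WHO guidance on sleep"]):
    print(f"{sub!r} -> {tool}: {out}")
```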