Language Is Not All You Need: Aligning Perception with Language Models
Presents KOSMOS-1, a multimodal LLM that can perceive multimodal input, follow instructions, and perform in-context learning on multimodal tasks.
This paper introduces KOSMOS-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). It achieves impressive performance on language understanding and generation (including OCR-free NLP), perception-language tasks, and vision tasks. It also introduces a Raven IQ test dataset, which diagnoses the nonverbal reasoning capability of MLLMs.
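To make the in-context learning setup concrete, here is a minimal sketch of how a multimodal few-shot prompt interleaves image and text segments into one sequence for the language model. All names here (`embed_text`, `embed_image`, `build_prompt`) are hypothetical placeholders for illustration, not the paper's API.

```python
# Sketch: interleaved image/text few-shot prompt, KOSMOS-1 style.
# embed_text / embed_image are stand-ins for a real tokenizer + vision encoder.
import numpy as np

def embed_text(s: str) -> np.ndarray:
    # Placeholder: map text to a sequence of token embeddings.
    return np.random.randn(len(s.split()), 64)

def embed_image(path: str) -> np.ndarray:
    # Placeholder: map an image to a fixed number of visual embeddings.
    return np.random.randn(4, 64)

def build_prompt(examples, query_image):
    """Interleave (image, caption) demonstrations, then the query image."""
    segments = []
    for img, caption in examples:              # few-shot demonstrations
        segments.append(embed_image(img))
        segments.append(embed_text(caption))
    segments.append(embed_image(query_image))  # model continues with text
    return np.concatenate(segments, axis=0)

prompt = build_prompt(
    examples=[("cat.jpg", "a photo of a cat"), ("dog.jpg", "a photo of a dog")],
    query_image="bird.jpg",
)
print(prompt.shape)  # (sequence_length, embed_dim) fed to the decoder
```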
Internet Explorer: Targeted Representation Learning on the Open Web
Internet Explorer explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a target dataset.
Internet Explorer proposes dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. It explores the web in a self-supervised manner, progressively finding relevant examples that improve performance on the desired target dataset. Using just a single GPU desktop to actively query the Internet for 30-40 hours, it outperforms or matches CLIP oracle performance. Results, visualizations, and videos are available.
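The sketch below illustrates the kind of exploration loop the summary describes: sample a concept to query, fetch results, train with self-supervision, and shift future queries toward concepts that looked useful. The helpers (`search_images`, `relevance`, `ssl_train_step`) and the reweighting rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch: self-supervised web exploration loop with hypothetical helpers.
import random

concepts = ["terrier", "sports car", "oak leaf"]  # candidate query terms
weights = {c: 1.0 for c in concepts}              # exploration distribution

def search_images(query, n=8):
    # Placeholder for a web image search returning image identifiers.
    return [f"{query}_{i}.jpg" for i in range(n)]

def relevance(image, target_dataset):
    # Placeholder: how useful the image's representation is for the target data.
    return random.random()

def ssl_train_step(images):
    # Placeholder for one self-supervised training step on the new images.
    pass

target_dataset = "flowers"
for step in range(5):
    # Sample a concept proportionally to its current weight.
    concept = random.choices(concepts, weights=[weights[c] for c in concepts])[0]
    images = search_images(concept)
    ssl_train_step(images)
    # Reward concepts whose downloads look relevant to the target dataset.
    reward = sum(relevance(im, target_dataset) for im in images) / len(images)
    weights[concept] *= (1.0 + reward)
```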
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Competitive with non-spiking models on tested benchmarks, while using 5x less energy on neuromorphic hardware.
SpikeGPT is a generative language model with pure binary, event-driven spiking activation units. Three model variants are trained, with 45M, 125M, and 260M parameters. It remains competitive with non-spiking models on tested benchmarks while using 5x less energy when processed on neuromorphic hardware that can leverage sparse, event-driven activations. The code and a model pre-trained on BookCorpus are available.
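For intuition about what a binary, event-driven activation looks like, here is a minimal sketch of a generic leaky integrate-and-fire (LIF) unit; this is an illustrative stand-in, not the paper's exact neuron model.

```python
# Sketch: a generic leaky integrate-and-fire (LIF) spiking activation.
import numpy as np

def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Run a LIF neuron over T timesteps.

    inputs: array of shape (T, n_units) of input currents.
    Returns a binary spike train of the same shape.
    """
    membrane = np.zeros(inputs.shape[1])
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        membrane = beta * membrane + x     # leaky integration of input
        fired = membrane >= threshold      # binary, event-driven output
        spikes[t] = fired.astype(float)
        membrane = np.where(fired, membrane - threshold, membrane)  # soft reset
    return spikes

spike_train = lif_forward(np.random.rand(16, 8))
print(spike_train.mean())  # fraction of active events (sparsity)
```

The sparsity of such spike trains is what neuromorphic hardware exploits: computation only happens on events, which is the source of the energy savings claimed above.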
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
The paper proposes Directed Diffusion, a method that provides positional control over multiple objects in text-prompted image generation through attention guidance.
Businesses can use this method to generate high-quality images for storytelling and marketing, saving time and resources in image creation.
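As a rough illustration of attention-guided placement, the sketch below strengthens the cross-attention between a target prompt token and a chosen spatial region, then renormalizes. The shapes and the editing rule are assumptions for illustration, not the paper's exact procedure.

```python
# Sketch: boost cross-attention for one prompt token inside a target region.
import numpy as np

def edit_cross_attention(attn, token_idx, region_mask, boost=2.0):
    """attn: (H*W, n_tokens) cross-attention map for one head.
    region_mask: boolean (H*W,) marking where the object should appear."""
    edited = attn.copy()
    # Amplify attention to the object's token inside the desired region...
    edited[region_mask, token_idx] *= boost
    # ...and renormalize so each spatial location still sums to 1.
    edited /= edited.sum(axis=1, keepdims=True)
    return edited

H = W = 8
attn = np.random.dirichlet(np.ones(6), size=H * W)  # 6 prompt tokens
mask = np.zeros(H * W, dtype=bool)
mask[: H * W // 4] = True                           # e.g., top-left band
edited = edit_cross_attention(attn, token_idx=2, region_mask=mask)
```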