Thu Mar 30 2023

BloombergGPT: A Large Language Model for Finance

Large Language Models (LLMs)
Natural Language Processing
Finance
Sentiment analysis in finance
Named entity recognition in finance
Question answering in finance

Presents BloombergGPT, a 50-billion-parameter language model trained on a wide range of financial data.

Can be used in applications such as sentiment analysis, named entity recognition, and question answering in the financial domain. The model outperforms existing models on financial tasks by significant margins without sacrificing performance on general-purpose LLM benchmarks.
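
BloombergGPT's weights are not publicly released, so the snippet below is only a minimal sketch of the kind of few-shot financial sentiment prompting such a model enables, using an open model ("gpt2") purely as a stand-in; the headlines and labels are illustrative assumptions, not from the paper.

```python
# Minimal sketch: few-shot sentiment prompting with a decoder-only LM.
# "gpt2" is a stand-in (BloombergGPT's weights are not public); the
# headlines and labels below are illustrative, not from the paper.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Classify the sentiment of each financial headline.\n"
    'Headline: "Acme Corp beats Q2 earnings estimates" -> positive\n'
    'Headline: "Regulator fines MegaBank over disclosure lapses" -> negative\n'
    'Headline: "Central bank holds rates steady" ->'
)

result = generator(prompt, max_new_tokens=3, do_sample=False)
print(result[0]["generated_text"][len(prompt):].strip())
```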

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace

Advanced artificial intelligence
Large Language Models (LLMs)
Machine Learning
Language tasks
Vision tasks
Speech tasks

A framework that uses an LLM (e.g., ChatGPT) as a controller to connect the AI models available in machine learning communities (e.g., Hugging Face): the LLM plans the user's task, selects suitable models based on their descriptions, executes the subtasks, and summarizes the results.

HuggingGPT covers numerous sophisticated AI tasks across modalities and domains, achieving impressive results in language, vision, speech, and other challenging tasks.
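
A minimal sketch of the four-stage controller loop (task planning, model selection, task execution, response generation) is below; `call_llm` and `run_hf_model` are hypothetical stand-ins for a chat-LLM API and the Hugging Face inference API, and the prompts and task schema are paraphrased assumptions rather than the paper's.

```python
# Minimal sketch of HuggingGPT's four-stage controller loop. `call_llm`,
# `run_hf_model`, and the JSON task schema are hypothetical stand-ins.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in ChatGPT or another chat LLM here")

def run_hf_model(model_id: str, args: dict) -> dict:
    raise NotImplementedError("plug in the Hugging Face inference API here")

def hugginggpt(user_request: str) -> str:
    # 1. Task planning: the LLM decomposes the request into subtasks.
    tasks = json.loads(call_llm(f"Parse into a JSON task list: {user_request}"))
    results = {}
    for task in tasks:
        # 2. Model selection: the LLM picks a model from its description.
        model_id = call_llm(f"Pick the best Hugging Face model for: {task}")
        # 3. Task execution: run the selected expert model.
        results[task["id"]] = run_hf_model(model_id, task.get("args", {}))
    # 4. Response generation: the LLM summarizes all intermediate results.
    return call_llm(f"Summarize these results for the user: {results}")
```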

Language Models can Solve Computer Tasks

Advanced artificial intelligence
Large Language Models (LLMs)
Natural Language Processing
Automating computer tasks
Problem-solving

Letting an LLM recursively criticize and improve its own output significantly outperforms existing LLM methods on computer tasks and surpasses supervised learning (SL) and reinforcement learning (RL) approaches.

A pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme in which the agent recursively criticizes and improves its output. This recursively-criticize-and-improve (RCI) approach significantly outperforms existing LLM methods for automating computer tasks and surpasses supervised learning and reinforcement learning approaches on the MiniWoB++ benchmark.
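
A rough sketch of the criticize-and-improve loop follows (not the paper's exact prompts, and omitting its grounding in MiniWoB++ observations); `call_llm` is a hypothetical stand-in for any chat-LLM API.

```python
# Minimal sketch of recursive criticize-and-improve (RCI) prompting:
# generate an output, ask the model to critique it, then ask it to improve
# it given the critique. `call_llm` is a hypothetical stand-in for any
# chat-LLM API; the prompt wording is illustrative, not the paper's.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM API here")

def rci(task: str, rounds: int = 2) -> str:
    output = call_llm(f"Task: {task}\nPropose a sequence of actions.")
    for _ in range(rounds):
        critique = call_llm(
            f"Task: {task}\nProposed actions:\n{output}\n"
            "Review the proposal and point out any problems."
        )
        output = call_llm(
            f"Task: {task}\nProposed actions:\n{output}\n"
            f"Critique:\n{critique}\n"
            "Based on this critique, produce an improved action sequence."
        )
    return output
```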

Token Merging for Fast Stable Diffusion

Open Vocabulary Diffusion Models
Transformers
Artificial Intelligence for Image Processing
Image generation

ToMe for SD speeds up diffusion by merging redundant tokens, yielding faster image generation and lower memory consumption without any extra training. By exploiting the natural redundancy in generated images, it can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while preserving high image quality. It extends the original Token Merging (ToMe) approach, which speeds up transformers by merging redundant tokens.

ToMe for SD can benefit businesses that use transformer-based, open-vocabulary diffusion models for image generation. It significantly reduces memory consumption and speeds up image generation without compromising quality.
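
The core operation is bipartite soft matching: tokens are split into two alternating sets, each token in one set is paired with its most similar token in the other, and the r most similar pairs are merged by averaging. The sketch below is a simplified, unweighted version (the real method tracks merged-token sizes and adjusts attention proportionally).

```python
# Simplified token merging via bipartite soft matching. For clarity this
# uses plain averaging and does not specially handle two sources picking
# the same destination, unlike the full ToMe method.
import torch

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    # x: (num_tokens, dim); returns (num_tokens - r, dim).
    a, b = x[0::2], x[1::2]                       # alternating split
    sim = torch.nn.functional.cosine_similarity(  # (len(a), len(b))
        a[:, None, :], b[None, :, :], dim=-1
    )
    best_sim, best_dst = sim.max(dim=1)           # best match in b per a-token
    order = best_sim.argsort(descending=True)     # most similar pairs first
    src, unmerged = order[:r], order[r:]
    b = b.clone()
    # Average each merged source token into its destination token in b.
    b[best_dst[src]] = (b[best_dst[src]] + a[src]) / 2
    return torch.cat([a[unmerged], b], dim=0)

tokens = torch.randn(64, 320)
print(merge_tokens(tokens, r=32).shape)  # torch.Size([32, 320])
```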

DiffCollage: Parallel Generation of Large Content with Diffusion Models

Compositional Diffusion Models
Parallel Generation
Artificial Intelligence for Image Processing
Infinite image generation
Panorama image generation
Long-duration text-guided motion generation

DiffCollage is a compositional diffusion model that generates large content in parallel by leveraging diffusion models trained to generate pieces of that content. It is based on a factor graph representation that aggregates the intermediate outputs of diffusion models defined on individual nodes, producing content of arbitrary size and shape without resorting to autoregressive generation. It has been applied to generating infinite images, panorama images, and long-duration text-guided motion.

DiffCollage can benefit businesses that need to generate large content quickly, such as social media companies or video editing firms: generating the pieces in parallel reduces the time and resources that autoregressive generation would require.
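
For a linear chain of overlapping patches (e.g., a panorama), the factor-graph factorization means the score of the full sample is the sum of the per-patch scores minus the scores of the shared overlaps, so every patch can be denoised in parallel. A minimal 1-D sketch follows, with `patch_score` a hypothetical stand-in for a pretrained diffusion model's score network.

```python
# Minimal 1-D sketch of DiffCollage-style score aggregation over a linear
# chain of overlapping patches: add each patch's score, subtract the score
# of each shared overlap (otherwise counted twice). `patch_score` is a
# hypothetical stand-in; in practice the overlap term comes from the same
# pretrained model applied to the shared crop.
import numpy as np

PATCH, OVERLAP = 64, 16
STRIDE = PATCH - OVERLAP

def patch_score(x_piece: np.ndarray, t: float) -> np.ndarray:
    raise NotImplementedError("plug in a pretrained patch diffusion model")

def collage_score(x: np.ndarray, t: float) -> np.ndarray:
    # x is a 1-D signal of length STRIDE * k + OVERLAP for some integer k,
    # so the patches tile it exactly with OVERLAP-sized overlaps.
    score = np.zeros_like(x)
    for s in range(0, len(x) - PATCH + 1, STRIDE):
        # Node term: each patch's score (these calls are independent and
        # could be batched or run in parallel).
        score[s:s + PATCH] += patch_score(x[s:s + PATCH], t)
        # Edge term: subtract the overlap shared with the previous patch.
        if s > 0:
            score[s:s + OVERLAP] -= patch_score(x[s:s + OVERLAP], t)
    return score
```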
