Wed Mar 22 2023 - Top Trending AI Papers

Sparks of Artificial General Intelligence: Early experiments with GPT-4

AI, NLP

General Intelligence, Large Language Models

various domains and tasks, math, coding, vision, medicine, law, psychology

Reports on an investigation of an early version of GPT-4, contending that it exhibits more general intelligence than previous AI models and can solve novel and difficult tasks in various domains without any special prompting, with performance strikingly close to human-level and often surpassing prior models. Discusses the rising capabilities and implications of these models and the challenges ahead for advancing towards deeper and more comprehensive versions of AGI.

Businesses can track developments in GPT-4 and other large language models to understand their potential impact across domains and tasks. They can explore applications of these models to complex problems that call for human-level performance, such as in mathematics, coding, vision, medicine, law, and psychology. They can also consider the societal impact of these advances and possible future research directions.

https://arxiv.org/pdf/2303.12712.pdf

https://arxiv.org/abs/2303.12712

https://twitter.com/arankomatsuzaki/status/1638703549767974913/photo/1

Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

Computer Vision, AR/VR

3D Scenes, Image Editing, Diffusion Model

architecture, gaming, virtual reality

Proposes a method for editing 3D scenes with text instructions. It uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, yielding an optimized 3D scene that respects the edit instruction. Demonstrates edits of large-scale, real-world scenes that are more realistic and targeted than those of prior work.

Businesses that work with 3D scenes, such as those in architecture, gaming, or virtual reality, can use this method to make more realistic and targeted edits with text instructions. This can improve the efficiency and accuracy of their processes and enhance the realism and quality of their products.

https://instruct-nerf2nerf.github.io/

https://arxiv.org/pdf/2303.12789.pdf

https://arxiv.org/abs/2303.12789

https://twitter.com/arankomatsuzaki/status/1638702584549588992/video/1
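
The loop described in the Instruct-NeRF2NeRF summary (edit the input images, optimize the scene, repeat) can be illustrated with a toy sketch. This is not the authors' code: the "scene" here is a single number, `edit_image` is a hypothetical stand-in for the InstructPix2Pix edit, and `iterative_scene_edit`, `target`, `strength`, and `lr` are names and parameters assumed purely for illustration.

```python
import random


def edit_image(render: float, target: float = 1.0, strength: float = 0.5) -> float:
    """Hypothetical stand-in for the diffusion edit: nudge a render toward the edited look."""
    return render + strength * (target - render)


def iterative_scene_edit(images, steps: int = 200, lr: float = 0.1, seed: int = 0) -> float:
    """Alternate between editing one training image and refitting the toy scene."""
    rng = random.Random(seed)
    dataset = list(images)               # training images, replaced one at a time
    scene = sum(dataset) / len(dataset)  # toy scene: a single value seen from every view

    for _ in range(steps):
        # 1. Render one view from the current scene, edit it, and swap it into the dataset.
        view = rng.randrange(len(dataset))
        dataset[view] = edit_image(scene)

        # 2. One gradient step on the mean squared error between the scene and the dataset.
        grad = 2 * sum(scene - img for img in dataset) / len(dataset)
        scene -= lr * grad

    return scene


edited = iterative_scene_edit([0.0, 0.1, 0.2])
print(f"scene value after editing: {edited:.3f}")
```

In this toy setup each edited image is derived from the current render, so the dataset and the scene move toward the edited appearance together rather than the scene being fit to a set of independently edited images.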

MEGA: Multilingual Evaluation of Generative AI

AI, NLP

Generative AI, NLP, Multilingual Benchmarking

NLP, multiple languages

Presents MEGA, the first comprehensive multilingual benchmarking of generative LLMs, which evaluates models on standard NLP benchmarks covering 8 diverse tasks and 33 typologically diverse languages. Compares the performance of generative LLMs with state-of-the-art (SOTA) non-autoregressive models on these tasks and analyzes how model performance varies across languages.

Businesses that work with NLP in multiple languages can use MEGA to evaluate generative LLMs on standard NLP benchmarks in a multilingual setting and determine their capabilities and limits across languages. They can compare generative LLMs with state-of-the-art non-autoregressive models on these tasks, explore why generative LLMs may not be optimal for every language, and use the MEGA framework to evaluate and improve generative LLMs in the multilingual setting.

https://arxiv.org/pdf/2303.12528.pdf

https://arxiv.org/abs/2303.12528

https://twitter.com/arankomatsuzaki/status/1638706674465751041/photo/1

The Prompt Artists

Text-to-image models

Generative AI

Artificial intelligence in creative processes

Art and design

Examines the art practices, artwork, and motivations of prolific users of the latest generation of text-to-image models.

Offers design implications for future prompting and image-editing options.

https://arxiv.org/pdf/2303.12253.pdf

https://arxiv.org/abs/2303.12253

https://twitter.com/arankomatsuzaki/status/1638708213406212098/photo/1

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

Diffusion over Diffusion architecture

Video generation

Computer vision

Video production

Proposes a novel Diffusion over Diffusion architecture for eXtremely Long video generation that generates high-quality long videos with both global and local coherence and decreases the average inference time by 20x.

Enables high-quality long-video generation with both global and local coherence while cutting average inference time by 20x.

https://arxiv.org/pdf/2303.12346.pdf

https://arxiv.org/abs/2303.12346

https://twitter.com/arankomatsuzaki/status/1638705916986941440/photo/1