Thu Mar 23 2023
Wed Mar 22 2023

Sparks of Artificial General Intelligence: Early experiments with GPT-4

AI, NLP
General Intelligence, Large Language Models
various domains and tasks, math, coding, vision, medicine, law, psychology

Reports on an investigation of an early version of GPT-4 and contends that it exhibits more general intelligence than previous AI models: it can solve novel and difficult tasks across various domains without any special prompting, with performance strikingly close to human level and often surpassing prior models. Discusses the rising capabilities and implications of these models and the challenges ahead in advancing towards deeper and more comprehensive versions of AGI.

Businesses can stay updated on the latest developments in GPT-4 and other large language models to understand their potential impact on various domains and tasks. They can explore the potential applications of these models, including their use in solving complex problems that require human-level performance, such as in mathematics, coding, vision, medicine, law, and psychology. They can also reflect on the societal influences of these technological leaps and possible future research directions.

Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

Computer Vision, AR/VR
3D Scenes, Image Editing, Diffusion Model
architecture, gaming, virtual reality

Proposes a method for editing 3D scenes with text instructions. It uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. Demonstrates the ability to edit large-scale, real-world scenes and accomplish more realistic, targeted edits than prior work.

Businesses that work with 3D scenes, such as those in architecture, gaming, or virtual reality, can use this method to make more realistic and targeted edits to 3D scenes using text instructions. This can improve the efficiency and accuracy of their processes and enhance the realism and quality of their products.
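
The alternation described above for Instruct-NeRF2NeRF (edit the training images, then keep optimizing the scene on the edited images) can be illustrated with a deliberately tiny toy. This is only a sketch under strong assumptions, not the authors' implementation: the "scene" is a single number, `render` and `edit_image` are hypothetical stand-ins for a NeRF renderer and for InstructPix2Pix, and the round-robin view order is chosen here purely to keep the example deterministic.

```python
def render(scene, view):
    # Toy renderer (assumption): an "image" is the scene value plus a view offset.
    return scene + view


def edit_image(rendered, target_image, strength=0.5):
    # Hypothetical stand-in for an image-conditioned diffusion editor:
    # it moves the current render part of the way toward the edited look.
    return rendered + strength * (target_image - rendered)


def iterative_scene_edit(originals, views, target, steps=200, lr=0.1):
    # Start from a scene that reproduces the original, unedited images.
    scene = sum(o - v for o, v in zip(originals, views)) / len(views)
    dataset = list(originals)
    for step in range(steps):
        # 1) Replace one training image with an edited version of the current render.
        i = step % len(views)
        dataset[i] = edit_image(render(scene, views[i]), target + views[i])
        # 2) Take one gradient step on the mean squared error over all views.
        grad = sum(2 * (render(scene, v) - d) for v, d in zip(views, dataset)) / len(views)
        scene -= lr * grad
    return scene


views = [0.0, 0.1, 0.2, 0.3]
originals = [render(0.0, v) for v in views]  # unedited scene value is 0.0
edited_scene = iterative_scene_edit(originals, views, target=1.0)
```

Because each edit starts from the current render rather than from a fixed image, the training set and the scene drift toward the instructed result together, which is the intuition behind editing images iteratively while optimizing the scene.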

MEGA: Multilingual Evaluation of Generative AI

AI, NLP
Generative AI, NLP, Multilingual Benchmarking
NLP, multiple languages

Presents MEGA, the first comprehensive multilingual benchmarking of generative LLMs, which evaluates models on standard NLP benchmarks covering 8 diverse tasks and 33 typologically diverse languages. Compares the performance of generative LLMs to state-of-the-art (SOTA) non-autoregressive models on these tasks and analyzes how model performance varies across languages.

Businesses that work with NLP in multiple languages can use MEGA to evaluate generative LLMs on standard NLP benchmarks in a multilingual setting and determine their capabilities and limits across languages. They can compare the performance of generative LLMs to SOTA non-autoregressive models on these tasks, explore why generative LLMs may not be optimal for all languages, and use the MEGA framework to evaluate and improve generative LLMs in the multilingual setting.
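
The per-language comparison described above can be sketched as a small aggregation over evaluation records. This is a minimal illustration, not the MEGA framework's own code: the record layout, the function name `language_gaps`, the labels `"generative"` and `"baseline"`, and every score below are placeholders invented for the example, not results from the paper.

```python
from collections import defaultdict
from statistics import mean


def language_gaps(records):
    """Mean score gap (generative minus baseline) per language.

    records: iterable of (task, language, model_kind, score) tuples,
    a hypothetical layout assumed for this sketch.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for _task, language, kind, score in records:
        scores[language][kind].append(score)
    gaps = {}
    for language, by_kind in scores.items():
        # Only compare languages that have scores for both model kinds.
        if "generative" in by_kind and "baseline" in by_kind:
            gaps[language] = mean(by_kind["generative"]) - mean(by_kind["baseline"])
    return gaps


# Placeholder records; the numbers are made up for illustration only.
records = [
    ("task_1", "lang_a", "generative", 0.80),
    ("task_1", "lang_a", "baseline", 0.75),
    ("task_1", "lang_b", "generative", 0.50),
    ("task_1", "lang_b", "baseline", 0.70),
    ("task_2", "lang_a", "generative", 0.60),
    ("task_2", "lang_a", "baseline", 0.65),
]
gaps = language_gaps(records)
```

A negative gap for a language flags where the generative model trails the baseline, which is the kind of per-language view a team would want before deploying in that language.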

The Prompt Artists

Text-to-image models
Generative AI
Artificial intelligence in creative processes
Art and design

Examines the art practices, artwork, and motivations of prolific users of the latest generation of text-to-image models.

Offers design implications for future prompting and image editing options.

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

Diffusion over Diffusion architecture
Video generation
Computer vision
Video production

Proposes a novel Diffusion over Diffusion architecture for eXtremely Long video generation that generates high-quality long videos with both global and local coherence and decreases the average inference time by 20x.

Enables high-quality long video generation with both global and local coherence, while reducing average inference time by roughly 20x.

Tue Mar 21 2023
Mon Mar 20 2023
Sun Mar 19 2023
Thu Mar 16 2023