Sun Apr 09 2023

Generative Agents: Interactive Simulacra of Human Behavior

Natural language processing
Agent-based systems
Artificial intelligence
Immersive environments
Prototyping tools

This paper introduces generative agents: computational software agents that simulate believable human behavior for interactive applications ranging from immersive environments to prototyping tools. An ablation study shows that each component of the agent architecture (observation, planning, and reflection) contributes critically to the believability of agent behavior.

Implement generative agents to enable interactive applications that simulate believable human behavior.
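
As a rough illustration of how the three components named in the summary (observation, planning, reflection) might fit together, here is a minimal sketch. The class name SketchAgent, its methods, and the summarize and choose_action callbacks are all hypothetical stand-ins (in a real system the callbacks would presumably be language-model calls); this is not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SketchAgent:
    """Toy agent with the three components named in the summary (hypothetical design)."""
    name: str
    # Assumption: both callbacks stand in for language-model calls.
    summarize: Callable[[List[str]], str]
    choose_action: Callable[[List[str]], str]
    memory: List[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        # Observation: record what the agent perceives.
        self.memory.append(f"observed: {event}")

    def reflect(self, window: int = 3) -> str:
        # Reflection: condense the most recent memories into a higher-level note.
        insight = self.summarize(self.memory[-window:])
        self.memory.append(f"reflection: {insight}")
        return insight

    def plan(self) -> str:
        # Planning: pick the next action from everything remembered so far.
        action = self.choose_action(self.memory)
        self.memory.append(f"plan: {action}")
        return action


def demo_agent() -> SketchAgent:
    agent = SketchAgent(
        name="Ada",
        summarize=lambda notes: f"{len(notes)} recent events considered",
        choose_action=lambda mem: (
            "greet neighbor" if any("neighbor" in m for m in mem) else "stay home"
        ),
    )
    agent.observe("neighbor waves")
    agent.observe("kettle is boiling")
    agent.reflect()
    agent.plan()
    return agent
```

Keeping observations, reflections, and plans in one memory list is a simplification chosen for the sketch; it makes it easy to see that later steps can draw on the output of earlier ones.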

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Natural language processing
Computer vision
Artificial intelligence
Text-to-image generation
Image personalization

InstantBooth is a novel approach to personalized image generation that enables instant text-guided image personalization without any test-time finetuning. It preserves fine identity details by learning a rich visual feature representation, achieving competitive results on unseen concepts while being 100 times faster than methods based on test-time finetuning.

Use InstantBooth for instant text-guided image personalization without test-time finetuning to improve image generation speed and efficiency.

Training-Free Layout Control with Cross-Attention Guidance

Text-to-image generation
Computer vision
Artificial intelligence
Image layout control
Image editing

Layout guidance is a training-free technique for robust layout control of images generated by large pretrained text-to-image diffusion models. It manipulates the cross-attention maps to steer the reconstruction in the desired direction. The paper validates its effectiveness through several experiments and extends it to editing the layout and context of a given real image.

Use layout guidance to achieve robust layout control without training in Text-to-Image diffusion models.
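
To make the idea of guiding generation through cross-attention maps more concrete, the sketch below scores how much of a token's attention falls inside a target box. The function name layout_energy, the box convention, and the squared-shortfall score are assumptions made for illustration; the summary above does not specify the objective the paper uses. In a guidance setting, a score like this would be lowered step by step during sampling, which is not shown here.

```python
from typing import List, Tuple

# (top, left, bottom, right) in attention-map cells; bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def layout_energy(attn: List[List[float]], box: Box) -> float:
    """Return 0.0 when all attention mass lies inside the box, 1.0 when none does."""
    top, left, bottom, right = box
    total = sum(sum(row) for row in attn)
    if total == 0:
        # No attention mass at all: treat as fully outside the box.
        return 1.0
    inside = sum(sum(row[left:right]) for row in attn[top:bottom])
    # Squared shortfall of the in-box attention fraction (illustrative choice).
    return (1.0 - inside / total) ** 2
```

For example, with a 2x2 map and a box covering the top row, attention concentrated in the top row gives an energy of 0.0, while attention concentrated in the bottom row gives 1.0.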

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

Visual modeling
Computer vision
Neural networks
Image and video recognition

This paper presents SparseFormer, a new method that represents an image with a highly limited number of tokens in the latent space using a sparse feature sampling procedure, imitating the sparse visual recognition of humans. SparseFormer achieves performance on par with canonical models while offering a better accuracy-throughput tradeoff, and it can be easily extended to video classification.

Businesses can use SparseFormer to optimize image and video recognition tasks, reducing computational costs while maintaining accuracy.
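
The following sketch illustrates the general idea of sparse feature sampling: a token reads only a handful of interpolated points from its region of interest instead of every pixel. The functions bilinear and sample_token, the region format, and the averaging step are hypothetical simplifications on a single-channel image, not SparseFormer's actual procedure.

```python
from typing import List, Tuple

Image = List[List[float]]
# Region of interest as (top, left, bottom, right) in continuous pixel coordinates.
Region = Tuple[float, float, float, float]


def bilinear(img: Image, y: float, x: float) -> float:
    """Bilinearly interpolate img at (y, x), clamping to the image border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_token(img: Image, roi: Region, points_per_side: int = 2) -> float:
    """Average a small, fixed grid of samples taken inside the region of interest."""
    top, left, bottom, right = roi
    n = points_per_side
    values = []
    for i in range(n):
        for j in range(n):
            # Place each sample at the center of its grid cell within the region.
            y = top + (i + 0.5) * (bottom - top) / n
            x = left + (j + 0.5) * (right - left) / n
            values.append(bilinear(img, y, x))
    return sum(values) / len(values)
```

The point of the sketch is that the cost per token depends on points_per_side rather than on image resolution: with the default setting, each token reads four samples whether the image is 4x4 or 1000x1000.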
