Sun Apr 09 2023 - Top Trending AI Papers

Generative Agents: Interactive Simulacra of Human Behavior

Natural language processing

Agent-based systems

Artificial intelligence

Immersive environments

Prototyping tools

This paper introduces generative agents: computational software agents that simulate believable human behavior, enabling interactive applications that range from immersive environments to prototyping tools. Ablation studies show that each component of the agent architecture (observation, planning, and reflection) contributes critically to the believability of the agents' behavior.

Implement generative agents to enable interactive applications that simulate believable human behavior.
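The summary names observation, planning, and reflection. As we read the paper, all three draw on a memory stream from which the agent retrieves records scored by recency, importance, and relevance. The sketch below is a minimal, hedged illustration of that retrieval step only, not the authors' implementation. The `Memory` record, the 0.995 per-hour decay, the equal weighting of the three scores, and the min-max normalization are illustrative assumptions; the embeddings and importance ratings, which a real system would obtain from a language model, are supplied by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    """One record in a hypothetical memory stream."""
    text: str
    embedding: list[float]     # assumed precomputed text embedding
    importance: float          # assumed 1-10 rating, supplied by hand here
    last_access_hours: float   # simulation time of last retrieval, in hours


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1]; identical values carry no ranking signal, so map them to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def retrieve(memories: list[Memory], query_embedding: list[float],
             now_hours: float, k: int = 3, decay: float = 0.995) -> list[Memory]:
    """Return the k memories with the highest recency + importance + relevance score."""
    if not memories:
        return []
    recency = min_max([decay ** (now_hours - m.last_access_hours) for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query_embedding) for m in memories])
    # Equal weights for the three components (an assumption for this sketch).
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    order = sorted(range(len(memories)), key=lambda j: scores[j], reverse=True)
    return [memories[j] for j in order[:k]]


memories = [
    Memory("ate breakfast", [1.0, 0.0], importance=2, last_access_hours=0),
    Memory("is planning a party", [0.0, 1.0], importance=8, last_access_hours=9),
    Memory("heard about the party", [0.6, 0.8], importance=5, last_access_hours=5),
]
top = retrieve(memories, query_embedding=[0.0, 1.0], now_hours=10, k=2)
print([m.text for m in top])  # ['is planning a party', 'heard about the party']
```

Normalizing each component before summing keeps one signal (for example, raw importance on a 1-10 scale) from dominating the others.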

https://arxiv.org/pdf/2304.03442.pdf

https://arxiv.org/abs/2304.03442

https://reverie.herokuapp.com/arXiv_Demo/

https://twitter.com/_akhaliq/status/1645257919997394945/photo/1

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Natural language processing

Computer vision

Artificial intelligence

Text-to-image generation

Image personalization

InstantBooth is a novel approach to personalized text-to-image generation that enables instant, text-guided image personalization without any test-time finetuning. It preserves fine identity details by learning rich visual feature representations, achieving competitive results on unseen concepts while being 100 times faster than methods based on test-time finetuning.

Use InstantBooth for instant, text-guided image personalization without test-time finetuning, making personalized image generation faster and more efficient.
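As we read the paper, InstantBooth avoids test-time finetuning by converting the input images into a textual token with a learned image encoder, so personalizing a new subject costs a forward pass rather than a per-subject optimization run. The toy sketch below illustrates only that encoder-based idea and is not InstantBooth's architecture: a hypothetical linear `encode_concept` maps image features to a concept embedding (averaged over the input images, an assumption), which then replaces a hypothetical `<V>` placeholder token in the prompt embeddings. The adapter layers the paper adds to preserve fine identity details, and any real image feature extractor or text encoder, are not modeled.

```python
Vector = list[float]


def encode_concept(image_features: list[Vector], weight: list[Vector]) -> Vector:
    """Hypothetical linear encoder: project each image's features with `weight`
    (shape text_dim x image_dim) and average the results over all input images."""
    if not image_features:
        raise ValueError("need at least one input image")
    projected = [
        [sum(w * f for w, f in zip(row, feats)) for row in weight]
        for feats in image_features
    ]
    n = len(projected)
    return [sum(p[d] for p in projected) / n for d in range(len(weight))]


def personalize_prompt(tokens: list[str], token_embeddings: list[Vector],
                       placeholder: str, concept: Vector) -> list[Vector]:
    """Swap the placeholder token's embedding for the concept embedding.
    No parameters are updated: personalization is a single forward computation."""
    if placeholder not in tokens:
        raise ValueError(f"prompt has no {placeholder!r} token")
    return [concept if tok == placeholder else emb
            for tok, emb in zip(tokens, token_embeddings)]


identity = [[1.0, 0.0], [0.0, 1.0]]         # toy 2x2 encoder weight
images = [[1.0, 0.0], [0.0, 1.0]]           # toy features for two input images
concept = encode_concept(images, identity)  # [0.5, 0.5]
tokens = ["a", "photo", "of", "<V>"]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
print(personalize_prompt(tokens, embeddings, "<V>", concept)[-1])  # [0.5, 0.5]
```

The contrast with finetuning-based methods is that nothing in `personalize_prompt` involves gradients or training steps; all learning happened when the (here hypothetical) encoder weights were trained.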

https://arxiv.org/pdf/2304.03411.pdf

https://arxiv.org/abs/2304.03411

https://jshi31.github.io/InstantBooth/

https://twitter.com/_akhaliq/status/1645254918121422859/photo/1

Training-Free Layout Control with Cross-Attention Guidance

Text-to-image generation

Computer vision

Artificial intelligence

Image layout control

Image editing

Layout guidance is a training-free technique for robust layout control of images generated by large pretrained text-to-image diffusion models; it works by applying guidance to the cross-attention maps. By manipulating the cross-attention layers, it steers the reconstruction in the desired direction. Several experiments validate its effectiveness, and the method is extended to editing the layout and context of a given real image.

Use layout guidance to achieve robust layout control in text-to-image diffusion models without any training.
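To make "guidance performed on the cross-attention maps" more concrete: as we understand the paper's backward-guidance variant, it defines an energy that is low when a token's cross-attention mass falls inside that token's target box, then nudges the diffusion latent down the gradient of that energy during denoising. The sketch computes only that energy, in the form (1 - in-box attention / total attention) squared, averaged over tokens. The box format, the helper names, and the plain-list attention maps are assumptions; the gradient step itself, which needs an autodiff framework and the actual diffusion model, is not shown.

```python
Map = list[list[float]]


def box_mask(height: int, width: int, box: tuple[int, int, int, int]) -> Map:
    """Binary mask for a box given as (x0, y0, x1, y1), half-open in pixel units."""
    x0, y0, x1, y1 = box
    return [[1.0 if x0 <= x < x1 and y0 <= y < y1 else 0.0 for x in range(width)]
            for y in range(height)]


def token_energy(attn: Map, mask: Map) -> float:
    """(1 - fraction of this token's attention inside its box) squared."""
    total = sum(sum(row) for row in attn)
    if total <= 0:
        raise ValueError("attention map must have positive total mass")
    inside = sum(a * m for a_row, m_row in zip(attn, mask) for a, m in zip(a_row, m_row))
    return (1.0 - inside / total) ** 2


def layout_energy(attn_maps: list[Map], masks: list[Map]) -> float:
    """Average the per-token energies; lower means better adherence to the layout."""
    energies = [token_energy(a, m) for a, m in zip(attn_maps, masks)]
    if not energies:
        raise ValueError("need at least one (attention map, mask) pair")
    return sum(energies) / len(energies)


uniform = [[1.0] * 4 for _ in range(4)]     # attention spread over a 4x4 map
left_half = box_mask(4, 4, (0, 0, 2, 4))    # target box: left two columns
focused = [[1.0 if x < 2 else 0.0 for x in range(4)] for _ in range(4)]
print(token_energy(uniform, left_half))     # 0.25 (half the mass is outside)
print(token_energy(focused, left_half))     # 0.0  (all mass inside the box)
```

Because the energy depends on the ratio of in-box to total attention, it rewards moving attention into the box rather than simply increasing attention everywhere.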

https://arxiv.org/pdf/2304.03373.pdf

https://arxiv.org/abs/2304.03373

https://silent-chen.github.io/layout-guidance/

https://github.com/silent-chen/layout-guidance

https://twitter.com/_akhaliq/status/1645253639575830530/video/1

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

Visual modeling

Computer vision

Neural networks

Image and video recognition

This paper presents SparseFormer, a new method that represents images with a highly limited number of tokens in the latent space through a sparse feature sampling procedure, imitating humans' sparse visual recognition. SparseFormer performs on par with canonical models while offering a better accuracy-throughput tradeoff, and it extends easily to video classification.

Businesses can use SparseFormer to optimize image and video recognition tasks, reducing computational costs while maintaining accuracy.
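The core idea in the SparseFormer summary, a small fixed set of latent tokens that each read only a few sampled points of the image features, can be illustrated with a toy sketch. Here each token carries a region of interest `(cx, cy, w, h)`, samples a fixed grid of points inside it by bilinear interpolation, and averages them. This is a simplification under stated assumptions: SparseFormer's actual sampling and feature decoding are learned, and neither is modeled here; the helper names are hypothetical. What the sketch does capture is the cost argument: the number of sampled points is tokens × grid², independent of the feature map's resolution.

```python
FeatureMap = list[list[list[float]]]  # height x width x channels


def bilinear(feat: FeatureMap, x: float, y: float) -> list[float]:
    """Bilinearly interpolate the feature vector at (x, y), clamped to the map."""
    height, width = len(feat), len(feat[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    dx, dy = x - x0, y - y0
    out = []
    for c in range(len(feat[0][0])):
        top = feat[y0][x0][c] * (1 - dx) + feat[y0][x1][c] * dx
        bottom = feat[y1][x0][c] * (1 - dx) + feat[y1][x1][c] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out


def sample_token(feat: FeatureMap, roi: tuple[float, float, float, float],
                 grid: int = 2) -> list[float]:
    """Average a grid x grid set of points spread evenly inside roi = (cx, cy, w, h)."""
    cx, cy, w, h = roi
    points = [
        bilinear(feat,
                 cx - w / 2 + (j + 0.5) * w / grid,
                 cy - h / 2 + (i + 0.5) * h / grid)
        for i in range(grid) for j in range(grid)
    ]
    return [sum(p[c] for p in points) / len(points) for c in range(len(points[0]))]


def sparse_tokens(feat: FeatureMap, rois: list[tuple[float, float, float, float]],
                  grid: int = 2) -> list[list[float]]:
    """One feature vector per latent token, read from only grid * grid points each."""
    return [sample_token(feat, roi, grid) for roi in rois]


# Toy 8x8 single-channel map whose value equals the x coordinate.
feat = [[[float(x)] for x in range(8)] for _ in range(8)]
rois = [(3.5, 3.5, 4.0, 4.0), (1.0, 1.0, 2.0, 2.0)]
tokens = sparse_tokens(feat, rois)
print(len(tokens), tokens[0])  # 2 [3.5]
```

Doubling the feature map's height and width leaves the work in `sparse_tokens` unchanged, which is the intuition behind the better accuracy-throughput tradeoff the summary mentions.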

https://arxiv.org/pdf/2304.03768.pdf

https://arxiv.org/abs/2304.03768

https://twitter.com/arankomatsuzaki/status/1645231862414184450/photo/1