Generative Agents: Interactive Simulacra of Human Behavior
This paper introduces generative agents, software agents that simulate believable human behavior, to enable interactive applications ranging from immersive environments to prototyping tools. An ablation shows that each component of the agent architecture (observation, planning, and reflection) contributes critically to the believability of agent behavior.
Implement generative agents to enable interactive applications that simulate believable human behavior.
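To make the three components named above concrete, here is a minimal sketch of an observe, reflect, plan loop around a shared memory list. It is an illustration under stated assumptions, not the paper's implementation: the `Agent` class, its method names, and the two stub functions are hypothetical, and the stubs stand in for whatever model would actually generate the text.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent with a memory stream; all names here are hypothetical."""
    name: str
    memory: list = field(default_factory=list)

    def observe(self, event: str) -> None:
        # Observation: append what the agent perceives to its memory.
        self.memory.append(("observation", event))

    def reflect(self, summarize) -> str:
        # Reflection: condense the latest observations into a higher-level note.
        recent = [text for kind, text in self.memory if kind == "observation"][-3:]
        insight = summarize(recent)
        self.memory.append(("reflection", insight))
        return insight

    def plan(self, propose) -> str:
        # Planning: pick the next action from everything remembered so far.
        action = propose([text for _, text in self.memory])
        self.memory.append(("plan", action))
        return action


def stub_summarize(events):
    # Stand-in for a text-generation call (assumption, not the paper's prompt).
    return "Noticed: " + "; ".join(events)


def stub_propose(memories):
    # Stand-in for a text-generation call (assumption).
    return f"Act on {len(memories)} memories"


agent = Agent("Ava")
agent.observe("the cafe opened")
agent.observe("a friend waved")
insight = agent.reflect(stub_summarize)
action = agent.plan(stub_propose)
```

Removing any one method from the loop gives a rough analogue of the ablation described in the summary: the agent still runs, but it acts on less information.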
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
InstantBooth is an approach to personalized image generation that enables instant text-guided image personalization without any test-time finetuning. It preserves fine identity details by learning a rich visual feature representation, and it achieves competitive results on unseen concepts while being 100 times faster than methods based on test-time finetuning.
Use InstantBooth for instant text-guided image personalization without test-time finetuning to improve image generation speed and efficiency.
Training-Free Layout Control with Cross-Attention Guidance
Layout guidance achieves robust layout control over images generated by large pretrained Text-to-Image diffusion models without any training, by applying guidance to the cross-attention maps. It manipulates the cross-attention layers to steer reconstruction in the desired direction. Several experiments validate its effectiveness, and the method is extended to edit the layout and context of a given real image.
Use layout guidance to achieve robust layout control without training in Text-to-Image diffusion models.
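As a rough illustration of guidance on a cross-attention map, the sketch below scores how much of one token's attention falls inside a target box. The function name `layout_energy`, the box convention, and the squared form of the score are assumptions made for this example; the summary above does not specify the exact objective. In a real pipeline the gradient of such a score with respect to the latent would nudge the sample, and that step is omitted here.

```python
def layout_energy(attn, box):
    """Return (1 - mass_in_box / total_mass) ** 2 for one attention map.

    attn is a 2D list of non-negative attention weights (hypothetical input);
    box is (top, left, bottom, right) with exclusive bottom and right bounds.
    """
    top, left, bottom, right = box
    total = sum(sum(row) for row in attn)
    inside = sum(sum(row[left:right]) for row in attn[top:bottom])
    return (1.0 - inside / total) ** 2


# A 2x2 attention map whose mass sits mostly in the bottom row.
attn = [[0.1, 0.1],
        [0.4, 0.4]]

energy_bottom = layout_energy(attn, (1, 0, 2, 2))  # box covers the bottom row
energy_top = layout_energy(attn, (0, 0, 1, 2))     # box covers the top row
```

A lower score means the attention already agrees with the requested box, so here the bottom-row box scores lower than the top-row box.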
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
This paper presents SparseFormer, a method that represents images with a highly limited number of tokens in the latent space through a sparse feature sampling procedure, imitating the sparse visual recognition of humans. SparseFormer performs on par with canonical models while offering a better accuracy-throughput tradeoff, and it can be easily extended to video classification.
Businesses can use SparseFormer to optimize image and video recognition tasks, reducing computational costs while maintaining accuracy.
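The idea of sparse feature sampling can be sketched as reading an image at a handful of fractional locations instead of processing every pixel. The example below uses plain bilinear interpolation on a single-channel image; the function names and the fixed sampling points are assumptions for illustration, and the way SparseFormer actually chooses and refines its sampling locations is not described in the summary above.

```python
def bilinear_sample(image, y, x):
    """Bilinearly interpolate a 2D list of floats at fractional (y, x)."""
    h, w = len(image), len(image[0])
    # Clamp coordinates so samples outside the image read the border value.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
    bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_token(image, points):
    """Build one token's feature vector from a few sampling points."""
    return [bilinear_sample(image, y, x) for y, x in points]


image = [[0.0, 1.0],
         [2.0, 3.0]]
# Two hypothetical sampling points: the image center and the top-right pixel.
features = sample_token(image, [(0.5, 0.5), (0.0, 1.0)])
```

The cost of building a token depends on the number of sampling points rather than the image resolution, which is the intuition behind the accuracy-throughput tradeoff mentioned above.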