Unifying Diffusion Models’ Latent Space, with Applications to CycleDiffusion and Guidance
This paper provides an alternative Gaussian formulation of the latent space of various diffusion models, together with DPM-Encoder, an invertible encoder that maps images into this latent space. It applies the formulation to CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation, and to guiding pre-trained diffusion models and GANs by controlling their latent codes in a unified, plug-and-play formulation based on energy-based models.
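To make the DPM-Encoder idea concrete, here is a minimal NumPy sketch for a DDPM-style stochastic sampler. It assumes a short linear beta schedule, sampler variance σ_t² = β_t, and a hypothetical placeholder noise predictor `toy_eps` in place of a trained network; the names `dpm_encode` and `dpm_decode` are illustrative, not the paper's code. The encoder samples a trajectory from the forward posterior and records, at each step, the noise the sampler would need to reproduce that trajectory, so the latent code (x_T, ε_T, ..., ε_1) decodes back to the input.

```python
import numpy as np

# Hypothetical DDPM setup: a short linear beta schedule (an assumption, not the paper's).
T = 50
BETAS = np.linspace(1e-4, 0.05, T)
ALPHAS = 1.0 - BETAS
ALPHA_BARS = np.cumprod(ALPHAS)


def toy_eps(x, t):
    """Placeholder noise predictor standing in for a trained network (hypothetical)."""
    return 0.1 * x * t / T


def model_mean(eps_fn, x_t, t):
    """DDPM mean of p(x_{t-1} | x_t) for timestep t in 1..T."""
    beta, alpha, ab = BETAS[t - 1], ALPHAS[t - 1], ALPHA_BARS[t - 1]
    return (x_t - beta / np.sqrt(1.0 - ab) * eps_fn(x_t, t)) / np.sqrt(alpha)


def dpm_encode(x0, eps_fn, rng):
    """Map x0 to a latent code [x_T, eps_T, ..., eps_1], assuming sigma_t^2 = beta_t."""
    ab_T = ALPHA_BARS[-1]
    x_t = np.sqrt(ab_T) * x0 + np.sqrt(1.0 - ab_T) * rng.standard_normal(x0.shape)
    latents = [x_t]
    for t in range(T, 0, -1):
        beta, alpha, ab = BETAS[t - 1], ALPHAS[t - 1], ALPHA_BARS[t - 1]
        ab_prev = ALPHA_BARS[t - 2] if t > 1 else 1.0
        # Sample x_{t-1} from the forward posterior q(x_{t-1} | x_t, x0).
        post_mean = (np.sqrt(ab_prev) * beta * x0
                     + np.sqrt(alpha) * (1.0 - ab_prev) * x_t) / (1.0 - ab)
        post_var = (1.0 - ab_prev) / (1.0 - ab) * beta  # exactly 0 at t = 1
        x_prev = post_mean + np.sqrt(post_var) * rng.standard_normal(x0.shape)
        # Record the noise the sampler would need to land exactly on x_{t-1}.
        latents.append((x_prev - model_mean(eps_fn, x_t, t)) / np.sqrt(beta))
        x_t = x_prev
    return latents


def dpm_decode(latents, eps_fn):
    """Run the stochastic DDPM sampler, using the recorded noises as its randomness."""
    x_t = latents[0]
    for i, t in enumerate(range(T, 0, -1)):
        x_t = model_mean(eps_fn, x_t, t) + np.sqrt(BETAS[t - 1]) * latents[i + 1]
    return x_t
```

Decoding the same latent code with a second model, as in `dpm_decode(dpm_encode(x, eps_source, rng), eps_target)`, corresponds to the CycleDiffusion-style translation step; `eps_source` and `eps_target` are hypothetical stand-ins for two independently trained models.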
The findings reveal several intriguing consequences: a common latent space emerges from two diffusion models trained independently on related domains; text-to-image diffusion models can serve as zero-shot image-to-image editors; and, when guided by the CLIP model and a face recognition model, diffusion models cover low-density sub-populations and individuals better than GANs. These results are relevant to image generation, image-to-image translation, and image manipulation, particularly where a shared latent space or plug-and-play guidance of pre-trained models is useful.
Discovered Policy Optimisation
This paper meta-learns a "drift" function in the Mirror Learning space. The resulting algorithm, Learnt Policy Optimisation (LPO), outperforms a state-of-the-art baseline (PPO) in unseen environments and with unseen hyperparameter settings. Analysing LPO leads to a novel, closed-form RL algorithm, Discovered Policy Optimisation (DPO).
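In Mirror Learning, a policy update maximises the probability-ratio surrogate r·A minus a non-negative "drift" penalty. As a minimal sketch (assuming NumPy, per-sample probability ratios `ratio`, advantages `adv`, and clip range ε = 0.2), the snippet below writes PPO's clipped objective in that form; LPO instead replaces the hand-designed drift with a meta-learned one. `quadratic_drift` is a hypothetical hand-written alternative, included only to show that the drift is a pluggable function.

```python
import numpy as np


def ppo_drift(ratio, adv, clip_eps=0.2):
    """PPO's clipping expressed as a Mirror Learning drift: ReLU((r - clip(r)) * A)."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.maximum((ratio - clipped) * adv, 0.0)


def quadratic_drift(ratio, adv, scale=1.0):
    """Hypothetical alternative drift: non-negative, with zero value and slope at r = 1."""
    return scale * 0.5 * (ratio - 1.0) ** 2 * np.ones_like(adv)


def mirror_objective(ratio, adv, drift):
    """Per-sample surrogate r * A minus a drift penalty; LPO would meta-learn `drift`."""
    return ratio * adv - drift(ratio, adv)
```

With `ppo_drift`, `mirror_objective` reproduces PPO exactly, because min(rA, clip(r)A) = rA - ReLU((r - clip(r))A) for every r and A.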
The results offer original insights into policy optimisation and show that meta-learning can automate the discovery of machine learning algorithms, addressing limitations of existing hand-crafted algorithms. These insights can inform the design of reinforcement learning algorithms across domains.
f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation
This paper proposes f-DM, a generalized family of diffusion models (DMs) that supports progressive signal transformation. Applied to image generation with a range of transformation functions, including down-sampling, blurring, and learned transformations based on the encoder of pretrained VAEs, f-DM produces high-quality samples on standard benchmarks (FFHQ, AFHQ, LSUN, and ImageNet), with better efficiency and semantic interpretation than DDPM and LDM.
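One way to picture progressive signal transformation is a forward process whose clean signal is gradually down-sampled while noise is added. The sketch below is a hypothetical illustration, not the paper's parameterization: it assumes 2x2 average pooling as the transformation, two stages that each blend the current resolution toward its down-sampled (then up-sampled) version, and a cosine noise schedule. The helper names `down`, `up`, `progressive_signal`, and `forward_sample` are illustrative, and the sketch does not treat the noise specially when the resolution changes between stages.

```python
import numpy as np


def down(x):
    """Hypothetical transformation: 2x2 average pooling (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def up(x):
    """Nearest-neighbour up-sampling by a factor of 2."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def progressive_signal(x, t, n_stages=2):
    """Clean signal at time t in [0, 1]: in stage k, blend down^k(x) toward up(down^{k+1}(x))."""
    k = min(int(t * n_stages), n_stages - 1)
    tau = t * n_stages - k  # local progress inside stage k, in [0, 1]
    cur = x
    for _ in range(k):
        cur = down(cur)
    return (1.0 - tau) * cur + tau * up(down(cur))


def forward_sample(x, t, rng, n_stages=2):
    """z_t = alpha_t * f_t(x) + sigma_t * eps, with an assumed cosine schedule."""
    alpha, sigma = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)
    signal = progressive_signal(x, t, n_stages)
    return alpha * signal + sigma * rng.standard_normal(signal.shape)
```

For a 16x16 input, samples in the first stage stay at 16x16 and samples in the second stage are 8x8, so later (noisier) steps operate on a smaller signal, which is the source of the efficiency gain the summary describes.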
The findings show that f-DM can improve the efficiency and interpretability of DMs in generative modeling tasks such as image generation, and that hand-designed or learned transformations can be incorporated into the latent space of DMs. These insights can inform the design of more efficient and interpretable generative models.
Habitat-Matterport 3D Semantics Dataset
This paper presents HM3DSEM, the largest dataset of 3D real-world spaces with densely annotated semantics currently available to the academic community.
The dataset can be used to improve object recognition and localization in real-world spaces, with potential applications in industries such as architecture, construction, and retail.