Extracting Training Data from Diffusion Models
Diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, over a thousand training examples are extracted from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. Overall, the results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.
Businesses using diffusion models to generate synthetic images should be aware that training data, potentially including private or copyrighted images, can be extracted from these models, and weigh the associated privacy risks.
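The attack rests on a simple observation: if the model has memorized an image, independent samples for the same prompt collapse onto near-identical outputs. Below is a minimal sketch of that generate-and-filter idea, with a hypothetical generate(prompt, seed) sampler standing in for the diffusion model and a plain pixel-space distance in place of the paper's more robust patch-based metric.

```python
import itertools
import numpy as np

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square pixel distance between two images."""
    return float(np.sqrt(np.mean((a.astype(np.float32) - b.astype(np.float32)) ** 2)))

def extract_candidates(generate, prompt, n_samples=500, threshold=0.05):
    """Generate many samples for one prompt and keep images that recur
    near-identically across independent seeds -- a signal that the model
    is regurgitating a memorized training image rather than sampling."""
    images = [generate(prompt, seed=s) for s in range(n_samples)]
    memorized = []
    for i, j in itertools.combinations(range(n_samples), 2):
        if l2_distance(images[i], images[j]) < threshold:
            memorized.append(images[i])
    return memorized
```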
REPLUG: Retrieval-Augmented Black-Box Language Models
REPLUG is a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tunable retrieval model. REPLUG significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, and the performance of Codex on five-shot MMLU by 5.1%.
Businesses can use REPLUG to significantly improve the performance of their existing language models without retraining them from scratch.
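REPLUG never touches the LM's weights: each retrieved document is prepended to the input in a separate forward pass, and the resulting next-token distributions are ensembled, weighted by the retrieval scores. A rough sketch of that ensembling step, where retriever and lm_next_token_probs are hypothetical stand-ins for the retrieval model and the black-box LM API:

```python
import numpy as np

def replug_next_token_probs(retriever, lm_next_token_probs, query, k=5):
    """Prepend each retrieved document to the query, run the frozen LM once
    per document, and average the output distributions weighted by the
    softmaxed retrieval scores."""
    docs, scores = zip(*retriever(query, k=k))          # [(doc, score), ...]
    weights = np.exp(scores) / np.sum(np.exp(scores))   # softmax over scores
    dists = [lm_next_token_probs(doc + "\n\n" + query) for doc in docs]
    return np.average(np.stack(dists), axis=0, weights=weights)
```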
Looped Transformers as Programmable Computers
A framework is presented for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Using these building blocks, a small instruction-set computer is emulated, which allows iterative algorithms to be mapped to programs that can be executed by a looped, 13-layer transformer. The transformer can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation.
Businesses looking to develop custom algorithms for their specific use cases can use looped transformers to emulate basic computing blocks and execute full-fledged, general-purpose programs.
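The core mechanism is easy to picture: a single fixed-weight transformer is applied to its own output in a loop, so each forward pass acts like one instruction cycle over a token "memory tape". The sketch below shows only that looping scaffold; the paper's contribution is hand-constructing the weights so the network provably executes programs, which the randomly initialized model used here does not.

```python
import torch
import torch.nn as nn

# Illustrative dimensions; the paper hand-programs the weights rather
# than training or randomly initializing the network as done here.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
transformer = nn.TransformerEncoder(layer, num_layers=13)

def run_looped(state: torch.Tensor, n_steps: int) -> torch.Tensor:
    """Apply the same fixed transformer repeatedly; each pass plays the
    role of one instruction cycle over the token 'memory tape'."""
    for _ in range(n_steps):
        state = transformer(state)
    return state

state = torch.randn(1, 32, 64)   # batch of one 32-token tape
final = run_looped(state, n_steps=10)
```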
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
AudioLDM is a text-to-audio (TTA) system that builds a latent diffusion model on continuous audio representations learned via contrastive language-audio pretraining (CLAP). It achieves state-of-the-art TTA performance with improved generation quality and computational efficiency, and enables zero-shot, text-guided audio manipulations.
Businesses can use AudioLDM to generate high-quality audio for various applications, such as marketing videos, e-learning materials, and audiobooks. It can also be used to enhance customer experience by providing natural-sounding audio responses in chatbots and virtual assistants.
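For a quick trial, a port of AudioLDM ships with the Hugging Face diffusers library; the snippet below assumes that package and the cvssp/audioldm-s-full-v2 checkpoint are available (treat it as a sketch if the API has since changed).

```python
import torch
import scipy.io.wavfile
from diffusers import AudioLDMPipeline  # assumes diffusers >= 0.15

pipe = AudioLDMPipeline.from_pretrained(
    "cvssp/audioldm-s-full-v2", torch_dtype=torch.float16
).to("cuda")

# Text prompt in, 16 kHz waveform out.
audio = pipe(
    "a gentle piano melody with light rain in the background",
    num_inference_steps=50,
    audio_length_in_s=5.0,
).audios[0]
scipy.io.wavfile.write("sample.wav", rate=16000, data=audio)
```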
Sample Efficient Deep Reinforcement Learning via Local Planning
Uncertainty-first local planning (UFLP) is an algorithmic framework for sample-efficient deep reinforcement learning (RL) that exploits simulator access, in particular the ability to reset the environment to previously observed states. It reduces the sample cost of several baseline RL algorithms on hard exploration tasks, achieving super-human performance on Montezuma's Revenge.
Businesses can use UFLP to optimize various decision-making scenarios, such as inventory management, resource allocation, and pricing strategies. It can also be applied to game AI to enhance player experience and engagement.
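The idea behind UFLP is that a simulator can be reset to any previously visited state, so exploration can resume from wherever the agent is least certain rather than replaying from the start every episode. A minimal sketch, in which env.save()/env.restore() and the uncertainty scorer are hypothetical interfaces:

```python
def uflp_rollout(env, agent, replay_states, uncertainty, horizon=100):
    """Reset the simulator to the stored state the agent is most
    uncertain about, then explore locally from there."""
    start = max(replay_states, key=uncertainty)  # uncertainty-first selection
    obs = env.restore(start)                     # hypothetical: jump to a saved state
    trajectory = []
    for _ in range(horizon):
        action = agent.act(obs)
        obs_next, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, obs_next, done))
        replay_states.append(env.save())         # hypothetical: snapshot current state
        obs = obs_next
        if done:
            break
    return trajectory
```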