Wed Aug 10 2022
Tue Aug 09 2022

Memorization in Large Language Models: Quantification and Mitigation

Language Models
AI Ethics
Privacy Preservation
Natural Language Processing
Data Utilization Improvement

This paper discusses how large language models (LLMs) memorize portions of their training data and can emit them verbatim, causing privacy violations, degraded utility, and unfairness. It presents three log-linear relationships that quantify memorization as a function of model capacity, how often an example is duplicated in the training data, and the length of the context used to prompt the model, and finds that memorization is more prevalent than previously believed and likely to worsen with further scaling.

This paper provides insights into how large language models memorize their training data and emit it verbatim, causing issues such as privacy violations and degraded utility. Businesses can use these insights to mitigate memorization in their AI-powered processes and workflows.
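The paper's notion of extractable memorization can be sketched in a few lines (hypothetical helper names, not the authors' code): a training string counts as memorized if prompting the model with its first k tokens makes greedy decoding reproduce the remaining tokens verbatim.

```python
# Sketch of an "extractable memorization" check (illustrative names only):
# a training example is memorized if, given its prefix as a prompt,
# greedy decoding reproduces the remaining tokens exactly.

def is_memorized(generate, example_tokens, prefix_len):
    """generate(prompt, n) -> list of n greedily decoded tokens."""
    prefix = example_tokens[:prefix_len]
    suffix = example_tokens[prefix_len:]
    continuation = generate(prefix, len(suffix))
    return continuation == suffix

# Toy stand-in for a model that has memorized one training sequence.
TRAINING_EXAMPLE = ["the", "quick", "brown", "fox", "jumps"]

def toy_generate(prompt, n):
    # Regurgitates the training example whenever the prompt matches its start.
    if TRAINING_EXAMPLE[:len(prompt)] == list(prompt):
        return TRAINING_EXAMPLE[len(prompt):len(prompt) + n]
    return ["<unk>"] * n

print(is_memorized(toy_generate, TRAINING_EXAMPLE, 2))  # True
```

In the paper's setting, the fraction of training examples passing this check grows log-linearly with model size, duplication count, and prompt length.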

Bit Diffusion: Generating Discrete Data using Diffusion Models with Self-Conditioning

Diffusion Models
Computer Vision
Data Generation
Image Generation
Image Captioning

This paper presents Bit Diffusion, a simple and generic approach for generating discrete data with continuous diffusion models. It represents discrete data as binary bits and trains a continuous diffusion model to treat these bits as real numbers ("analog bits"); a self-conditioning technique, in which the model conditions on its own previously generated estimate, further improves sample quality. The proposed approach achieves strong performance on both discrete image generation and image captioning tasks.
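The analog-bits idea in the summary can be illustrated with a small encoder/decoder (a sketch under simplified assumptions, not the authors' implementation): each discrete value is written as binary bits, shifted to real numbers in {-1.0, 1.0} for the continuous diffusion model, and thresholded back to bits at decode time.

```python
# Analog-bits sketch: discrete value -> real-valued bits -> discrete value.
# Function names and the noise model are illustrative, not from the paper.

def int_to_analog_bits(x, n_bits):
    # Write integer x as n_bits binary digits, then map {0, 1} -> {-1.0, 1.0}.
    bits = [(x >> i) & 1 for i in reversed(range(n_bits))]
    return [2.0 * b - 1.0 for b in bits]

def analog_bits_to_int(analog):
    # Threshold real-valued (possibly noisy) bits at 0, then read off binary.
    value = 0
    for a in analog:
        value = (value << 1) | (1 if a > 0 else 0)
    return value

pixel = 200  # an 8-bit pixel intensity
analog = int_to_analog_bits(pixel, 8)
# Mild perturbation, standing in for imperfect diffusion-model output:
noisy = [a + 0.3 * (-1) ** i for i, a in enumerate(analog)]
print(analog_bits_to_int(noisy))  # 200 -- thresholding recovers the pixel
```

The appeal of this representation is that the diffusion model itself stays entirely continuous; discreteness is handled only by the final thresholding step.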

This paper presents a new approach to generating discrete data that businesses can apply to image generation and image captioning pipelines. Because Bit Diffusion is simple and generic, it can be used wherever discrete data needs to be generated with diffusion models.
