Tue Feb 21 2023
Mon Feb 20 2023

Poisoning Web-Scale Training Datasets is Practical

Machine learning security
Web-scale datasets
Ensuring dataset integrity and maintaining trust in machine learning models
Securing web-scale training datasets

Shows that an attacker can poison 0.01% of a dataset like LAION-400M for just $60 USD and thereby degrade model performance.

Recommends low-overhead defenses to mitigate dataset poisoning attacks.
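
A minimal sketch of one low-overhead integrity check, assuming the dataset index stores a SHA-256 digest for each sample at curation time. This summary does not say which defenses the paper recommends, so the approach, the `index` layout, the example URL, and the function names here are all illustrative assumptions:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw sample bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_sample(downloaded: bytes, expected_digest: str) -> bool:
    """Accept a downloaded sample only if it matches the recorded digest."""
    return sha256_hex(downloaded) == expected_digest


# Hypothetical index entry, recorded when the dataset was first curated.
original = b"image bytes at curation time"
index = {"https://example.com/cat.jpg": sha256_hex(original)}

# Content later served from the same URL, swapped by an attacker.
tampered = b"image bytes swapped in later"

url = "https://example.com/cat.jpg"
print(verify_sample(original, index[url]))  # unchanged content passes
print(verify_sample(tampered, index[url]))  # swapped content is rejected
```

A check like this only detects content that changed after the digest was recorded. It does nothing against samples that were already malicious when the index was built.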

Scaling Laws for Multilingual Neural Machine Translation

Language translation
Neural machine translation
Multilingual models
Multilingual neural machine translation
Language weighting and composition

Examines how increasing model size affects performance and investigates how the composition of the training mixture shapes scaling behavior.

Predicts the performance of multilingual models trained with any language weighting at any scale, reducing the effort required for language balancing in large multilingual models.

Sun Feb 19 2023
Thu Feb 16 2023
Wed Feb 15 2023
Tue Feb 14 2023