Mon Feb 20 2023 - Top Trending AI Papers

Poisoning Web-Scale Training Datasets is Practical

Machine learning security

Web-scale datasets

Ensuring dataset integrity and maintaining trust in machine learning models

Securing web-scale training datasets

Shows how an attacker could poison 0.01% of datasets such as LAION-400M (about 40,000 image-text pairs) for just $60 and degrade model performance.

Recommends low-overhead defenses to mitigate dataset poisoning attacks.
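One of the paper's attacks, split-view poisoning, works because a URL-based index records only where content lives, not what it looked like when it was annotated. If a domain later expires, an attacker can buy it and serve different content. A low-overhead integrity check in this spirit is to ship a cryptographic digest with each index entry and have downloaders drop anything that no longer matches. The sketch below assumes a SHA-256 digest per entry. The `IndexEntry` schema and the `filter_downloads` helper are hypothetical and do not come from any dataset's actual tooling. The "web" is a plain dict standing in for real HTTP fetches.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class IndexEntry:
    """One row of a URL-based dataset index (hypothetical schema)."""
    url: str
    caption: str
    sha256: str  # digest recorded when the annotator first saw the content


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def filter_downloads(
    entries: Iterable[IndexEntry],
    fetch: Callable[[str], Optional[bytes]],
):
    """Split entries into (kept, rejected) by re-checking each digest.

    `fetch` maps a URL to the bytes currently served there, or None if the
    download fails; it stands in for a real downloader.
    """
    kept, rejected = [], []
    for entry in entries:
        content = fetch(entry.url)
        if content is not None and sha256_hex(content) == entry.sha256:
            kept.append((entry, content))
        else:
            rejected.append(entry)
    return kept, rejected


# Simulated web: the second URL's domain expired and was re-registered,
# so it now serves different bytes than the annotator originally saw.
cat_bytes = b"original cat image bytes"
index = [
    IndexEntry("http://example.com/cat.jpg", "a cat", sha256_hex(cat_bytes)),
    IndexEntry("http://expired.example/dog.jpg", "a dog",
               sha256_hex(b"original dog image bytes")),
]
web = {
    "http://example.com/cat.jpg": cat_bytes,
    "http://expired.example/dog.jpg": b"attacker-controlled bytes",
}
kept, rejected = filter_downloads(index, web.get)
print(len(kept), [e.url for e in rejected])  # 1 ['http://expired.example/dog.jpg']
```

The trade-off is that legitimate changes, such as a host re-encoding an image, are rejected too, so this kind of check gives up some data to gain integrity. It also does not address the paper's other attack, frontrunning poisoning, where the malicious content is already present when the snapshot is taken.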

https://arxiv.org/pdf/2302.10149.pdf

https://arxiv.org/abs/2302.10149

https://twitter.com/arankomatsuzaki/status/1627851796604411904/photo/1

Scaling Laws for Multilingual Neural Machine Translation

Language translation

Neural machine translation

Multilingual models

Multilingual neural machine translation

Language weighting and composition

Examines how increasing model size affects model performance and investigates the role of the training mixture composition in the scaling behavior.

Predicts the performance of multilingual models trained with any language weighting at any scale, reducing the effort required for language balancing in large multilingual models.
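The paper reports that changing a language pair's weight in the training mixture shifts only the multiplicative factor of its scaling law, while the exponent stays the same. That finding is what makes prediction at unseen weights and scales tractable. The sketch below is only a rough illustration of this idea. It assumes the common power-law form L(N) = beta * N^(-alpha) + L_inf, with alpha shared across weights and L_inf also shared (the latter is our own assumption). It also fills in beta at an unfitted weight by interpolating log(beta), which is a simple stand-in rather than the paper's joint scaling-law formulation. All numeric parameters are hypothetical.

```python
import math


def predicted_loss(n_params: float, beta: float, alpha: float, l_inf: float) -> float:
    """Assumed power-law form L(N) = beta * N**(-alpha) + l_inf."""
    return beta * n_params ** (-alpha) + l_inf


def beta_for_weight(weight: float, fitted: dict[float, float]) -> float:
    """Interpolate log(beta) linearly between weights that were actually fitted.

    `fitted` maps a language-pair weight in (0, 1) to the beta fitted for it.
    This interpolation is an illustrative stand-in, not the paper's method.
    """
    ws = sorted(fitted)
    if not ws[0] <= weight <= ws[-1]:
        raise ValueError("weight is outside the fitted range")
    for lo, hi in zip(ws, ws[1:]):
        if lo <= weight <= hi:
            t = (weight - lo) / (hi - lo)
            return math.exp((1 - t) * math.log(fitted[lo]) + t * math.log(fitted[hi]))
    return fitted[ws[0]]  # only one weight was fitted


# Hypothetical parameters, chosen only to exercise the functions.
alpha = 0.3  # exponent shared across weights (the quantity reported as unchanged)
l_inf = 1.5  # irreducible loss; sharing it across weights is an assumption
betas = {0.25: 40.0, 0.5: 30.0, 0.75: 25.0}  # hypothetical per-weight multipliers

beta = beta_for_weight(0.6, betas)
for n in (1e8, 1e9, 1e10):
    print(f"N={n:.0e}  predicted loss={predicted_loss(n, beta, alpha, l_inf):.3f}")
```

In this model only beta depends on the weighting. As a result, the reducible losses of two weightings keep a fixed ratio at every model size, so a few small-scale runs can in principle pin down curves for larger models.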

https://arxiv.org/pdf/2302.09650.pdf

https://arxiv.org/abs/2302.09650

https://twitter.com/arankomatsuzaki/status/1627859359152709632/photo/1