Mon Aug 15 2022

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

Int8 matrix multiplication for feed-forward and attention projection layers in transformers
Machine Learning
Data processing and optimization
Natural Language Processing
Language translation
Chatbots and virtual assistants

Develops a procedure for Int8 matrix multiplication in the feed-forward and attention projection layers of transformers, which halves the memory needed for inference while retaining full-precision (16-bit) performance.
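A minimal sketch of the core idea, vector-wise absmax quantization to Int8 with Int32 accumulation, written in NumPy. This is an illustrative simplification under stated assumptions (function and variable names are invented here); the paper's mixed-precision decomposition, which handles outlier feature dimensions in 16-bit, is omitted:

```python
import numpy as np

def int8_matmul_vectorwise(X, W):
    """Approximate X @ W using vector-wise Int8 quantization.

    X: (m, k) float32 activations, W: (k, n) float32 weights.
    Each row of X and each column of W gets its own absmax scale,
    so the dequantization is a rank-1 outer product of scales.
    """
    sx = np.abs(X).max(axis=1, keepdims=True) / 127.0  # (m, 1) row scales
    sw = np.abs(W).max(axis=0, keepdims=True) / 127.0  # (1, n) column scales
    Xq = np.round(X / sx).astype(np.int8)
    Wq = np.round(W / sw).astype(np.int8)
    # Int8 inputs, Int32 accumulation (as on tensor cores), then dequantize.
    acc = Xq.astype(np.int32) @ Wq.astype(np.int32)
    return acc.astype(np.float32) * sx * sw

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 64)).astype(np.float32)
W = rng.standard_normal((64, 8)).astype(np.float32)
out = int8_matmul_vectorwise(X, W)
ref = X @ W  # float32 reference for comparison
```

For well-behaved (e.g. Gaussian) inputs the Int8 result tracks the float32 product closely; the paper's contribution is making this hold at large model scale, where emergent outlier features would otherwise destroy quantization precision.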

Enables businesses to run inference with large language models on far less GPU memory, making such models more accessible without sacrificing predictive performance.
