Mon Aug 15 2022

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

Int8 matrix multiplication for feed-forward and attention projection layers in transformers
Machine Learning
Data processing and optimization
Natural Language Processing
Language translation
Chatbots and virtual assistants

Develops a procedure for Int8 matrix multiplication in the feed-forward and attention projection layers of transformers, which halves the memory needed for inference while retaining full-precision (16-bit) performance.
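A minimal sketch of the core idea, vector-wise absmax quantization to Int8 with Int32 accumulation, written in NumPy. This is an illustrative simplification under stated assumptions (function and variable names are invented here); the paper's mixed-precision decomposition, which handles outlier feature dimensions in 16-bit, is omitted:

```python
import numpy as np

def int8_matmul_vectorwise(X, W):
    """Approximate X @ W using vector-wise Int8 quantization.

    X: (m, k) float32 activations, W: (k, n) float32 weights.
    Each row of X and each column of W gets its own absmax scale,
    so the dequantization is a rank-1 outer product of scales.
    """
    sx = np.abs(X).max(axis=1, keepdims=True) / 127.0  # (m, 1) row scales
    sw = np.abs(W).max(axis=0, keepdims=True) / 127.0  # (1, n) column scales
    Xq = np.round(X / sx).astype(np.int8)
    Wq = np.round(W / sw).astype(np.int8)
    # Int8 inputs, Int32 accumulation (as on tensor cores), then dequantize.
    acc = Xq.astype(np.int32) @ Wq.astype(np.int32)
    return acc.astype(np.float32) * sx * sw

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 64)).astype(np.float32)
W = rng.standard_normal((64, 8)).astype(np.float32)
out = int8_matmul_vectorwise(X, W)
ref = X @ W  # float32 reference for comparison
```

For well-behaved (e.g. Gaussian) inputs the Int8 result tracks the float32 product closely; the paper's contribution is making this hold at large model scale, where emergent outlier features would otherwise destroy quantization precision.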

Enables businesses to run inference with large language models on far less GPU memory, making such models more accessible without sacrificing predictive performance.
