
RGB no more: Minimally-decoded JPEG Vision Transformers

Neural Networks
Computer Vision
Deep Learning
Accelerate data load for computer vision projects
Implement ViT models for computer vision tasks
Improve training efficiency and inference speed without sacrificing accuracy

Achieves up to 39.2% faster training and 17.9% faster inference with no accuracy loss compared to the RGB counterpart.

This research presents a novel approach to training Vision Transformers (ViT) directly from the encoded features of JPEG, avoiding most of the decoding overhead and accelerating data loading. It also explores data augmentation performed directly on these encoded features, which, to the authors' knowledge, has not been explored in depth for training in this setting. Implementing these improvements can yield faster training and inference with no loss in accuracy compared to traditional RGB networks.
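To make the idea concrete, the sketch below (not the authors' code) shows one way a ViT patch embedding could consume 8x8 JPEG DCT blocks instead of RGB pixels. The data-loading helper that would supply the Y/Cb/Cr coefficient blocks is assumed to exist (e.g. a partial JPEG decoder that stops after entropy decoding and skips the inverse DCT and color conversion); only the embedding layer is illustrated.

```python
# Minimal sketch, assuming DCT coefficient blocks are provided by a partial
# JPEG decoder (hypothetical loader, not shown). Each JPEG block contributes
# 8x8 = 64 DCT coefficients per component.
import torch
import torch.nn as nn


class DCTPatchEmbed(nn.Module):
    """Embed luma (Y) and chroma (Cb/Cr) DCT blocks into ViT tokens."""

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        # Separate projections for each component, since luma and chroma
        # usually live on different grids due to 4:2:0 subsampling.
        self.proj_y = nn.Linear(64, embed_dim)
        self.proj_cb = nn.Linear(64, embed_dim)
        self.proj_cr = nn.Linear(64, embed_dim)

    def forward(self, y, cb, cr):
        # y:  (B, Hy, Wy, 64)  luma DCT blocks
        # cb: (B, Hc, Wc, 64)  chroma DCT blocks (typically subsampled 2x)
        # cr: (B, Hc, Wc, 64)
        tok_y = self.proj_y(y).flatten(1, 2)     # (B, Hy*Wy, D)
        tok_cb = self.proj_cb(cb).flatten(1, 2)  # (B, Hc*Wc, D)
        tok_cr = self.proj_cr(cr).flatten(1, 2)  # (B, Hc*Wc, D)
        # Concatenate all tokens; a full model would also add positional
        # embeddings that account for the different luma/chroma grid sizes.
        return torch.cat([tok_y, tok_cb, tok_cr], dim=1)
```

Because the projection operates on 64-dimensional coefficient vectors rather than pixel patches, the expensive inverse DCT and YCbCr-to-RGB steps of JPEG decoding can be skipped entirely in the data pipeline, which is where the reported data-loading speedup comes from.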
