Tue Nov 29 2022
RGB no more: Minimally-decoded JPEG Vision Transformers
Neural Networks
Computer Vision
Deep Learning
Accelerate data loading for computer vision projects
Implement ViT models for computer vision tasks
Improve training efficiency and inference speed without sacrificing accuracy
Achieves up to 39.2% faster training and 17.9% faster inference with no accuracy loss compared to the RGB counterpart.
This research presents a novel approach to training Vision Transformers (ViT) directly from the encoded features of JPEG, avoiding most of the decoding overhead and accelerating data loading. It also explores data augmentation applied directly to these encoded features, which, to the authors' knowledge, has not been explored in depth in this training setting. Together, these changes yield faster training and inference with no loss in accuracy compared to traditional RGB networks.
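To make the idea concrete, below is a minimal, self-contained sketch (not the authors' implementation) of the core trick: treating each 8x8 block of DCT coefficients, the representation JPEG already stores, as a ViT patch token, so the inverse DCT and RGB conversion of a full decode can be skipped. The class name DCTBlockEmbed and all hyperparameters are illustrative assumptions; in the paper's setting the coefficients would come straight from a partially decoded JPEG bitstream and include chroma channels, whereas this sketch derives luma coefficients from pixels just to stay runnable.

```python
# Illustrative sketch only: embed 8x8 DCT-coefficient blocks as ViT tokens.
import math
import torch
import torch.nn as nn


def dct_matrix(n: int = 8) -> torch.Tensor:
    """Orthonormal type-II DCT matrix, as used for JPEG's 8x8 blocks."""
    k = torch.arange(n).float()
    basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
    basis[0] /= math.sqrt(2)
    return basis * math.sqrt(2 / n)


class DCTBlockEmbed(nn.Module):
    """Linearly project per-block DCT coefficients to ViT token embeddings.

    In the JPEG setting these coefficients would be read from the bitstream;
    here they are computed from a luma plane to keep the example self-contained.
    """

    def __init__(self, block: int = 8, embed_dim: int = 768):
        super().__init__()
        self.block = block
        self.register_buffer("D", dct_matrix(block))
        self.proj = nn.Linear(block * block, embed_dim)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (B, H, W) luma plane, H and W divisible by the block size.
        blocks = y.unfold(1, self.block, self.block).unfold(2, self.block, self.block)
        # blocks: (B, H/8, W/8, 8, 8) -> 2-D DCT of each block: D @ block @ D^T
        coeffs = self.D @ blocks @ self.D.T
        tokens = coeffs.flatten(1, 2).flatten(2)   # (B, num_blocks, 64)
        return self.proj(tokens)                   # (B, num_blocks, embed_dim)


if __name__ == "__main__":
    embed = DCTBlockEmbed()
    tokens = embed(torch.randn(2, 224, 224))
    print(tokens.shape)  # torch.Size([2, 784, 768])
```

The resulting token sequence can feed a standard transformer encoder; the reported speedups come from skipping most of the JPEG decode and from augmenting in the coefficient domain, not from changing the ViT backbone itself.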