Mon Dec 12 2022 - Top Trending AI Papers

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

Computer vision

Machine learning

Image classification

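Demonstrates that, with properly tuned hyper-parameters, fine-tuned CLIP is better than or at least competitive with supervised pre-training approaches and with methods that use CLIP as the prediction target in masked image modeling (CLIP + MIM).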

Provides insights into the hyper-parameters of CLIP fine-tuning and challenges the conventional conclusion that CLIP isn't suitable for fine-tuning.

https://arxiv.org/pdf/2212.06138.pdf

https://arxiv.org/abs/2212.06138

https://twitter.com/arankomatsuzaki/status/1602479341518061570/photo/1

MAGVIT: Masked Generative Video Transformer

Computer vision

Deep learning

Video synthesis

Runs inference two orders of magnitude faster than diffusion models and 60x faster than autoregressive models.

Handles a variety of video synthesis tasks with a single model, highlighting its quality, efficiency, and flexibility.

https://magvit.cs.cmu.edu/

https://arxiv.org/pdf/2212.05199.pdf

https://arxiv.org/abs/2212.05199

https://twitter.com/arankomatsuzaki/status/1602487538199011329/photo/1

The Stable Artist: Steering Semantics in Diffusion Latent Space

Computer vision

Deep learning

Image editing

Presents the Stable Artist, whose main component, semantic guidance (SEGA), lets the artist steer the diffusion process along a variable number of semantic directions.

Targets image editing and composition, providing fine-grained control over the image generation process.

https://arxiv.org/pdf/2212.06013.pdf

https://arxiv.org/abs/2212.06013

https://twitter.com/arankomatsuzaki/status/1602484654002655232/photo/1