Mon Dec 12 2022

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

Computer vision
Machine learning
Image classification

Demonstrates that fine-tuned CLIP matches or outperforms both supervised pre-training approaches and CLIP combined with masked image modeling (CLIP + MIM).

Provides insights into hyper-parameter choices for CLIP fine-tuning and challenges the common conclusion that CLIP is unsuitable for fine-tuning.

MAGVIT: Masked Generative Video Transformer

Computer vision
Deep learning
Video synthesis

Runs inference two orders of magnitude faster than diffusion models and 60x faster than autoregressive models.

Recommends MAGVIT for handling various video synthesis tasks with a single model, highlighting its quality, efficiency, and flexibility.

The Stable Artist: Steering Semantics in Diffusion Latent Space

Computer vision
Deep learning
Image editing

Presents the Stable Artist, which gives the artist control over image generation by steering the diffusion process along a variable number of semantic directions.

Suggests the Stable Artist for image editing and composition and highlights its ability to provide fine-grained control of the image generation process.
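The steering along semantic directions summarized above can be sketched as a single denoising-step combination rule. This is a minimal sketch under stated assumptions, not the Stable Artist's actual update: it assumes the steering is layered on top of classifier-free guidance, with each semantic direction contributing an additive, weighted difference between a concept-conditioned and an unconditional noise estimate. The paper's own formulation may differ. The function name `steered_noise_estimate`, its parameters, and the flattening of latents to lists of floats are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical simplification: a noise estimate is a latent tensor flattened to floats.
Vector = Sequence[float]


def steered_noise_estimate(
    eps_uncond: Vector,
    eps_prompt: Vector,
    directions: Sequence[Tuple[Vector, float]],
    guidance_scale: float = 7.5,
) -> List[float]:
    """Combine noise estimates for one denoising step (illustrative sketch only).

    eps_uncond: noise predicted with an empty prompt.
    eps_prompt: noise predicted with the main text prompt.
    directions: (eps_concept, scale) pairs, one per semantic direction.
        A positive scale pushes toward the concept, a negative scale away from it.
    """
    n = len(eps_uncond)
    if len(eps_prompt) != n or any(len(eps) != n for eps, _ in directions):
        raise ValueError("all noise estimates must have the same length")

    result = []
    for i in range(n):
        # Standard classifier-free guidance toward the main prompt.
        value = eps_uncond[i] + guidance_scale * (eps_prompt[i] - eps_uncond[i])
        # Assumed rule: add one weighted term per direction; any number of directions works.
        for eps_concept, scale in directions:
            value += scale * (eps_concept[i] - eps_uncond[i])
        result.append(value)
    return result


# Usage: one direction with scale 2.0 on a toy 2-element "latent".
print(steered_noise_estimate([0.0, 0.0], [1.0, 0.0], [([0.0, 1.0], 2.0)], guidance_scale=1.0))
# -> [1.0, 2.0]
```

Because each direction is an independent additive term with its own signed scale, directions can be added, removed, or reweighted without changing the base prompt, which is one plausible way to obtain the fine-grained, variable-count control the summary describes.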
