Thu Jan 19 2023

Self Supervision Does Not Help Natural Language Supervision at Scale

Artificial Intelligence
Computer Vision
Natural Language Processing
general-purpose image encoders
downstream task improvement
large-scale image-text training

Finds that a combination of CLIP + MAE provides a benefit over CLIP when trained on 11.3M image-text pairs, but little to no benefit over CLIP when trained on 1.4B images.

This paper provides insight into the effectiveness of self-supervision for large-scale image-text training.
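At the loss level, the studied combination is simple: a contrastive image-text objective plus a masked-reconstruction objective on the same batch. Below is a minimal sketch of such a combined objective; the function names and the weighting term lambda_mae are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn.functional as F

    def clip_loss(img_emb, txt_emb, temperature=0.07):
        # Symmetric InfoNCE over a batch of paired image/text embeddings.
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.t() / temperature
        targets = torch.arange(logits.size(0), device=logits.device)
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2

    def mae_loss(pred_patches, target_patches, mask):
        # Pixel-reconstruction error, averaged over masked patches only.
        per_patch = ((pred_patches - target_patches) ** 2).mean(dim=-1)
        return (per_patch * mask).sum() / mask.sum()

    def combined_loss(img_emb, txt_emb, pred, target, mask, lambda_mae=1.0):
        # Joint objective: contrastive image-text loss plus weighted MAE loss.
        return clip_loss(img_emb, txt_emb) + lambda_mae * mae_loss(pred, target, mask)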

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Artificial Intelligence
Computer Vision
image representation learning
self-supervised learning
ViT-Huge/14 training on ImageNet

Demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data augmentations.

This paper proposes a non-generative approach for self-supervised learning from images that produces highly semantic representations and performs well across a wide range of tasks.
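The core idea is to predict, in representation space rather than pixel space, the embeddings of masked target blocks from a visible context block, with targets produced by an encoder whose weights track the context encoder by exponential moving average. A minimal sketch of that objective follows; the encoder and predictor interfaces, the index arguments, and the momentum value are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def ema_update(target_enc, context_enc, momentum=0.996):
        # The target encoder is an exponential moving average of the
        # context encoder; no gradients ever flow into it.
        for pt, pc in zip(target_enc.parameters(), context_enc.parameters()):
            pt.mul_(momentum).add_(pc, alpha=1.0 - momentum)

    def jepa_loss(context_enc, target_enc, predictor, patches, ctx_idx, tgt_idx):
        # patches: (B, N, D) patch embeddings of one image;
        # ctx_idx / tgt_idx: indices of the context and target blocks.
        with torch.no_grad():
            targets = target_enc(patches)[:, tgt_idx]   # representations to predict
        context = context_enc(patches[:, ctx_idx])      # encode visible context only
        preds = predictor(context, tgt_idx)             # predict at target positions
        return F.mse_loss(preds, targets)               # loss in representation space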

Multiview Compressive Coding for 3D Reconstruction

Artificial Intelligence
Computer Vision
single-view 3D reconstruction
large-scale training from diverse RGB-D videos
generative modeling of 3D structure

MCC learns to compress the input's appearance and geometry into an encoding, predicts the 3D structure by querying a 3D-aware decoder, and substantially outperforms the state of the art.

This paper proposes a framework for single-view 3D reconstruction that improves upon prior works by learning generalizable representations, resulting in strong generalization to novel objects.
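The decoder side is the distinctive piece: the encoding of the single RGB-D view is queried with arbitrary 3D point coordinates, and each query yields an occupancy prediction. Below is a minimal sketch of such a point-query decoder; the module layout, dimensions, and output head are illustrative assumptions rather than the paper's architecture.

    import torch
    import torch.nn as nn

    class PointQueryDecoder(nn.Module):
        def __init__(self, feat_dim=256):
            super().__init__()
            self.point_embed = nn.Linear(3, feat_dim)   # lift xyz queries to features
            self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
            self.occ_head = nn.Linear(feat_dim, 1)      # binary occupancy logit per query

        def forward(self, scene_tokens, query_points):
            # scene_tokens: (B, N, feat_dim) encoding of the single RGB-D input;
            # query_points: (B, Q, 3) arbitrary 3D coordinates to evaluate.
            q = self.point_embed(query_points)
            q, _ = self.attn(q, scene_tokens, scene_tokens)  # attend queries to the scene
            return self.occ_head(q).squeeze(-1)              # (B, Q) occupancy logits

Because the decoder is queried pointwise, reconstruction resolution is decoupled from the encoder: denser query grids give finer surfaces at inference time without retraining.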
