Tue May 02 2023

What Do Self-Supervised Vision Transformers Learn?

Self-supervised learning
Computer vision
Machine learning
Image classification
Object detection
Semantic segmentation

A comparative study of how and why contrastive learning and masked image modeling differ in their representations and in their performance on downstream tasks.

Insights on how self-supervised Vision Transformers (ViTs) learn and how contrastive learning and masked image modeling can complement each other.

Learning to Reason and Memorize with Self-Notes

Language models
Artificial intelligence
Machine learning
Natural language processing
Question answering
Text generation

A simple method for addressing limited context memory and multi-step reasoning problems in large language models by allowing the model to take Self-Notes.

Recommendations for improving the memory and reasoning capabilities of large language models.

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Causality
Artificial intelligence
Machine learning
Medicine
Science
Law
Policy

Investigation of the causal capabilities of large language models and their implications for societally impactful domains such as medicine, science, law, and policy.

Insights into the causal reasoning capabilities of large language models and their potential impact on various domains.

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

Scene generation
Virtual reality
Generative AI
Scene generation and editing tasks
Metaverse
Gaming simulation

This study presents an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains or scenarios for scene understanding and generation in the physical or virtual world. The approach leverages this knowledge memory to generate scenes in unseen physical-world and virtual-reality environments, and it is validated on scene generation and editing tasks. The study also demonstrates the potential benefit of incorporating ArK into generative AI for applications such as the metaverse and gaming simulation.

ArK can significantly improve the quality of generated 2D/3D scenes, making it a valuable addition to applications such as the metaverse and gaming simulation.

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

NeRF-based method
Digital human
Metaverse
Real-time talking face generation

The paper presents GeneFace++, a NeRF-based method that achieves stable, real-time talking face generation with generalized audio-lip synchronization. GeneFace++ improves lip synchronization, video quality, and system efficiency through three components: the pitch contour used as an auxiliary feature, a landmark locally linear embedding method that regulates outliers in the predicted motion sequence, and an efficient NeRF-based motion-to-video renderer. The method outperforms state-of-the-art baselines in both subjective and objective evaluations.

GeneFace++ can be used for real-time talking face generation with generalized audio-lip synchronization, improving on previous methods in terms of lip synchronization, video quality, and system efficiency.

Mon May 01 2023
Sun Apr 30 2023
Thu Apr 27 2023
Wed Apr 26 2023