Tue May 02 2023 - Top Trending AI Papers

What Do Self-Supervised Vision Transformers Learn?

Self-supervised learning

Computer vision

Machine learning

Image classification

Object detection

Semantic segmentation

A comparative study of how and why contrastive learning and masked image modeling differ in their representations and in their performance on downstream tasks.

Insights into what self-supervised Vision Transformers (ViTs) learn and how contrastive learning and masked image modeling can complement each other.

https://arxiv.org/pdf/2305.00729.pdf

https://arxiv.org/abs/2305.00729

https://twitter.com/_akhaliq/status/1653419675919761410/photo/1

Learning to Reason and Memorize with Self-Notes

Language models

Artificial intelligence

Machine learning

Natural language processing

Question answering

Text generation

A simple method for addressing the limited context memory and multi-step reasoning problems of large language models by allowing the model to take Self-Notes.

Recommendations for improving the memory and reasoning capabilities of large language models.

https://arxiv.org/pdf/2305.00833.pdf

https://arxiv.org/abs/2305.00833

https://twitter.com/_akhaliq/status/1653420949696397318/photo/1

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Causality

Artificial intelligence

Machine learning

Medicine

Science

Law

Policy

Investigation of the causal capabilities of large language models and their implications for societally impactful domains such as medicine, science, law, and policy.

Insights into the causal reasoning capabilities of large language models and their potential impact on various domains.

https://arxiv.org/pdf/2305.00050.pdf

https://arxiv.org/abs/2305.00050

https://twitter.com/_akhaliq/status/1653420630816051201/photo/1

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

Scene generation

Virtual reality

Generative AI

Scene editing

Metaverse

Gaming simulation

This study presents an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains or scenarios for scene understanding and generation in the physical or virtual world. The approach leverages this knowledge memory to generate scenes in unseen physical-world and virtual-reality environments and is validated on scene generation and editing tasks. The authors also demonstrate the potential benefit of incorporating ArK into generative AI for applications such as metaverse and gaming simulation.

According to the authors, ArK significantly improves the quality of generated 2D/3D scenes, which could make it a valuable addition to applications such as metaverse and gaming simulation.

https://arxiv.org/pdf/2305.00970.pdf

https://arxiv.org/abs/2305.00970

https://augmented-reality-knowledge.github.io

https://twitter.com/_akhaliq/status/1653420205853274114/photo/1

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

NeRF-based method

Digital human

Metaverse

Real-time talking face generation

The paper presents GeneFace++, a NeRF-based method for stable, real-time talking face generation with generalized audio-lip synchronization. It improves lip synchronization, video quality, and system efficiency through three components: the pitch contour as an auxiliary feature, a landmark locally linear embedding method that regulates outliers in the predicted motion sequence, and an efficient NeRF-based motion-to-video renderer. The authors report that the method outperforms state-of-the-art baselines in both subjective and objective evaluations.

GeneFace++ can be used for real-time talking face generation with generalized audio-lip synchronization, improving on previous methods in lip synchronization, video quality, and system efficiency.

https://arxiv.org/pdf/2305.00787.pdf

https://arxiv.org/abs/2305.00787

https://genefaceplusplus.github.io

https://twitter.com/_akhaliq/status/1653419461234376705/photo/1