Tue May 02 2023
Mon May 01 2023

What Do Self-Supervised Vision Transformers Learn?

Self-supervised learning
Computer vision
Image classification

Comparative study on how and why contrastive learning and masked image modeling differ in their representations and in their performance of downstream tasks. Demonstrates that self-supervised Vision Transformers can complement each other and harmonization can help leverage the advantages of both methods.

Recommend exploring the use of self-supervised Vision Transformers in image classification tasks. Consider experimenting with both contrastive learning and masked image modeling to achieve the best results.

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

Neural radiance field (NeRF)
Computer graphics
Video production

Proposes GeneFace++ to achieve stable and real-time talking face generation with generalized audio-lip synchronization. Utilizes pitch contour as an auxiliary feature and introduces a temporal loss in the facial motion prediction process. Outperforms state-of-the-art baselines in terms of subjective and objective evaluation.

Consider using GeneFace++ for generating talking person portraits with arbitrary speech audio. GeneFace++ can help improve video quality, achieve high system efficiency, and handle lip synchronization better than other methods.

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Large language models
Natural language processing
Causal analysis

Investigates the causal capabilities of large language models (LLMs) on various tasks. Shows that LLMs bring capabilities so far understood to be restricted to humans and can perform tasks while relying on sources of knowledge and methods distinct from and complementary to non-LLM based approaches. Envisions LLMs to be used alongside existing causal methods.

Consider using LLMs alongside existing causal methods to reduce human effort in setting up a causal analysis. LLMs can be used as proxies for human domain knowledge and to formalize, validate, and communicate reasoning in high-stakes scenarios.

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

Augmented Reality
Virtual Reality
Generative AI
Scene generation
scene generation and editing tasks

Developed a mechanism, ArK, which leverages knowledge-memory to generate scenes in unseen physical world and virtual reality environments. Validated the effectiveness of ArK on scene generation and editing tasks, demonstrating potential benefits in generative AI for applications such as metaverse and gaming simulation.

Implement ArK in generative AI for applications such as metaverse and gaming simulation.

Poisoning Language Models During Instruction Tuning

Instruction-tuned language models
Natural language processing
Language model security
Natural language processing
Instruction-tuned language models

Showed that adversaries can contribute poison examples to datasets used to finetune instruction-tuned LMs. By using as few as 100 poison examples, they can cause arbitrary phrases to have consistent negative polarity or induce degenerate outputs across hundreds of tasks. Larger LMs are increasingly vulnerable to poisoning, and data filtering or reducing model capacity provide only moderate protections while reducing test accuracy.

Implement stronger defenses against poisoning attacks in instruction-tuned LMs to prevent manipulation of model predictions.

Sun Apr 30 2023
Thu Apr 27 2023
Wed Apr 26 2023
Tue Apr 25 2023