What Do Self-Supervised Vision Transformers Learn?
Comparative study of how and why contrastive learning and masked image modeling differ in their representations and in their performance on downstream tasks. Demonstrates that the two self-supervised approaches can complement each other, and that harmonizing them can leverage the advantages of both methods.
Recommend exploring the use of self-supervised Vision Transformers in image classification tasks. Consider experimenting with both contrastive learning and masked image modeling to achieve the best results.
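As a concrete illustration of harmonizing the two objectives, here is a minimal PyTorch sketch that combines an InfoNCE contrastive term on pooled embeddings with a masked-patch reconstruction term. All names here (ToyEncoder, harmonized_loss, alpha) are illustrative stand-ins, not the paper's actual architecture or API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a ViT: maps (B, N, d_in) patch embeddings to tokens."""
    def __init__(self, d_in=32, d_model=64):
        super().__init__()
        self.proj = nn.Linear(d_in, d_model)

    def forward(self, patches):
        tokens = self.proj(patches)      # (B, N, d_model) per-patch tokens
        global_emb = tokens.mean(dim=1)  # pooled "CLS-like" embedding
        return global_emb, tokens

def info_nce(z1, z2, temperature=0.1):
    """Contrastive term: matching views are positives, other pairs negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

def harmonized_loss(encoder, decoder, view1, view2, mask, alpha=0.5):
    """Weighted sum of a contrastive (CL) term and a masked-reconstruction (MIM) term."""
    g1, _ = encoder(view1)
    g2, tokens2 = encoder(view2 * (~mask).unsqueeze(-1).float())  # hide masked patches
    cl = info_nce(g1, g2)                                   # CL term on pooled embeddings
    mim = F.mse_loss(decoder(tokens2)[mask], view2[mask])   # MIM term on masked patches only
    return alpha * cl + (1 - alpha) * mim

# Toy usage with random "patch embeddings" standing in for real image patches.
B, N, D = 8, 16, 32
encoder, decoder = ToyEncoder(D, 64), nn.Linear(64, D)
view1, view2 = torch.randn(B, N, D), torch.randn(B, N, D)
mask = torch.rand(B, N) < 0.6   # mask ~60% of patches
print(harmonized_loss(encoder, decoder, view1, view2, mask).item())
```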
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation
Proposes GeneFace++ to achieve stable, real-time talking face generation with generalized audio-lip synchronization. Uses the pitch contour as an auxiliary feature and introduces a temporal loss in the facial motion prediction process. Outperforms state-of-the-art baselines in both subjective and objective evaluations.
Consider using GeneFace++ for generating talking person portraits with arbitrary speech audio. GeneFace++ can help improve video quality, achieve high system efficiency, and handle lip synchronization better than other methods.
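To illustrate the idea of a temporal loss on predicted facial motion, here is a minimal sketch that penalizes frame-to-frame velocity mismatch between predicted and target motion sequences. This is one plausible formulation for reducing temporal jitter, not GeneFace++'s exact loss; the tensor shapes and the lambda_t weight are assumptions.

```python
import torch
import torch.nn.functional as F

def temporal_loss(pred_motion, target_motion):
    """Penalize frame-to-frame velocity mismatch to discourage jitter.

    pred_motion, target_motion: (B, T, D) landmark/expression sequences.
    """
    pred_vel = pred_motion[:, 1:] - pred_motion[:, :-1]        # finite differences over time
    target_vel = target_motion[:, 1:] - target_motion[:, :-1]
    return F.l1_loss(pred_vel, target_vel)

def total_loss(pred, target, lambda_t=0.5):
    # Per-frame reconstruction term plus the temporal term,
    # weighted by a hypothetical lambda_t.
    return F.l1_loss(pred, target) + lambda_t * temporal_loss(pred, target)

# Toy usage: 4 clips, 25 frames, 68 landmarks x 3 coords flattened to 204 dims.
pred, target = torch.randn(4, 25, 204), torch.randn(4, 25, 204)
print(total_loss(pred, target).item())
```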
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Investigates the causal capabilities of large language models (LLMs) across a variety of tasks. Shows that LLMs exhibit capabilities previously thought to be restricted to humans, and can perform these tasks while relying on sources of knowledge and methods distinct from, and complementary to, non-LLM-based approaches. Envisions LLMs being used alongside existing causal methods.
Consider using LLMs alongside existing causal methods to reduce the human effort of setting up a causal analysis. LLMs can serve as proxies for human domain knowledge and can help formalize, validate, and communicate reasoning in high-stakes scenarios.
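As a sketch of using an LLM as a proxy for domain knowledge, the snippet below asks a model to pick the more plausible causal direction for each variable pair, a common setup in this line of work. The ask_llm function is a hypothetical stub standing in for whatever chat-completion client you use; replace it with a real call.

```python
from itertools import combinations

def ask_llm(prompt: str) -> str:
    # Stub so the sketch runs end to end; swap in a real LLM client here.
    return "C"

def orient_edges(variables: list[str]) -> dict[tuple[str, str], str]:
    """Ask the LLM for the more plausible causal direction of each variable pair."""
    decisions = {}
    for a, b in combinations(variables, 2):
        prompt = (
            f"Which is more plausible as a causal claim?\n"
            f"(A) {a} causes {b}\n(B) {b} causes {a}\n(C) neither\n"
            f"Answer with a single letter."
        )
        answer = ask_llm(prompt).strip().upper()[:1]
        decisions[(a, b)] = {"A": f"{a} -> {b}",
                             "B": f"{b} -> {a}"}.get(answer, "no edge")
    return decisions

# Example usage with variables in the style of classic causal-pair benchmarks:
print(orient_edges(["altitude", "temperature", "air pressure"]))
```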
ArK: Augmented Reality with Knowledge Interactive Emergent Ability
Develops ArK, a mechanism that leverages knowledge-memory to generate scenes in unseen physical-world and virtual-reality environments. Validates the effectiveness of ArK on scene generation and editing tasks, demonstrating potential benefits of generative AI for applications such as the metaverse and gaming simulation.
Consider implementing ArK in generative-AI applications such as the metaverse and gaming simulation.
Poisoning Language Models During Instruction Tuning
Shows that adversaries can contribute poison examples to the datasets used to finetune instruction-tuned LMs. With as few as 100 poison examples, an adversary can cause arbitrary phrases to receive consistently negative polarity or can induce degenerate outputs across hundreds of tasks. Larger LMs are increasingly vulnerable to poisoning, and defenses such as data filtering or reduced model capacity provide only moderate protection while also reducing test accuracy.
Implement stronger defenses against poisoning attacks in instruction-tuned LMs to prevent manipulation of model predictions.
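One moderate defense the paper discusses is data filtering. A simple instantiation is sketched below under the assumption that poison examples tend to incur unusually high loss under a trusted model, so dropping the highest-loss fraction of the training set removes many of them. The model and data are toy stand-ins; as the paper notes, aggressive filtering also discards hard but clean examples and can reduce test accuracy.

```python
import torch
import torch.nn as nn

def filter_by_loss(model, examples, labels, drop_frac=0.05):
    """Drop the drop_frac of examples with the highest per-example loss."""
    model.eval()
    with torch.no_grad():
        losses = nn.functional.cross_entropy(
            model(examples), labels, reduction="none")  # per-example losses
    # Keep the lowest-loss (1 - drop_frac) fraction; high-loss outliers are
    # the most likely poison candidates under this heuristic.
    keep = losses.argsort()[: int(len(losses) * (1 - drop_frac))]
    return examples[keep], labels[keep]

# Toy usage: a linear "model" over random features in place of a real LM.
model = nn.Linear(16, 4)
X, y = torch.randn(100, 16), torch.randint(0, 4, (100,))
X_clean, y_clean = filter_by_loss(model, X, y, drop_frac=0.1)
print(len(X_clean))  # 90 examples kept
```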