Thu Dec 08 2022
Tue Dec 06 2022
Unifying Vision, Text, and Layout for Universal Document Processing
Natural Language Processing
Document AI
Document understanding and QA
Neural document editing
Content customization
Sets the state of the art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains. Ranks first on the Document Understanding Benchmark (DUE).
Can improve document understanding and QA across diverse data domains such as finance reports, academic papers, and websites. Also allows for high-quality neural document editing and content customization.
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Deep Learning
Computer Vision
Video understanding
Video action recognition/detection
Video-language alignment
Presents InternVideo, a set of general video foundation models that take advantage of both generative and discriminative self-supervised video learning.
Can improve video understanding for various tasks including video action recognition/detection, video-language alignment, and open-world video applications.
Mon Dec 05 2022
Sun Dec 04 2022
Tue Nov 29 2022
Wed Nov 23 2022