ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
ChatGPT outperforms crowd-workers on several annotation tasks, including relevance, stance, topic, and frame detection, at roughly one-twentieth of the cost.
Businesses can leverage ChatGPT to efficiently and cost-effectively annotate data for NLP applications, such as training classifiers and evaluating unsupervised models.
PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
PAniC-3D is a system that reconstructs stylized 3D character heads directly from illustrated portraits of anime characters.
Businesses working in the media and entertainment industry, especially in anime production, can benefit from PAniC-3D to streamline their 3D character head reconstruction process, saving time and resources.
GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents
GestureDiffuCLIP is a neural network framework for synthesizing realistic, stylized co-speech gestures with flexible style control.
Businesses in the media and entertainment industry, especially those producing video content, can use GestureDiffuCLIP to add realistic and stylized gestures to their content, enhancing the viewer's experience.
Anti-DreamBooth: Protecting users from personalized text-to-image synthesis
Text-to-image diffusion models are nothing short of a revolution, allowing anyone, even without design skills, to create realistic images from simple text inputs.
Businesses that use or develop text-to-image models could implement Anti-DreamBooth to protect their customers from malicious use of the technology. By adding a subtle noise perturbation to each user's image before it is published, they can degrade the generation quality of any DreamBooth model trained on the perturbed images, and so help prevent the production of fake news or disturbing content that targets individual victims.
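To make the idea of a "subtle" perturbation concrete, here is a minimal Python sketch of the budget constraint only: every pixel may change by at most a small amount, and the result must remain a valid image. It is not the Anti-DreamBooth method itself. The summary above does not say how the noise is computed, so this sketch substitutes random noise as a placeholder, and random noise should not be expected to protect an image. The function names, the grayscale list-of-rows layout, and the epsilon value of 8 are all assumptions made for illustration.

```python
import random


def perturb_image(pixels, epsilon=8, seed=0):
    """Return a copy of `pixels` with a bounded perturbation added.

    pixels:  rows of integer intensities in [0, 255] (assumed grayscale layout).
    epsilon: largest absolute change allowed per pixel (assumed budget).
    """
    rng = random.Random(seed)
    protected = []
    for row in pixels:
        new_row = []
        for value in row:
            # Placeholder: random noise stands in for a purpose-built perturbation.
            delta = rng.randint(-epsilon, epsilon)
            # Clamp so the result is still a valid pixel intensity.
            new_row.append(min(255, max(0, value + delta)))
        protected.append(new_row)
    return protected


def max_change(original, protected):
    """Largest per-pixel difference between two images of the same shape."""
    return max(
        abs(a - b)
        for row_o, row_p in zip(original, protected)
        for a, b in zip(row_o, row_p)
    )


image = [[0, 128, 255], [64, 200, 10]]
protected = perturb_image(image, epsilon=8)
print(max_change(image, protected) <= 8)
```

Keeping the per-pixel change within a small budget is what keeps the published image visually close to the original; whether the perturbation actually disrupts a model trained on it depends on how the noise is chosen, which this sketch does not address.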
unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network
Large-scale data sets on scholarly publications are the basis for a variety of bibliometric analyses and natural language processing (NLP) applications.
Businesses that develop or use NLP applications could benefit from the new version of the unarXive dataset, which includes 1.9 million publications spanning multiple disciplines and 32 years, and provides a more complete citation network and a richer representation of document structure and non-textual content such as mathematical notation. The dataset also includes ready-to-use training/test data for citation recommendation and IMRaD classification, which could be applied to improve the accuracy and efficiency of NLP applications in businesses' operations.