DiffMimic: Efficient Motion Mimicking with Differentiable Physics
Proposes an efficient motion-mimicking method that leverages differentiable physics simulators (DPS) for physics-based character animation. DiffMimic achieves better sample efficiency and time efficiency than existing methods, allowing a physically simulated character to learn a backflip after 10 minutes of training and to cycle it after 3 hours of training.
Can benefit animation systems with differentiable cloth simulation and improve physics-based character animation in business operations.
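DiffMimic's core idea is to backpropagate a tracking objective through the simulator. A minimal sketch of such a pose/velocity mimic loss (shapes, weights, and the function itself are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def mimic_loss(sim_pos, sim_vel, ref_pos, ref_vel, w_pos=1.0, w_vel=0.1):
    """Tracking loss between simulated and reference character states.

    In a differentiable physics simulator this objective would be written
    in an autodiff framework so its gradient flows through the simulation
    steps; plain NumPy is used here only to illustrate the objective.
    """
    pos_err = np.mean(np.sum((sim_pos - ref_pos) ** 2, axis=-1))
    vel_err = np.mean(np.sum((sim_vel - ref_vel) ** 2, axis=-1))
    return w_pos * pos_err + w_vel * vel_err

# Toy example: 4 joints in 3D over one frame.
rng = np.random.default_rng(0)
ref_p, ref_v = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
loss = mimic_loss(ref_p + 0.01, ref_v, ref_p, ref_v)
```

With a differentiable simulator, minimizing this loss by gradient descent replaces the reward-based exploration that reinforcement-learning mimicking methods rely on, which is where the sample-efficiency gain comes from.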
SegGPT: Segmenting Everything In Context
Presents SegGPT, a generalist model that segments everything in context, unifying tasks such as few-shot semantic segmentation, video object segmentation, semantic segmentation, and panoptic segmentation. Evaluated on a broad range of tasks, SegGPT shows strong capability in segmenting both in-domain and out-of-domain targets.
Can improve segmentation tasks and image and video analysis in various business operations.
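In-context segmentation of this kind frames the task as image inpainting: the model sees an example image with its painted mask alongside a query image, and fills in the query's mask. A sketch of assembling such a prompt canvas (the quadrant layout and blank value are illustrative assumptions, not SegGPT's exact format):

```python
import numpy as np

def build_incontext_canvas(example_img, example_mask, query_img):
    """Stitch an in-context example and a query into one canvas.

    Top row: example image and its mask (the in-context prompt).
    Bottom row: query image and a blank quadrant for the model to
    inpaint with the query's mask.
    """
    h, w, c = example_img.shape
    canvas = np.zeros((2 * h, 2 * w, c), dtype=example_img.dtype)
    canvas[:h, :w] = example_img
    canvas[:h, w:] = example_mask   # prompt: how this target is painted
    canvas[h:, :w] = query_img
    # Bottom-right quadrant stays blank: the region to be inpainted.
    return canvas

img = np.ones((8, 8, 3), dtype=np.uint8)
mask = np.full((8, 8, 3), 255, dtype=np.uint8)
canvas = build_incontext_canvas(img, mask, img)
```

Because the task is specified entirely by the example pair, the same trained model can be redirected to a new segmentation target just by swapping the prompt, with no finetuning.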
Instruction Tuning with GPT-4
Introduces the first attempt to use GPT-4 to generate instruction-following data for LLM finetuning. The data generated by GPT-4 leads to superior zero-shot performance on new tasks compared to the instruction-following data generated by previous state-of-the-art models.
Can improve large language model performance and enable zero-shot capabilities in various business operations.
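The generated data takes the form of (instruction, input, output) triples for finetuning. A sketch of building one such record (the prompt wording approximates the common Alpaca-style template, and the generation call is a placeholder, not actual GPT-4 client code):

```python
import json

# Alpaca-style template; in the real pipeline GPT-4 produces "output".
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an "
    "input that provides further context.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def make_record(instruction, inp, generate):
    """Build one finetuning example; `generate` stands in for a GPT-4
    API call (actual client code is omitted and assumed)."""
    prompt = PROMPT_TEMPLATE.format(instruction=instruction, input=inp)
    return {"instruction": instruction, "input": inp,
            "output": generate(prompt)}

record = make_record(
    "Summarize the text.",
    "DiffMimic uses differentiable physics.",
    generate=lambda p: "A motion-mimicking method based on "
                       "differentiable physics.",
)
line = json.dumps(record)  # one JSONL line of the finetuning dataset
```

Collecting many such records and finetuning an open LLM on them is what transfers the stronger model's instruction-following behavior.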
DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model
Proposes DITTO-NeRF, a novel pipeline that generates a high-quality 3D NeRF model from a text prompt or a single image, outperforming state-of-the-art image/text-to-3D methods in fidelity and diversity, both qualitatively and quantitatively, with much faster training times than prior art.
Can be used to create high-quality 3D object models for business operations such as product design and manufacturing.
Diffusion Models as Masked Autoencoders
Revisits generative pre-training of visual representations in light of recent interest in denoising diffusion models, formulating diffusion models as masked autoencoders (DiffMAE). DiffMAE serves as a strong initialization for downstream recognition tasks, performs high-quality image inpainting, and extends to video with state-of-the-art classification accuracy.
Can be used to improve visual recognition tasks and image inpainting for businesses such as e-commerce or advertising.
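The masked-autoencoder side of such pre-training starts by hiding a large fraction of image patches, which the diffusion model then learns to denoise/inpaint. A sketch of the masking step (patch size and mask ratio are illustrative assumptions):

```python
import numpy as np

def mask_patches(img, patch=4, mask_ratio=0.75, seed=0):
    """Randomly zero out a fraction of non-overlapping square patches.

    The pre-training task is to reconstruct these masked patches;
    the high mask ratio makes the task non-trivial.
    """
    h, w = img.shape[:2]
    gh, gw = h // patch, w // patch
    n = gh * gw
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(n, size=int(n * mask_ratio), replace=False)
    out = img.copy()
    for idx in masked_idx:
        r, c = divmod(idx, gw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return out, masked_idx

img = np.ones((16, 16), dtype=np.float32)
masked, idx = mask_patches(img)
```

The diffusion formulation differs from a plain masked autoencoder in that reconstruction proceeds through iterative denoising rather than a single decoding pass, which is what also yields high-quality inpainting.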