Thu Mar 02 2023
Wed Mar 01 2023

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

forced alignment
speech recognition
natural language processing
speech transcription
captioning videos
meeting minutes

Improves transcription quality and enables a 12x transcription speedup via batched inference.

WhisperX refines the timestamps of OpenAI's Whisper model via forced alignment with phoneme-based ASR models, which makes the timestamps more accurate, while batched inference makes transcription faster. This can be useful for businesses that rely on transcriptions, for example to caption videos, generate meeting minutes, or analyze customer feedback.
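To make the idea of forced alignment concrete, here is a minimal sketch in Python. It is not WhisperX's implementation: it assumes a hypothetical matrix of per-frame log-probabilities from some phoneme model, ignores blank symbols, and requires every token to occupy at least one frame. The names `forced_align`, `spans_to_seconds`, and the 20 ms frame length are illustrative assumptions.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def forced_align(log_probs: List[List[float]], tokens: List[int]) -> List[Tuple[int, int]]:
    """Align a known token sequence to audio frames.

    log_probs[t][c] is the log-probability of class c at frame t (assumed input).
    Returns one (start_frame, end_frame_exclusive) span per token, in order.
    """
    num_frames, num_tokens = len(log_probs), len(tokens)
    if num_tokens == 0 or num_frames < num_tokens:
        raise ValueError("need at least one frame per token")

    # score[t][j]: best log-score of a path that is on token j at frame t.
    score = [[NEG_INF] * num_tokens for _ in range(num_frames)]
    # advanced[t][j]: True if the best path entered token j exactly at frame t.
    advanced = [[False] * num_tokens for _ in range(num_frames)]
    score[0][0] = log_probs[0][tokens[0]]

    for t in range(1, num_frames):
        for j in range(min(t + 1, num_tokens)):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG_INF
            if move > stay:
                advanced[t][j] = True
                best = move
            else:
                best = stay
            score[t][j] = best + log_probs[t][tokens[j]]

    # Backtrack from the last frame and last token to recover token boundaries.
    spans: List[Tuple[int, int]] = [(0, 0)] * num_tokens
    j, end = num_tokens - 1, num_frames
    for t in range(num_frames - 1, 0, -1):
        if advanced[t][j]:
            spans[j] = (t, end)
            end = t
            j -= 1
    spans[0] = (0, end)
    return spans


def spans_to_seconds(spans: List[Tuple[int, int]], frame_seconds: float = 0.02) -> List[Tuple[float, float]]:
    """Convert frame spans to timestamps; the 20 ms frame length is an assumption."""
    return [(start * frame_seconds, end * frame_seconds) for start, end in spans]


if __name__ == "__main__":
    hi, lo = math.log(0.8), math.log(0.1)
    # Five frames, three classes: class 0 dominates frames 0-2, class 1 frames 3-4.
    frames = [[hi, lo, lo]] * 3 + [[lo, hi, lo]] * 2
    print(forced_align(frames, [0, 1]))
```

The dynamic program only allows a path to stay on the current token or advance to the next one, which is what makes the alignment monotonic and lets each token receive a start and end time.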

StraIT: Non-autoregressive Generation with Stratified Image Transformer

non-autoregressive models
generative models
image synthesis
image generation
graphic design
marketing

Significantly improves non-AR generation and outperforms existing diffusion models and AR methods while being an order of magnitude faster.

StraIT is a generative model that can synthesize high-quality images faster and with fewer constraints than existing autoregressive and diffusion models, making it a potential tool for businesses that rely on image generation, such as graphic design or marketing agencies. Its decoupled modeling process also allows for domain transfer, providing more flexibility in image synthesis.

Unlimited-Size Diffusion Restoration

zero-shot learning
diffusion models
image processing
image restoration
image generation
media companies
e-commerce platforms

Can be used not only for image restoration but also for image generation of unlimited sizes, with the potential to be a general tool for diffusion models.

This research proposes a simple approach to using diffusion models for zero-shot image restoration and generation at unlimited sizes, which can be useful for businesses that need to process large amounts of visual data, such as media companies or e-commerce platforms. The proposed Hierarchical Restoration and Mask-Shift Restoration techniques preserve the advantages of the zero-shot setting while alleviating local incoherence and out-of-domain issues.
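The summary above does not describe how the two techniques work, so the following Python sketch does not reproduce them. It only illustrates the generic setting, as an assumption about the problem: a model with a fixed input size has to cover an arbitrary-size image with overlapping windows, and the seams between windows are where local incoherence can appear. The function names and the tile and stride values are hypothetical.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets of fixed-size windows that cover the range [0, length).

    The last window is shifted back so it ends exactly at the boundary.
    If length < tile, a single window at 0 is returned and the caller
    would need to pad the image.
    """
    if tile <= 0 or not 0 < stride <= tile:
        raise ValueError("require tile > 0 and 0 < stride <= tile")
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def tile_boxes(height: int, width: int, tile: int, stride: int) -> List[Tuple[int, int, int, int]]:
    """(top, left, bottom, right) boxes of overlapping windows over an image."""
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, stride)
        for left in tile_starts(width, tile, stride)
    ]


if __name__ == "__main__":
    # A 10x10 image covered by 4x4 windows with a stride of 3 pixels.
    for box in tile_boxes(10, 10, tile=4, stride=3):
        print(box)
```

Choosing a stride smaller than the tile size gives neighbouring windows shared pixels, which is the usual starting point for keeping adjacent outputs consistent.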
