Wed Mar 01 2023 - Top Trending AI Papers

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

forced alignment

speech recognition

natural language processing

speech transcription

captioning videos

meeting minutes

Improves transcription quality and enables a 12x transcription speedup via batched inference.

WhisperX refines the timestamps produced by OpenAI's Whisper model through forced alignment with phoneme-based ASR models, which makes the timing of the transcript more accurate, and it speeds up transcription of long-form audio through batched inference. This can be useful for businesses that rely on transcriptions for purposes such as captioning videos, generating meeting minutes, or analyzing customer feedback.
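
To make the idea of forced alignment concrete, here is a minimal sketch in plain Python. It is not WhisperX's code: it assumes a hypothetical matrix of frame-level log-probabilities from some phoneme model, ignores the CTC blank symbol that real aligners handle, and uses illustrative names (`forced_align`, `spans_to_seconds`) that do not come from the project. Given the known token sequence, dynamic programming finds the best monotonic assignment of frames to tokens, and the resulting frame spans become timestamps.

```python
import math

def forced_align(log_probs, tokens):
    """Align a known token sequence to audio frames (simplified, no CTC blank).

    log_probs[t][v] is the log-probability of vocabulary item v at frame t.
    tokens is the ordered list of vocabulary ids that must be spoken.
    Returns one inclusive (start_frame, end_frame) span per token.
    """
    T, N = len(log_probs), len(tokens)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per token")
    neg_inf = float("-inf")
    score = [[neg_inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # 0 = stay on token, 1 = advance
    score[0][0] = log_probs[0][tokens[0]]
    for t in range(1, T):
        for j in range(min(t + 1, N)):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else neg_inf
            if move > stay:
                best, back[t][j] = move, 1
            else:
                best, back[t][j] = stay, 0
            score[t][j] = best + log_probs[t][tokens[j]]
    # Walk back from the last frame and last token to recover the spans.
    starts, ends = [0] * N, [0] * N
    j = N - 1
    ends[j] = T - 1
    for t in range(T - 1, 0, -1):
        if back[t][j] == 1:
            starts[j] = t
            j -= 1
            ends[j] = t - 1
    return list(zip(starts, ends))

def spans_to_seconds(spans, frame_sec):
    """Convert inclusive frame spans to (start, end) times in seconds."""
    return [(s * frame_sec, (e + 1) * frame_sec) for s, e in spans]

# Toy example: 5 frames, 3 vocabulary items, target sequence 0 -> 1 -> 2.
probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
log_probs = [[math.log(p) for p in row] for row in probs]
spans = forced_align(log_probs, [0, 1, 2])  # -> [(0, 1), (2, 2), (3, 4)]
```

Calling `spans_to_seconds(spans, frame_sec)` with the frame duration of the acoustic model (an assumption that depends on the model used) turns these spans into start and end times for each token.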

https://github.com/m-bain/whisperX

https://arxiv.org/pdf/2303.00747.pdf

https://arxiv.org/abs/2303.00747

https://twitter.com/arankomatsuzaki/status/1631108187113005056/photo/1

StraIT: Non-autoregressive Generation with Stratified Image Transformer

non-autoregressive models

generative models

image synthesis

image generation

graphic design

marketing

Significantly improves non-autoregressive (non-AR) generation and outperforms existing diffusion models and autoregressive (AR) methods while being an order of magnitude faster.

StraIT is a generative model that can synthesize high-quality images faster and with fewer constraints than existing autoregressive and diffusion models, making it a potential tool for businesses that rely on image generation, such as graphic design or marketing agencies. Its decoupled modeling process also allows for domain transfer, providing more flexibility in image synthesis.

https://arxiv.org/pdf/2303.00750.pdf

https://arxiv.org/abs/2303.00750

https://twitter.com/arankomatsuzaki/status/1631108861343182849/photo/1

Unlimited-Size Diffusion Restoration

zero-shot learning

diffusion models

image processing

image restoration

image generation

media companies

e-commerce platforms

Can be used not only for image restoration but also for generating images of unlimited size, with the potential to become a general tool for diffusion models.

This research proposes a simple approach for using diffusion models in zero-shot image restoration and generation at unlimited sizes, which can be useful for businesses that need to process large amounts of visual data, such as media companies or e-commerce platforms. The proposed Hierarchical Restoration and Mask-Shift Restoration techniques keep the advantages of the zero-shot setting while alleviating local incoherence and out-of-domain issues.
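
One way to picture processing an image of arbitrary size with a fixed-size model is to slide overlapping windows over it and treat pixels that an earlier window already handled as known when the next window is processed. The sketch below illustrates only that tiling bookkeeping. It is an assumption-laden illustration, not the paper's Mask-Shift Restoration or Hierarchical Restoration, and the function names and parameters are hypothetical.

```python
def window_starts(length, patch, stride):
    """Start offsets of windows of size `patch` that cover [0, length)."""
    if patch <= 0 or stride <= 0 or stride > patch:
        raise ValueError("need 0 < stride <= patch")
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)  # last window sits flush with the border
    return starts

def tile_with_known_masks(height, width, patch, stride):
    """List (top, left, known) tiles in raster order.

    known[r][c] is True where an earlier tile already covered that pixel,
    so a restoration step could keep it fixed and fill in only the rest.
    """
    done = [[False] * width for _ in range(height)]
    h, w = min(patch, height), min(patch, width)
    tiles = []
    for top in window_starts(height, patch, stride):
        for left in window_starts(width, patch, stride):
            known = [[done[top + r][left + c] for c in range(w)]
                     for r in range(h)]
            tiles.append((top, left, known))
            for r in range(h):
                for c in range(w):
                    done[top + r][left + c] = True
    return tiles

# Toy example: a 4x6 image covered by 4x4 windows shifted by 2 pixels.
tiles = tile_with_known_masks(4, 6, 4, 2)
```

In this toy case the second tile starts at column 2, so its first two columns are marked as known and only the last two remain to be filled. Sharing pixels between neighboring windows is what keeps tile borders consistent.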

https://github.com/wyhuai/DDNM

https://arxiv.org/pdf/2303.00354.pdf

https://arxiv.org/abs/2303.00354

https://twitter.com/arankomatsuzaki/status/1631107453395034113/photo/1