MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
The paper proposes Megabyte, a multiscale decoder architecture that enables end-to-end differentiable modeling of sequences of over one million bytes by segmenting them into patches and combining a large global model across patches with a small local model within each patch. This allows byte-level models to perform competitively with subword models on long-context language modeling, achieve state-of-the-art density estimation on ImageNet, and model audio from raw files.
Implement Megabyte to improve language modeling, density estimation, and audio modeling.
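The key structure is a patch-based decoder: a large global transformer attends across patch representations, and a small local transformer predicts the bytes inside each patch conditioned on the global output. Below is a minimal, illustrative PyTorch sketch of that structure, not the authors' implementation; all dimensions, layer counts, and the exact shifting scheme are assumptions for illustration.

```python
# Illustrative sketch of a MEGABYTE-style multiscale byte decoder (not the paper's code).
# Patch size, widths, and layer counts are placeholder assumptions.
import torch
import torch.nn as nn

class MegabyteSketch(nn.Module):
    def __init__(self, vocab=256, patch=8, d_global=512, d_local=128,
                 n_global=6, n_local=2, n_heads=8):
        super().__init__()
        self.patch = patch
        self.byte_emb = nn.Embedding(vocab, d_local)
        self.patch_proj = nn.Linear(patch * d_local, d_global)   # patch embedder
        make = lambda d, n: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True), num_layers=n)
        self.global_model = make(d_global, n_global)  # large model attending across patches
        self.local_model = make(d_local, n_local)     # small model attending within a patch
        self.global_to_local = nn.Linear(d_global, patch * d_local)
        self.head = nn.Linear(d_local, vocab)

    @staticmethod
    def _causal(n, device):
        # Boolean mask: True marks positions a token may not attend to.
        return torch.triu(torch.ones(n, n, dtype=torch.bool, device=device), diagonal=1)

    def forward(self, byte_ids):                      # byte_ids: (B, T), T divisible by patch
        B, T = byte_ids.shape
        P, K = T // self.patch, self.patch
        x = self.byte_emb(byte_ids)                   # (B, T, d_local)
        # Global stream: embed each patch, shift right so a patch only sees earlier patches.
        g = self.patch_proj(x.view(B, P, K * x.size(-1)))
        g = torch.cat([torch.zeros_like(g[:, :1]), g[:, :-1]], dim=1)
        g = self.global_model(g, mask=self._causal(P, x.device))     # (B, P, d_global)
        # Local stream: per-patch byte decoding conditioned on the global output.
        cond = self.global_to_local(g).view(B * P, K, -1)
        l = x.view(B * P, K, -1)
        l = torch.cat([torch.zeros_like(l[:, :1]), l[:, :-1]], dim=1)  # shift bytes right
        out = self.local_model(cond + l, mask=self._causal(K, x.device))
        return self.head(out).view(B, T, -1)          # next-byte logits

logits = MegabyteSketch()(torch.randint(0, 256, (2, 64)))  # (2, 64, 256)
```

Because the local model only attends within short patches, self-attention cost scales with the number of patches rather than the full byte length, which is what makes million-byte contexts tractable.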
HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation
HACK is a novel parametric model for constructing the head and cervical region of digital humans. The model seeks to disentangle the full spectrum of neck and larynx motions, facial expressions, and appearance variations. HACK provides personalized and anatomically consistent controls, particularly for the neck region, enabling more accurate and expressive animation. This benefits numerous applications and enables analysis of the inter-correlation between head and neck motions for fine-grained motion synthesis and transfer.
Use HACK to create high-fidelity animations with anatomically consistent controls for the head and neck regions.
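As a rough illustration of what "parametric" and "disentangled controls" mean here, the sketch below builds a toy linear blendshape model with separate identity, expression, and neck/larynx parameter groups. It is only a schematic stand-in with random placeholder bases; HACK's actual anatomical formulation and parameterization differ.

```python
# Toy parametric head/neck model in the spirit of HACK (illustrative only).
# Vertex counts, basis sizes, and parameter names are assumptions.
import numpy as np

class ParametricHeadNeckSketch:
    def __init__(self, n_verts=5000, n_shape=100, n_expr=50, n_neck_pose=6, seed=0):
        rng = np.random.default_rng(seed)
        self.template = rng.normal(size=(n_verts, 3))                      # mean mesh (placeholder)
        self.shape_basis = rng.normal(size=(n_shape, n_verts, 3)) * 1e-2   # identity blendshapes
        self.expr_basis = rng.normal(size=(n_expr, n_verts, 3)) * 1e-2     # expression blendshapes
        # A simple linear basis standing in for pose-dependent neck/larynx deformation.
        self.neck_basis = rng.normal(size=(n_neck_pose, n_verts, 3)) * 1e-2

    def vertices(self, shape, expr, neck_pose):
        """Deform the template with separate identity, expression, and neck/larynx controls."""
        v = self.template.copy()
        v += np.einsum('s,svc->vc', shape, self.shape_basis)      # identity offsets
        v += np.einsum('e,evc->vc', expr, self.expr_basis)        # expression offsets
        v += np.einsum('p,pvc->vc', neck_pose, self.neck_basis)   # neck/larynx offsets
        return v

model = ParametricHeadNeckSketch()
verts = model.vertices(shape=np.zeros(100), expr=np.zeros(50), neck_pose=np.zeros(6))
print(verts.shape)  # (5000, 3)
```

Keeping each control group in its own basis is what allows, for example, editing larynx motion without disturbing identity or expression, which is the kind of disentanglement the paper targets.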
ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4
ArtGPT-4 is a multimodal model trained on image-text pairs in just 2 hours on a Tesla A100 device, using only about 200 GB of data. The model can depict images with an artistic flair and generate visual code, including aesthetically pleasing HTML/CSS web pages. The paper also proposes novel benchmarks for evaluating vision-language models; on the proposed 6-point scale, ArtGPT-4 scored higher than the current state-of-the-art model and fell only slightly short of artists.
Implement ArtGPT-4 to depict images with an artistic flair and generate visually pleasing web pages.
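The "adapter-enhanced" part refers to small trainable modules attached to a largely frozen pretrained backbone, which is what makes such a short, low-data training run feasible. The sketch below shows the generic adapter pattern (a frozen transformer layer followed by a small trainable residual bottleneck); the layer sizes and placement are illustrative assumptions, not taken from the paper.

```python
# Generic adapter-tuning pattern (illustrative; not ArtGPT-4's exact architecture).
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small trainable bottleneck with a residual connection."""
    def __init__(self, d_model=512, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))    # residual refinement of frozen features

# A frozen transformer block standing in for one layer of a pretrained language model.
d_model = 512
base_layer = nn.TransformerEncoderLayer(d_model, nhead=8, dim_feedforward=2048, batch_first=True)
for p in base_layer.parameters():
    p.requires_grad = False                            # pretrained weights stay frozen

adapter = BottleneckAdapter(d_model)                   # only these parameters get trained

x = torch.randn(2, 16, d_model)
y = adapter(base_layer(x))                             # adapter refines the frozen layer's output
print(sum(p.numel() for p in adapter.parameters()))    # a tiny fraction of the frozen layer's size
```

Training only the adapters keeps memory and compute low, which matches the reported single-GPU, 2-hour training budget.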
Universal Source Separation with Weakly Labelled Data
This paper proposes a universal audio source separation framework that uses weakly labelled audio data to separate arbitrary sound sources via a single model. The proposed system achieved significant improvements across a wide variety of separation tasks, including sound event separation, music source separation, and speech enhancement.
Implementing this framework can significantly improve audio analysis and processing in various industries, including music, entertainment, and security.
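A single-model universal separator of this kind is typically conditioned on a query or class embedding derived from the weak labels and predicts a mask for the requested source. The sketch below shows only that conditional-masking idea; the network shape, conditioning scheme, and dimensions are assumptions, not the paper's architecture.

```python
# Query-conditioned separation sketch (illustrative; not the paper's network).
import torch
import torch.nn as nn

class ConditionalSeparator(nn.Module):
    """Predicts a spectrogram mask for the source described by a class/query embedding."""
    def __init__(self, n_freq=257, cond_dim=128, hidden=256):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, hidden)
        self.mix_proj = nn.Linear(n_freq, hidden)
        self.rnn = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.mask_head = nn.Linear(hidden, n_freq)

    def forward(self, mix_spec, condition):
        # mix_spec: (B, T, F) magnitude spectrogram of the mixture
        # condition: (B, cond_dim) embedding of the target class (e.g. from an audio tagger)
        h = self.mix_proj(mix_spec) + self.cond_proj(condition).unsqueeze(1)
        h, _ = self.rnn(h)
        mask = torch.sigmoid(self.mask_head(h))       # per-bin soft mask in [0, 1]
        return mask * mix_spec                        # estimated target-source spectrogram

sep = ConditionalSeparator()
mix = torch.rand(4, 100, 257)          # batch of mixture spectrograms
query = torch.randn(4, 128)            # weak-label-derived class embeddings
est = sep(mix, query)                  # (4, 100, 257)
```

Swapping the query embedding is what lets one model cover sound events, music stems, and speech without retraining a separator per class.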
Optimizing Memory Mapping Using Deep Reinforcement Learning
This paper introduces mallocMuZero, a reinforcement learning (RL) agent that solves the memory mapping problem arising during compilation of machine learning programs. The proposed system outperformed the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads and improved the execution time of the recently published AlphaTensor matrix multiplication model.
Implementing this approach can significantly improve the resource scheduling and allocation in various industries, including cloud computing and machine learning acceleration.
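Memory mapping can be framed as a sequential decision process: for each buffer produced during compilation, decide whether to place it in scarce fast memory, with reward tied to execution time saved. The toy environment below sketches that formulation so an RL agent (in the paper, a MuZero-style planner) has something to act on; the buffer statistics, action space, and reward are simplified assumptions, not the paper's game definition.

```python
# Toy memory-mapping environment sketch (illustrative; simplified relative to mallocMuZero's setup).
from dataclasses import dataclass
import random

@dataclass
class Buffer:
    size: int               # fast-memory bytes required
    accesses: int           # how often the buffer is read/written
    latency_saving: float   # per-access time saved if placed in fast memory

class MemoryMappingEnv:
    """Sequentially decide, per buffer, whether to place it in limited fast memory."""
    def __init__(self, buffers, fast_capacity):
        self.buffers, self.fast_capacity = buffers, fast_capacity

    def reset(self):
        self.i, self.used, self.total_saving = 0, 0, 0.0
        return self._obs()

    def _obs(self):
        b = self.buffers[self.i]
        return (b.size, b.accesses, self.fast_capacity - self.used)

    def step(self, place_in_fast: bool):
        b = self.buffers[self.i]
        reward = 0.0
        if place_in_fast and self.used + b.size <= self.fast_capacity:
            self.used += b.size
            reward = b.accesses * b.latency_saving    # execution time saved
        self.total_saving += reward
        self.i += 1
        done = self.i == len(self.buffers)
        return (None if done else self._obs()), reward, done

# Random rollout; a planning agent would replace this policy.
random.seed(0)
buffers = [Buffer(random.randint(1, 8), random.randint(1, 100), 0.01) for _ in range(20)]
env = MemoryMappingEnv(buffers, fast_capacity=32)
obs, done = env.reset(), False
while not done:
    obs, r, done = env.step(random.random() < 0.5)
print(round(env.total_saving, 2))
```

The value of planning here is that early placements constrain later ones through the shared capacity, so greedy per-buffer choices can leave large savings on the table.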