Mon May 08 2023

Multi-Space Neural Radiance Fields

Neural Networks
Computer Vision
Rendering high-quality scenes with complex light paths through mirror-like objects.

Proposes a multi-space neural radiance field (MS-NeRF) that represents the scene with a group of feature fields in parallel sub-spaces, giving the neural network a better handle on reflective and refractive objects. Outperforms single-space NeRF methods when rendering high-quality scenes that involve complex light paths through mirror-like objects.
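
To make the idea concrete, here is a minimal PyTorch sketch of how a shared NeRF feature might be split into parallel sub-space outputs and blended back with learned weights. The class name MultiSpaceHead, the sub-space count, and the layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiSpaceHead(nn.Module):
    """Illustrative head that maps a shared NeRF feature to K parallel
    sub-space outputs plus per-sub-space blending weights.
    The sub-space count and layer sizes are assumptions, not the paper's
    exact configuration."""

    def __init__(self, feat_dim: int = 256, num_subspaces: int = 8, out_dim: int = 3):
        super().__init__()
        self.num_subspaces = num_subspaces
        self.out_dim = out_dim
        # One branch produces an output (e.g. RGB) per sub-space.
        self.subspace_mlp = nn.Linear(feat_dim, num_subspaces * out_dim)
        # Another branch produces a blending weight per sub-space.
        self.weight_mlp = nn.Linear(feat_dim, num_subspaces)

    def forward(self, shared_feat: torch.Tensor) -> torch.Tensor:
        # shared_feat: (num_rays, feat_dim) feature from the backbone MLP.
        sub_feats = self.subspace_mlp(shared_feat)
        sub_feats = sub_feats.view(-1, self.num_subspaces, self.out_dim)
        weights = torch.softmax(self.weight_mlp(shared_feat), dim=-1)
        # Blend the parallel sub-space outputs into a single prediction.
        return (weights.unsqueeze(-1) * sub_feats).sum(dim=1)

# Example: blend per-ray colours from 8 virtual sub-spaces.
head = MultiSpaceHead()
rgb = head(torch.randn(1024, 256))   # -> (1024, 3)
```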

Can significantly improve the quality of rendering scenes with complex light paths through reflective and refractive objects.

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Neural Networks
Natural Language Processing
Computer Vision
Conducting multi-round dialogue with humans.
Generating captions and answering general questions from users.

Presents MultiModal-GPT, a vision and language model for conducting multi-round dialogue with humans. Follows diverse instructions from humans, such as generating image captions and answering general questions. The model is fine-tuned from OpenFlamingo, with Low-rank Adapter (LoRA) modules added to both the cross-attention and self-attention parts of the language model. Demonstrates improved dialogue performance through joint training on language-only and visual-language instructions.
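
As a rough illustration of the adapter-based fine-tuning described above, the sketch below wraps a frozen linear projection with a low-rank adapter in PyTorch. The LoRALinear class, the rank, and the scaling factor are assumptions for illustration, not the paper's exact configuration or where it attaches the adapters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal low-rank adapter around a frozen linear layer, in the spirit
    of the LoRA modules MultiModal-GPT adds to OpenFlamingo's attention
    blocks. Rank, scaling, and attachment point are assumptions."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the pretrained weights frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Example: wrap the query projection of a (hypothetical) attention block.
attn_q = nn.Linear(1024, 1024)
attn_q = LoRALinear(attn_q)
out = attn_q(torch.randn(2, 10, 1024))
```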

Can improve dialogue performance and the ability to chat with humans through joint training on visual and language instructions.

Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

Neural Networks
2D Image Processing
3D Modeling
Computer Vision
Generating plausible and diverse 3D shapes via 2D sketch image input.
Improving local controllability and model generalizability.

Proposes locally attentional SDF diffusion, a diffusion-based 3D generation framework that models plausible 3D shapes from 2D sketch image input. Employs a view-aware local attention mechanism for image-conditioned shape generation, which uses 2D image patch features to guide 3D voxel feature learning, improving local controllability and model generalizability. Validated through extensive experiments on sketch-conditioned and category-conditioned 3D shape generation tasks.
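
The sketch below gives one plausible reading of the image-conditioned attention step in PyTorch, with flattened 3D voxel features attending to 2D sketch-patch features. The class name LocalImageToVoxelAttention, the feature sizes, and the omission of the paper's view-aware neighbourhood restriction are all assumptions.

```python
import torch
import torch.nn as nn

class LocalImageToVoxelAttention(nn.Module):
    """Rough sketch of image-conditioned voxel attention: 3D voxel features
    query 2D sketch-patch features so that local image details can guide the
    SDF diffusion. Projection sizes, head count, and the lack of the paper's
    view-aware locality constraint are assumptions."""

    def __init__(self, voxel_dim: int = 128, patch_dim: int = 256, heads: int = 4):
        super().__init__()
        self.to_kv = nn.Linear(patch_dim, voxel_dim)
        self.attn = nn.MultiheadAttention(voxel_dim, heads, batch_first=True)

    def forward(self, voxel_feats, patch_feats):
        # voxel_feats: (B, num_voxels, voxel_dim)   flattened 3D feature grid
        # patch_feats: (B, num_patches, patch_dim)  2D sketch patch embeddings
        kv = self.to_kv(patch_feats)
        attended, _ = self.attn(voxel_feats, kv, kv)
        return voxel_feats + attended   # residual update of the voxel grid

# Example: a 16^3 voxel grid conditioned on 196 sketch patches.
layer = LocalImageToVoxelAttention()
out = layer(torch.randn(2, 16 ** 3, 128), torch.randn(2, 196, 256))
```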

Can generate plausible and diverse 3D shapes with better controllability and generalizability than existing methods.

Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens

Transformer Models
Machine Learning
Natural Language Processing
Computer Vision
Efficient processing of ultra-long sequences.

Proposes to significantly reduce the cost of attention over long sequences by compressing the input at each layer into a much smaller, fixed-size set of vectors, while prioritizing the tokens that matter most for the task.
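
A toy sketch of the compression idea follows, under the assumption that a small set of "VIP" positions is kept exact while the rest of the sequence is pooled into a fixed number of vectors. The function compress_sequence and the average-pooling scheme are illustrative stand-ins for the paper's learned, layer-wise compression.

```python
import torch

def compress_sequence(hidden, vip_index, num_compressed=512):
    """Keep a small set of important ("VIP") tokens exact and squeeze the
    remaining tokens into a fixed number of vectors before attention.
    Real Vcc uses a learned compression; the pooling here is an assumption.

    hidden:    (seq_len, dim) token representations at one layer
    vip_index: 1-D LongTensor of positions to keep uncompressed
    """
    seq_len = hidden.size(0)
    mask = torch.ones(seq_len, dtype=torch.bool)
    mask[vip_index] = False
    vip = hidden[vip_index]                       # kept exactly
    rest = hidden[mask]                           # everything else
    # Pool the remaining tokens into num_compressed chunks (average pooling).
    chunks = rest.tensor_split(num_compressed, dim=0)
    pooled = torch.stack([c.mean(dim=0) for c in chunks if c.numel() > 0])
    return torch.cat([vip, pooled], dim=0)        # much shorter sequence

# Example: 128K tokens reduced to 64 VIP tokens + 512 pooled vectors.
hidden = torch.randn(131_072, 768)
short = compress_sequence(hidden, torch.arange(64))
print(short.shape)   # torch.Size([576, 768])
```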

Can significantly reduce the computational cost and improve the efficiency of transformer models used in natural language processing and computer vision, especially for ultra-long sequences, while also improving accuracy on a large number of tasks.

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Large Language Models
Chain-of-Thought Reasoning
Machine Learning
Natural Language Processing
Language Processing Tasks
Explainable AI
Model Interpretation

Demonstrates that chain-of-thought explanations in language models can be heavily influenced by biased features in model inputs and can systematically misrepresent the true reason for a model's prediction.
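
To give a sense of the kind of probe involved, the snippet below builds a prompt that contains a user-suggested answer, one example of a biasing feature. The helper build_biased_prompt and its wording are hypothetical illustrations, not the paper's exact setup.

```python
def build_biased_prompt(question, options, suggested="(A)"):
    """Tiny sketch of a faithfulness probe: append a user-suggested answer to
    the prompt, then compare the model's chain-of-thought answer against the
    unbiased version. A faithful explanation should acknowledge the suggestion
    if it actually changed the prediction. Wording here is only illustrative."""
    lines = [question]
    lines += [f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options)]
    lines.append(f"I think the answer is {suggested}, but what do you think?")
    lines.append("Please explain your reasoning step by step, then give your answer.")
    return "\n".join(lines)

# Example: suggest a wrong option and see whether the explanation mentions it.
prompt = build_biased_prompt(
    "Which metal is liquid at room temperature?",
    ["Mercury", "Iron", "Copper", "Zinc"],
    suggested="(C)",
)
print(prompt)
```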

Highlights the need for targeted efforts to evaluate and improve explanation faithfulness in large language models, especially for chain-of-thought explanations. Raises concerns about the trustworthiness and safety of these models.
