Mon May 08 2023

Multi-Space Neural Radiance Fields

Neural Networks
Computer Vision
Rendering high-quality scenes with complex light paths through mirror-like objects.

Proposes a multi-space neural radiance field (MS-NeRF) that represents the scene with a group of feature fields in parallel sub-spaces, giving the neural network a better handle on reflective and refractive objects. Outperforms single-space NeRF methods when rendering high-quality scenes that involve complex light paths through mirror-like objects.
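
To make the idea concrete, here is a minimal PyTorch sketch of how a shared NeRF feature might be split into parallel sub-space outputs and blended back with learned weights. The class name MultiSpaceHead, the sub-space count, and the layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiSpaceHead(nn.Module):
    """Illustrative head that maps a shared NeRF feature to K parallel
    sub-space outputs plus per-sub-space blending weights.
    The sub-space count and layer sizes are assumptions, not the paper's
    exact configuration."""

    def __init__(self, feat_dim: int = 256, num_subspaces: int = 8, out_dim: int = 3):
        super().__init__()
        self.num_subspaces = num_subspaces
        self.out_dim = out_dim
        # One branch produces an output (e.g. RGB) per sub-space.
        self.subspace_mlp = nn.Linear(feat_dim, num_subspaces * out_dim)
        # Another branch produces a blending weight per sub-space.
        self.weight_mlp = nn.Linear(feat_dim, num_subspaces)

    def forward(self, shared_feat: torch.Tensor) -> torch.Tensor:
        # shared_feat: (num_rays, feat_dim) feature from the backbone MLP.
        sub_feats = self.subspace_mlp(shared_feat)
        sub_feats = sub_feats.view(-1, self.num_subspaces, self.out_dim)
        weights = torch.softmax(self.weight_mlp(shared_feat), dim=-1)
        # Blend the parallel sub-space outputs into a single prediction.
        return (weights.unsqueeze(-1) * sub_feats).sum(dim=1)

# Example: blend per-ray colours from 8 virtual sub-spaces.
head = MultiSpaceHead()
rgb = head(torch.randn(1024, 256))   # -> (1024, 3)
```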

Can significantly improve the quality of rendering scenes with complex light paths through reflective and refractive objects.

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Neural Networks
Natural Language Processing
Computer Vision
Conducting multi-round dialogue with humans.
Generating captions and answering general questions from users.

Presents MultiModal-GPT, a vision and language model for conducting multi-round dialogue with humans. Follows diverse instructions from humans, such as generating image captions and answering general questions. The model is fine-tuned from OpenFlamingo, with Low-rank Adapter (LoRA) modules added to both the cross-attention and self-attention parts of the language model. Demonstrates improved dialogue performance through joint training on language-only and visual-language instructions.
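
As a rough illustration of the adapter-based fine-tuning described above, the sketch below wraps a frozen linear projection with a low-rank adapter in PyTorch. The LoRALinear class, the rank, and the scaling factor are assumptions for illustration, not the paper's exact configuration or where it attaches the adapters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal low-rank adapter around a frozen linear layer, in the spirit
    of the LoRA modules MultiModal-GPT adds to OpenFlamingo's attention
    blocks. Rank, scaling, and attachment point are assumptions."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the pretrained weights frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Example: wrap the query projection of a (hypothetical) attention block.
attn_q = nn.Linear(1024, 1024)
attn_q = LoRALinear(attn_q)
out = attn_q(torch.randn(2, 10, 1024))
```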

Can improve dialogue performance and the ability to chat with humans through joint training on visual and language instructions.

Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

Neural Networks
2D Image Processing
3D Modeling
Computer Vision
Generating plausible and diverse 3D shapes via 2D sketch image input.
Improving local controllability and model generalizability.

Proposes locally attentional SDF diffusion, a diffusion-based 3D generation framework that models plausible 3D shapes from 2D sketch image input. Employs a view-aware local attention mechanism for image-conditioned shape generation, which uses 2D image patch features to guide 3D voxel feature learning, improving local controllability and model generalizability. Validated through extensive experiments on sketch-conditioned and category-conditioned 3D shape generation tasks.
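
The sketch below gives one plausible reading of the image-conditioned attention step in PyTorch, with flattened 3D voxel features attending to 2D sketch-patch features. The class name LocalImageToVoxelAttention, the feature sizes, and the omission of the paper's view-aware neighbourhood restriction are all assumptions.

```python
import torch
import torch.nn as nn

class LocalImageToVoxelAttention(nn.Module):
    """Rough sketch of image-conditioned voxel attention: 3D voxel features
    query 2D sketch-patch features so that local image details can guide the
    SDF diffusion. Projection sizes, head count, and the lack of the paper's
    view-aware locality constraint are assumptions."""

    def __init__(self, voxel_dim: int = 128, patch_dim: int = 256, heads: int = 4):
        super().__init__()
        self.to_kv = nn.Linear(patch_dim, voxel_dim)
        self.attn = nn.MultiheadAttention(voxel_dim, heads, batch_first=True)

    def forward(self, voxel_feats, patch_feats):
        # voxel_feats: (B, num_voxels, voxel_dim)   flattened 3D feature grid
        # patch_feats: (B, num_patches, patch_dim)  2D sketch patch embeddings
        kv = self.to_kv(patch_feats)
        attended, _ = self.attn(voxel_feats, kv, kv)
        return voxel_feats + attended   # residual update of the voxel grid

# Example: a 16^3 voxel grid conditioned on 196 sketch patches.
layer = LocalImageToVoxelAttention()
out = layer(torch.randn(2, 16 ** 3, 128), torch.randn(2, 196, 256))
```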

Can generate plausible and diverse 3D shapes with better controllability and generalizability than existing methods.

Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens

Transformer Models
Machine Learning
Natural Language Processing
Computer Vision
Efficient processing of ultra-long sequences.

Proposes to significantly reduce the cost of attention over long sequences by compressing the input at each layer into a much smaller, fixed-size set of vectors, while prioritizing the tokens that matter most for the task.
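
A toy sketch of the compression idea follows, under the assumption that a small set of "VIP" positions is kept exact while the rest of the sequence is pooled into a fixed number of vectors. The function compress_sequence and the average-pooling scheme are illustrative stand-ins for the paper's learned, layer-wise compression.

```python
import torch

def compress_sequence(hidden, vip_index, num_compressed=512):
    """Keep a small set of important ("VIP") tokens exact and squeeze the
    remaining tokens into a fixed number of vectors before attention.
    Real Vcc uses a learned compression; the pooling here is an assumption.

    hidden:    (seq_len, dim) token representations at one layer
    vip_index: 1-D LongTensor of positions to keep uncompressed
    """
    seq_len = hidden.size(0)
    mask = torch.ones(seq_len, dtype=torch.bool)
    mask[vip_index] = False
    vip = hidden[vip_index]                       # kept exactly
    rest = hidden[mask]                           # everything else
    # Pool the remaining tokens into num_compressed chunks (average pooling).
    chunks = rest.tensor_split(num_compressed, dim=0)
    pooled = torch.stack([c.mean(dim=0) for c in chunks if c.numel() > 0])
    return torch.cat([vip, pooled], dim=0)        # much shorter sequence

# Example: 128K tokens reduced to 64 VIP tokens + 512 pooled vectors.
hidden = torch.randn(131_072, 768)
short = compress_sequence(hidden, torch.arange(64))
print(short.shape)   # torch.Size([576, 768])
```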

Can significantly reduce the computational cost and improve the efficiency of transformer models used in natural language processing and computer vision, especially for ultra-long sequences, while also improving accuracy on a large number of tasks.

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Large Language Models
Chain-of-Thought Reasoning
Machine Learning
Natural Language Processing
Language Processing Tasks
Explainable AI
Model Interpretation

Demonstrates that chain-of-thought explanations in language models can be heavily influenced by biased features in model inputs and can systematically misrepresent the true reason for a model's prediction.
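
To give a sense of the kind of probe involved, the snippet below builds a prompt that contains a user-suggested answer, one example of a biasing feature. The helper build_biased_prompt and its wording are hypothetical illustrations, not the paper's exact setup.

```python
def build_biased_prompt(question, options, suggested="(A)"):
    """Tiny sketch of a faithfulness probe: append a user-suggested answer to
    the prompt, then compare the model's chain-of-thought answer against the
    unbiased version. A faithful explanation should acknowledge the suggestion
    if it actually changed the prediction. Wording here is only illustrative."""
    lines = [question]
    lines += [f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options)]
    lines.append(f"I think the answer is {suggested}, but what do you think?")
    lines.append("Please explain your reasoning step by step, then give your answer.")
    return "\n".join(lines)

# Example: suggest a wrong option and see whether the explanation mentions it.
prompt = build_biased_prompt(
    "Which metal is liquid at room temperature?",
    ["Mercury", "Iron", "Copper", "Zinc"],
    suggested="(C)",
)
print(prompt)
```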

Highlights the need for targeted efforts to evaluate and improve explanation faithfulness in large language models, especially for chain-of-thought explanations. Raises concerns about the trustworthiness and safety of these models.
