Multi-Space Neural Radiance Fields
Proposes a multi-space neural radiance field (MS-NeRF) that represents the scene with a group of feature fields in parallel sub-spaces, helping the network account for reflective and refractive objects. Outperforms single-space NeRF methods when rendering high-quality scenes with complex light paths through mirror-like objects.
Can significantly improve the quality of rendering scenes with complex light paths through reflective and refractive objects.
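The core idea can be illustrated with a toy sketch: each parallel sub-space produces its own radiance, and the final pixel is a visibility-weighted blend across sub-spaces. This is a minimal, hypothetical illustration of the blending step, not the paper's actual architecture (the real method decodes learned feature fields with MLPs).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blend_subspaces(colors, vis_logits):
    """colors: (K, 3) per-sub-space RGB; vis_logits: (K,) visibility logits.
    The final pixel is a softmax-weighted sum over the K sub-spaces."""
    w = softmax(vis_logits)
    return (w[:, None] * colors).sum(axis=0)

# Toy example: two sub-spaces, e.g. the real scene vs. a mirror reflection.
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
pixel = blend_subspaces(colors, np.array([0.0, 0.0]))  # equal visibility
```

With equal logits both sub-spaces contribute equally, so the blended pixel is the average of the two colors.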
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
Presents MultiModal-GPT, a vision and language model for multi-round dialogue with humans. The model follows instructions such as generating captions and answering general questions, and is fine-tuned from OpenFlamingo with Low-rank Adapter (LoRA) modules added to both the cross-attention and self-attention parts of the language model. Demonstrates improved dialogue performance through joint training on language-only and visual-language instruction data.
Can improve dialogue performance and ability to chat with humans through joint training of visual and language instructions.
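The LoRA fine-tuning used here keeps the pretrained weight frozen and learns only a low-rank update. A minimal numpy sketch of a LoRA forward pass (the names and the toy dimensions are illustrative, not from the paper):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """Frozen pretrained weight W plus a trainable low-rank update B @ A,
    scaled by alpha / r as in the LoRA formulation."""
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # frozen pretrained projection
A = rng.normal(size=(4, d))   # trainable down-projection (rank r = 4)
B = np.zeros((d, 4))          # trainable up-projection, initialized to zero
x = rng.normal(size=(2, d))
y = lora_forward(x, W, A, B)  # equals x @ W.T at init, since B = 0
```

Initializing B to zero means fine-tuning starts exactly at the pretrained model, which is the standard LoRA design choice.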
Locally Attentional SDF Diffusion for Controllable 3D Shape Generation
Proposes locally attentional SDF diffusion, a diffusion-based 3D generation framework that models plausible 3D shapes from 2D sketch input. Employs a view-aware local attention mechanism for image-conditioned shape generation that uses 2D image patch features to guide 3D voxel feature learning, improving local controllability and model generalizability. Validated through extensive experiments on sketch-conditioned and category-conditioned 3D shape generation tasks.
Can provide plausible and diverse 3D shapes with superior controllability and generalizability over existing work.
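The guidance of 3D voxel features by 2D patch features is a form of cross-attention: voxel queries attend over image patch keys/values. A self-contained sketch of that pattern (the shapes and names are hypothetical; the paper's mechanism is additionally view-aware and restricted to local patches):

```python
import numpy as np

def local_cross_attention(voxel_q, patch_k, patch_v):
    """Each 3D voxel query attends over 2D sketch-patch features,
    returning a convex combination of patch values per voxel."""
    scores = voxel_q @ patch_k.T / np.sqrt(voxel_q.shape[-1])
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)   # rows sum to 1
    return attn @ patch_v

rng = np.random.default_rng(1)
vq = rng.normal(size=(5, 16))   # 5 voxel queries, feature dim 16
pk = rng.normal(size=(9, 16))   # 9 local image-patch keys
pv = rng.normal(size=(9, 16))   # matching patch values
out = local_cross_attention(vq, pk, pv)
```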
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Proposes to significantly reduce the cost of attention over long sequences by compressing the input at each layer into a fixed-size set of vectors, prioritizing a small set of important (VIP) tokens.
Can significantly reduce the computational cost of Transformer models in natural language processing and computer vision, especially for ultra-long sequences, while matching or improving accuracy on a large number of tasks.
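One way to picture the compression is: keep the important tokens at full resolution and pool the remaining ones into a fixed number of slots, so attention operates over a short sequence regardless of input length. This is a simplified, hypothetical sketch of that idea, not the paper's actual (learned, layer-wise) compression scheme:

```python
import numpy as np

def vip_compress(tokens, vip_idx, k):
    """Keep VIP tokens at full resolution; mean-pool the remaining
    tokens into k fixed slots, yielding len(vip_idx) + k vectors."""
    n = len(tokens)
    rest = np.array([i for i in range(n) if i not in set(vip_idx)])
    chunks = np.array_split(tokens[rest], k)
    pooled = np.stack([c.mean(axis=0) for c in chunks])
    return np.concatenate([tokens[vip_idx], pooled])

rng = np.random.default_rng(2)
x = rng.normal(size=(1000, 32))            # long input sequence
z = vip_compress(x, vip_idx=[0, 1], k=64)  # 1000 tokens -> 66 vectors
```

Attention over the 66 compressed vectors costs far less than over the original 1000 tokens, which is how fixed-size compression decouples cost from sequence length.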
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Demonstrates that chain-of-thought explanations in language models can be heavily influenced by biased features in model inputs and can systematically misrepresent the true reason for a model's prediction.
Highlights the need for targeted efforts to evaluate and improve explanation faithfulness in large language models, especially for chain-of-thought explanations. Raises concerns about the trustworthiness and safety of these models.
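One kind of biased input studied in this line of work is a few-shot prompt whose exemplar answers are always the same option, a spurious pattern the model may silently follow while its chain-of-thought cites other reasons. A hypothetical sketch of constructing such a prompt (the questions and helper name are illustrative, not from the paper):

```python
def make_biased_prompt(question, options):
    """Build a few-shot prompt in which every exemplar's correct answer
    is option (A), introducing an answer-position bias."""
    exemplars = [
        "Q: Which is a mammal?\nOptions: (A) whale (B) trout\nAnswer: (A)",
        "Q: Which is a prime?\nOptions: (A) 7 (B) 9\nAnswer: (A)",
    ]
    opts = " ".join(f"({c}) {o}" for c, o in zip("AB", options))
    return "\n\n".join(exemplars + [f"Q: {question}\nOptions: {opts}\nAnswer:"])

prompt = make_biased_prompt("Which is heavier?", ["a feather", "a brick"])
```

Comparing model predictions and explanations on biased versus unbiased orderings is what reveals whether the stated chain-of-thought reflects the true cause of the answer.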