Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Proposes EvalPlus, a code synthesis benchmarking framework for rigorously evaluating the functional correctness of LLM-synthesized code, and finds that popular code synthesis evaluation results do not accurately reflect the true performance of LLMs for code synthesis.
EvalPlus provides a rigorous evaluation of large language models for code generation, helping ensure that the generated code is functionally correct.
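To illustrate why a small test suite can overstate functional correctness, here is a minimal sketch. It is not EvalPlus's API: the functions `reference_max_abs`, `candidate_max_abs`, and `passes` are hypothetical, and the exhaustive enumeration of small inputs is only a stand-in for a real test-input generator. A buggy candidate passes the hand-written inputs but fails once the suite is widened and compared against a trusted reference.

```python
from itertools import product


def reference_max_abs(xs):
    """Trusted reference: largest absolute value in a non-empty list."""
    return max(abs(x) for x in xs)


def candidate_max_abs(xs):
    """Hypothetical model output with a subtle bug: it ignores the sign."""
    return max(xs)


def passes(candidate, reference, inputs):
    """True when the candidate agrees with the reference on every input."""
    return all(candidate(list(xs)) == reference(list(xs)) for xs in inputs)


# A small hand-written suite that happens to contain no negative numbers.
base_inputs = [[1, 2, 3], [0], [5, 4]]

# Extra inputs: every list of length 1 or 2 over a few small values
# (an assumed stand-in for an automatic test-input generator).
values = [-2, -1, 0, 1, 2]
extra_inputs = [list(combo) for n in (1, 2) for combo in product(values, repeat=n)]

base_ok = passes(candidate_max_abs, reference_max_abs, base_inputs)
augmented_ok = passes(candidate_max_abs, reference_max_abs, base_inputs + extra_inputs)

print(base_ok, augmented_ok)  # prints: True False
```

The candidate looks correct on the base suite only because that suite never exercises negative values; the wider input set exposes the disagreement with the reference.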
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Introduces Pick-a-Pic, an open dataset of text-to-image prompts and real user preferences over generated images that can be used to train a CLIP-based scoring function (PickScore) that exhibits superhuman performance on the task of predicting human preferences.
PickScore can be used to evaluate future text-to-image generation models, and the Pick-a-Pic dataset can serve as a more relevant evaluation dataset than MS-COCO.
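As a rough illustration of how a CLIP-style scoring function can be turned into a preference prediction, the sketch below scores each image by the cosine similarity between a prompt embedding and an image embedding, then applies a two-way softmax. The embeddings are toy vectors, and the names `similarity_score`, `preference_probability`, and the `temperature` parameter are assumptions made for this example, not the released PickScore code.

```python
import math


def similarity_score(text_vec, image_vec, temperature=1.0):
    """Scaled cosine similarity between a prompt and an image embedding."""
    dot = sum(t * i for t, i in zip(text_vec, image_vec))
    text_norm = math.sqrt(sum(t * t for t in text_vec))
    image_norm = math.sqrt(sum(i * i for i in image_vec))
    return temperature * dot / (text_norm * image_norm)


def preference_probability(score_a, score_b):
    """Probability that image A is preferred, as a two-way softmax."""
    top = max(score_a, score_b)  # subtract the max for numerical stability
    exp_a = math.exp(score_a - top)
    exp_b = math.exp(score_b - top)
    return exp_a / (exp_a + exp_b)


# Toy embeddings (hypothetical): image A points roughly along the prompt.
prompt_vec = [1.0, 0.0]
image_a_vec = [0.9, 0.1]
image_b_vec = [0.1, 0.9]

score_a = similarity_score(prompt_vec, image_a_vec)
score_b = similarity_score(prompt_vec, image_b_vec)
p_a = preference_probability(score_a, score_b)
p_b = preference_probability(score_b, score_a)
```

In this toy setup image A receives the higher score, so its predicted preference probability is above one half, and the two probabilities sum to one.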
DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling
Introduces DreamPaint, a framework to intelligently inpaint any e-commerce product on any user-provided context image without requiring 3D modeling of either the e-commerce product or the user context.
DreamPaint can be used for virtual try-on of e-commerce products and can intelligently infer the best 3D angle at which to place the product at the desired location in the user's context image.
Key-Locked Rank One Editing for Text-to-Image Personalization
Introduces Perfusion, a text-to-image (T2I) personalization method that addresses multiple hard challenges using dynamic rank-1 updates to the underlying T2I model, together with a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. It enables runtime-efficient balancing of visual fidelity and textual alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Perfusion outperforms strong baselines in both qualitative and quantitative terms.
Perfusion can help businesses improve their text-to-image personalization process by providing a more efficient and effective method that works with a much smaller trained model. It can also improve the quality of personalized object interactions, even in one-shot settings.
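For readers unfamiliar with the term, the sketch below shows what a generic rank-1 update to a weight matrix looks like: the matrix is changed so that one chosen key vector maps to a chosen target, while inputs orthogonal to that key are left untouched. This is a textbook rank-one edit under stated assumptions, not Perfusion's actual update rule or its key-locking mechanism; `matvec` and `rank_one_edit` are hypothetical helper names.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, key, target):
    """Return W + (target - W key) key^T / (key^T key).

    The added term is an outer product of two vectors, so it has rank one,
    and the edited matrix maps `key` exactly to `target`.
    """
    residual = [t - y for t, y in zip(target, matvec(W, key))]
    norm_sq = sum(k * k for k in key)
    return [
        [w + r * k / norm_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]


# Toy example: start from the identity and redirect one key.
W = [[1.0, 0.0], [0.0, 1.0]]
key = [1.0, 0.0]
target = [2.0, 3.0]

W_edited = rank_one_edit(W, key, target)
mapped_key = matvec(W_edited, key)            # equals target
mapped_other = matvec(W_edited, [0.0, 1.0])   # orthogonal input is unchanged
```

Because the update only stores two vectors per edited matrix rather than a full set of new weights, edits of this kind can be very compact, which is consistent with the small model size reported above.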