YaART pre-print is released
Today, we present our paper, "YaART: Yet Another ART Rendering Technology". YaART is our production-grade text-to-image cascaded diffusion model, aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF).
In this study, we describe our approach, focusing on data selection, architecture design, and model training. We report how model size and training dataset size affect both the efficiency of training and the quality of the generated images. We also demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, which points to a more efficient way of training diffusion models. Our experiments rely on human evaluations, using DrawBench and our more challenging YaBasket prompt set, which we make available to the research community via the project page.