CMT reduces the training cost of diffusion-based flow map models by up to 90% while reaching SOTA performance
A framework for identifying which training examples influenced specific concepts within a diffusion model
Learning conditional, unconditional, and matching-aware discriminators with an adaptive weighting mechanism (cSAN)
A tensor-decomposition-based PEFT method, shown to be effective on text-to-image generation tasks
Theoretical analysis of the limitations of current discrete diffusion, and a method for effectively capturing element-wise dependencies
A method that efficiently leverages online human feedback to fine-tune Stable Diffusion for a wide range of tasks
An enhanced multimodal representation using weighted point clouds and its theoretical benefits
A 64x64 pre-trained diffusion model is all you need for 1-step high-resolution SOTA generation
A unified framework enabling diverse samplers and SOTA 1-step generation
Applications:
[SoundGen]
DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
[EMNLP] [arXiv] [data]
CARE: Assessing the Impact of Multilingual Human Preference Learning on Cultural Awareness
[MRR@ICCV25] [arXiv]
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
[TMLR] [arXiv]
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
Large-Scale Training Data Attribution for Music Generative Models via Unlearning
SOTA Fx representation: extracting instrument-wise audio effects representations from music mixtures
Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning
Supervised contrastive learning from weakly-labeled audio segments for musical version matching
DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions
Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data
DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events