TeaserGen: Generating Teasers for Long Documentaries
Weihan Xu, Paul Pu Liang, Haven Kim, Julian McAuley, Taylor Berg-Kirkpatrick, and Hao-Wen Dong
International Conference on Learning Representations (ICLR), 2025
paper
demo
code
reviews
Multimodal Learning
FUTGA-MIR: Enhancing Fine-grained and Temporally-aware Music Understanding with Music Information Retrieval
Junda Wu, Zachary Novack, Amit Namburi, Hao-Wen Dong, Carol Chen, Zhouhang Xie, Jiaheng Dai, and Julian McAuley
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
paper
demo
reviews
Multimodal Learning
FUTGA: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, and Julian McAuley
Workshop on NLP for Music and Audio (NLP4MusA), 2024
paper
demo
Multimodal Learning
Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset
Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, and Hao-Wen Dong
Under review, 2024
paper
Music Generation Multimodal Learning
A New Dataset for Tag- and Text-based Controllable Symbolic Music Generation
Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, and Hao-Wen Dong
ISMIR Late-Breaking Demos, 2024
paper
demo
Music Generation Multimodal Learning
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, and Julian McAuley
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
paper
demo
video
slides
reviews
Oral presentation Audio Synthesis Multimodal Learning
CLIPSynth: Learning Text-to-audio Synthesis from Videos using CLIP and Diffusion Models
Hao-Wen Dong, Gunnar A. Sigurdsson, Chenyang Tao, Jiun-Yu Kao, Yu-Hsiang Lin, Anjali Narayan-Chen, Arpit Gupta, Tagyoung Chung, Jing Huang, Nanyun Peng, and Wenbo Zhao
CVPR Workshop on Sight and Sound (WSS), 2023
paper
demo
video
slides
Audio Synthesis Multimodal Learning
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, and Taylor Berg-Kirkpatrick
International Conference on Learning Representations (ICLR), 2023
paper
demo
video
slides
poster
code
reviews
Sound Separation Multimodal Learning