TeaserGen: Generating Teasers for Long Documentaries
Weihan Xu, Paul Pu Liang, Haven Kim, Julian McAuley, Taylor Berg-Kirkpatrick, and Hao-Wen Dong
Under review, 2024
paper · demo
Multimodal Learning
Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset
Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, and Hao-Wen Dong
Under review, 2024
paper
Music Generation · Multimodal Learning
A New Dataset for Tag- and Text-based Controllable Symbolic Music Generation
Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, and Hao-Wen Dong
ISMIR Late-Breaking Demos, 2024
paper · demo
Music Generation · Multimodal Learning
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, and Julian McAuley
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
paper · demo · video · slides · reviews
Oral presentation · Audio Synthesis · Multimodal Learning
CLIPSynth: Learning Text-to-audio Synthesis from Videos using CLIP and Diffusion Models
Hao-Wen Dong, Gunnar A. Sigurdsson, Chenyang Tao, Jiun-Yu Kao, Yu-Hsiang Lin, Anjali Narayan-Chen, Arpit Gupta, Tagyoung Chung, Jing Huang, Nanyun Peng, and Wenbo Zhao
CVPR Workshop on Sight and Sound (WSS), 2023
paper · demo · video · slides
Audio Synthesis · Multimodal Learning
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, and Taylor Berg-Kirkpatrick
International Conference on Learning Representations (ICLR), 2023
paper · demo · video · slides · poster · code · reviews
Sound Separation · Multimodal Learning