CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, and Julian McAuley
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
paper
demo
video
slides
reviews
Oral presentation · Audio Synthesis · Multimodal Learning
CLIPSynth: Learning Text-to-audio Synthesis from Videos using CLIP and Diffusion Models
Hao-Wen Dong, Gunnar A. Sigurdsson, Chenyang Tao, Jiun-Yu Kao, Yu-Hsiang Lin, Anjali Narayan-Chen, Arpit Gupta, Tagyoung Chung, Jing Huang, Nanyun Peng, and Wenbo Zhao
CVPR Workshop on Sight and Sound (WSS), 2023
paper
demo
video
slides
Audio Synthesis · Multimodal Learning
Deep Performer: Score-to-Audio Music Performance Synthesis
Hao-Wen Dong, Cong Zhou, Taylor Berg-Kirkpatrick, and Julian McAuley
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
paper
demo
video
slides
poster
reviews
Audio Synthesis · Music Performance Rendering