BinaryMuseGAN

Hao-Wen Dong, Yi-Hsuan Yang

Music and AI Lab,
Research Center for IT Innovation,
Academia Sinica

Results

Qualitative Results

Note that only the guitar track is shown.

Strategy and result (closeups of the generated piano-rolls; a code sketch of these binarization strategies follows the list):

- Raw prediction of the pretrained G: (figure: closeup_raw)
- Bernoulli sampling (at test time): (figure: closeup_test_time_bernoulli)
- Hard thresholding (at test time): (figure: closeup_test_time_round)
- Proposed model with SBNs: (figure: closeup_bernoulli)
- Proposed model with DBNs: (figure: closeup_round)
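
For readers unfamiliar with the strategies compared above, the sketch below illustrates them in PyTorch. This is an illustration only, not the released implementation: the function names are ours, and the sigmoid-adjusted straight-through estimator shown for the binary neurons is an assumption about the training details. Bernoulli sampling and hard thresholding binarize the real-valued prediction only at test time, whereas SBNs and DBNs binarize inside the network during training.

```python
import torch

# Test-time binarization of a real-valued prediction x with entries in [0, 1].
def bernoulli_sampling(x):
    """Treat each entry as an independent note-on probability and sample."""
    return torch.bernoulli(x)

def hard_thresholding(x, threshold=0.5):
    """Binarize deterministically at a fixed threshold."""
    return (x > threshold).float()

# Binary neurons placed inside the network (binarization is part of training).
# Both use a sigmoid-adjusted straight-through estimator: the forward pass
# outputs binary values, while the backward pass uses the sigmoid's gradient.
def sbn(logits):
    """Stochastic binary neuron: sample from Bernoulli(sigmoid(logits))."""
    p = torch.sigmoid(logits)
    b = torch.bernoulli(p)
    return b.detach() + p - p.detach()  # value of b, gradient of p

def dbn(logits):
    """Deterministic binary neuron: threshold sigmoid(logits) at 0.5."""
    p = torch.sigmoid(logits)
    b = (p > 0.5).float()
    return b.detach() + p - p.detach()  # value of b, gradient of p
```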

Quantitative Results

Evaluation metrics

We evaluate the generated music in terms of polyphonicity, qualified note rate, and tonal distance [1], as shown in the plots below.
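
As a reference, here is a minimal numpy sketch of the first two metrics for a single-track binary piano-roll of shape (time steps, pitches). The exact definitions are assumptions on our part: polyphonicity as the fraction of time steps with more than two active pitches, and a qualified note as one lasting at least three time steps (a 32nd note at 24 time steps per beat). Function names are illustrative.

```python
import numpy as np

def polyphonicity(pianoroll, max_monophonic=2):
    """Fraction of time steps where more than `max_monophonic` pitches sound."""
    active_pitches = pianoroll.sum(axis=1)
    return float((active_pitches > max_monophonic).mean())

def qualified_note_rate(pianoroll, min_length=3):
    """Fraction of notes lasting at least `min_length` time steps."""
    # Pad the time axis with zeros so every note has a start and an end.
    padded = np.pad(pianoroll.astype(np.int8), ((1, 1), (0, 0)))
    diff = np.diff(padded, axis=0)  # +1 marks note onsets, -1 note offsets
    total_notes, qualified_notes = 0, 0
    for pitch in range(pianoroll.shape[1]):
        starts = np.flatnonzero(diff[:, pitch] == 1)
        ends = np.flatnonzero(diff[:, pitch] == -1)
        lengths = ends - starts  # starts and ends pair up within a pitch
        total_notes += len(lengths)
        qualified_notes += int((lengths >= min_length).sum())
    return qualified_notes / max(total_notes, 1)
```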

Comparisons of training and binarization strategies

(figures: two-stage_polyphonicity, two-stage_qualified_note_rate, end2end_qualified_note_rate)

Effects of the shared/private and multi-stream design of the discriminator

(figures: ablated_qualified_note_rate, ablated_tonal_distance)
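
Tonal distance [1] measures how close two tracks are harmonically by comparing their chroma features in a six-dimensional tonal space. The sketch below follows our recollection of the transform in Harte et al. [1] (the projection matrix and radii should be checked against that paper), with illustrative function names.

```python
import numpy as np

# 6-D tonal centroid transform of Harte et al. [1]: the three pairs of rows
# correspond to the circles of fifths, minor thirds, and major thirds.
_L = np.arange(12)
_PHI = np.array([
    np.sin(_L * 7 * np.pi / 6), np.cos(_L * 7 * np.pi / 6),              # fifths
    np.sin(_L * 3 * np.pi / 2), np.cos(_L * 3 * np.pi / 2),              # minor thirds
    0.5 * np.sin(_L * 2 * np.pi / 3), 0.5 * np.cos(_L * 2 * np.pi / 3),  # major thirds
])

def tonal_centroid(chroma):
    """Project an L1-normalized 12-D chroma vector into the tonal space."""
    norm = chroma.sum()
    return _PHI @ (chroma / norm) if norm else np.zeros(6)

def tonal_distance(chroma_a, chroma_b):
    """Euclidean distance between two tonal centroids (smaller is closer)."""
    return float(np.linalg.norm(tonal_centroid(chroma_a) - tonal_centroid(chroma_b)))
```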

Audio Samples

The audio samples presented below are rendered at a high temporal resolution of 24 time steps per beat and are therefore not directly comparable to the results presented on the MuseGAN website.

No cherry-picking. Some might sound unpleasant. Lower the volume first!

Model and result:

- Hard thresholding (at test time): (audio sample)
- Bernoulli sampling (at test time): (audio sample)
- Proposed model (+SBNs): (audio sample)
- Proposed model (+DBNs): (audio sample)
- End-to-end model (+SBNs): (audio sample)
- End-to-end model (+DBNs): (audio sample)

Reference

  1. Christopher Harte, Mark Sandler, and Martin Gasser, “Detecting Harmonic Change in Musical Audio,” in Proc. ACM MM Workshop on Audio and Music Computing Multimedia, 2006.