The Lakh Pianoroll Dataset (LPD) is a collection of 174,154 multitrack pianorolls derived from the Lakh MIDI Dataset (LMD).
We provide multiple subsets and versions of the dataset (see here). The dataset is available here.
The multitrack pianorolls in LPD are stored in a special format for efficient I/O and to save space. We recommend loading the data with Pypianoroll (the dataset was created using Pypianoroll v0.3.0). See here to learn how the data is stored and how to load it properly.
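To give a rough sense of why a compact representation saves space, here is a minimal sketch that keeps only the nonzero cells of a pianoroll. It is an illustration under stated assumptions, not the actual LPD or Pypianoroll storage format: the helper names `to_sparse` and `to_dense`, the (time step, pitch, velocity) triple layout, and the toy 96 x 128 pianoroll are all hypothetical. For real LPD files, use Pypianoroll as recommended above.

```python
# Illustrative only: this is NOT the actual LPD/Pypianoroll on-disk format.
# A pianoroll is modeled here as a list of time steps, each a list of
# per-pitch velocities (0 means the pitch is silent at that step).


def to_sparse(pianoroll):
    """Return (shape, entries); entries hold (time_step, pitch, velocity)
    for every nonzero cell of the dense pianoroll."""
    n_steps = len(pianoroll)
    n_pitches = len(pianoroll[0]) if n_steps else 0
    entries = [
        (t, p, v)
        for t, row in enumerate(pianoroll)
        for p, v in enumerate(row)
        if v
    ]
    return (n_steps, n_pitches), entries


def to_dense(shape, entries):
    """Rebuild the dense pianoroll from its shape and nonzero entries."""
    n_steps, n_pitches = shape
    pianoroll = [[0] * n_pitches for _ in range(n_steps)]
    for t, p, v in entries:
        pianoroll[t][p] = v
    return pianoroll


# Hypothetical toy example: 96 time steps x 128 pitches, with a single
# note (pitch 60) held for 24 time steps at velocity 100.
dense = [[0] * 128 for _ in range(96)]
for t in range(24):
    dense[t][60] = 100

shape, entries = to_sparse(dense)
restored = to_dense(shape, entries)
dense_cells = shape[0] * shape[1]

print("dense cells:", dense_cells)
print("stored entries:", len(entries))
print("round trip ok:", restored == dense)
```

Because most cells in a typical pianoroll are zero, storing only the active cells can be far smaller than storing the full array, and the round trip back to the dense form loses nothing.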
Lakh Pianoroll Dataset is a derivative of Lakh MIDI Dataset by Colin Raffel, used under CC BY 4.0. Lakh Pianoroll Dataset is licensed under CC BY 4.0 by Hao-Wen Dong and Wen-Yi Hsiao.
Please cite the following papers if you use Lakh Pianoroll Dataset in a published work.
Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang, “MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment,” in Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018.
Colin Raffel, “Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching,” PhD Thesis, 2016.