muspy.datasets
Dataset classes.
This module provides an easy-to-use dataset management system. Each supported dataset in MusPy comes with a class inherited from the base MusPy Dataset class. It also provides interfaces to PyTorch and TensorFlow for creating input pipelines for machine learning.
Base Classes
- ABCFolderDataset 
- Dataset 
- DatasetInfo 
- FolderDataset 
- RemoteABCFolderDataset 
- RemoteDataset 
- RemoteFolderDataset 
- RemoteMusicDataset 
- MusicDataset 
Dataset Classes
- EssenFolkSongDatabase 
- EMOPIADataset 
- HaydnOp20Dataset 
- HymnalDataset 
- HymnalTuneDataset 
- JSBChoralesDataset 
- LakhMIDIAlignedDataset 
- LakhMIDIDataset 
- LakhMIDIMatchedDataset 
- MAESTRODatasetV1 
- MAESTRODatasetV2 
- MAESTRODatasetV3 
- Music21Dataset 
- MusicNetDataset 
- NESMusicDatabase 
- NottinghamDatabase 
- WikifoniaDataset 
- class muspy.datasets.ABCFolderDataset(root, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None)[source]
- Class for datasets storing ABC files in a folder. - See also - muspy.FolderDataset
- Class for datasets storing files in a folder. 
 
- class muspy.datasets.Dataset[source]
- Base class for MusPy datasets. - To build a custom dataset, inherit from this class and override the methods __getitem__ and __len__ as well as the class attribute _info. __getitem__ should return the i-th data sample as a muspy.Music object. __len__ should return the size of the dataset. _info should be a muspy.DatasetInfo instance storing the dataset information.
 - save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)[source]
- Save all the music objects to a directory. - Parameters
- root (str or Path) – Root directory to save the data. 
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data. 
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing. 
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted. 
- verbose (bool, default: True) – Whether to be verbose. 
- **kwargs – Keyword arguments to pass to muspy.save().
 
 
 - split(filename=None, splits=None, random_state=None)[source]
- Split the dataset according to the given ratios. - Parameters
- filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. If given and the file does not exist, the split is saved to it. 
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits. 
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
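As a rough illustration of how the ratios in splits could map onto sample indices, the sketch below shuffles the indices with a seeded generator and cuts them at the cumulative ratios. The helper split_indices is hypothetical and uses only the standard library; it is not MusPy's implementation, and it assumes that a single float is the training ratio, with the remainder going to the test split.

```python
import random


def split_indices(n_samples, splits, seed=None):
    """Partition range(n_samples) into named splits (illustrative helper)."""
    # Assumption: a single float is the training ratio; the rest is the test split.
    if isinstance(splits, float):
        splits = [splits, 1.0 - splits]
    if len(splits) not in (2, 3):
        raise ValueError("splits must be a float or a list of two or three floats")

    # Shuffle the indices reproducibly with a seeded generator.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Cut the shuffled indices at the cumulative ratios, in train-test-validation order.
    names = ("train", "test", "validation")
    result, start, cumulative = {}, 0, 0.0
    for name, ratio in zip(names, splits):
        cumulative += ratio
        end = int(cumulative * n_samples)
        result[name] = indices[start:end]
        start = end
    return result


parts = split_indices(8, [0.5, 0.25, 0.25], seed=0)
print({name: len(idx) for name, idx in parts.items()})
# → {'train': 4, 'test': 2, 'validation': 2}
```

Fixing the seed makes the partition reproducible, which is the role random_state plays in the parameters above.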
 
 
 - to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)[source]
- Return the dataset as a PyTorch dataset. - Parameters
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor. 
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. If given and the file does not exist, the split is saved to it. 
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits. 
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
 
- Returns
- Converted PyTorch dataset(s). 
- Return type
- torch.utils.data.Dataset or dict of torch.utils.data.Dataset 
 
 - to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)[source]
- Return the dataset as a TensorFlow dataset. - Parameters
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor. 
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. If given and the file does not exist, the split is saved to it. 
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits. 
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
 
- Returns
- Converted TensorFlow dataset(s). 
- Return type
- tensorflow.data.Dataset or dict of tensorflow.data.Dataset 
 
 
 
- class muspy.datasets.DatasetInfo(name=None, description=None, homepage=None, license=None)[source]
- A container for dataset information. 
- class muspy.datasets.EMOPIADataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- EMOPIA Dataset. 
- class muspy.datasets.EssenFolkSongDatabase(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Essen Folk Song Database. 
- class muspy.datasets.FolderDataset(root, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None)[source]
- Class for datasets storing files in a folder. - This class extends muspy.Dataset to support folder datasets. To build a custom folder dataset, please refer to the documentation of muspy.Dataset for details. In addition, set the class attribute _extension to the extension to look for when building the dataset, and set read to a callable that takes the filename of a source file as input and returns the converted Music object. - Parameters
- convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, check whether converted data exists. If so, disable on-the-fly mode. If not, enable on-the-fly mode and issue a warning. 
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data. 
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing. 
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted. 
- use_converted (bool, optional) – Whether to disable on-the-fly mode and use the converted data. Defaults to True if converted data exists, otherwise False. 
 
 - Important - muspy.FolderDataset.converted_exists() depends solely on a special file named .muspy.success in the folder {root}/_converted/, which serves as an indicator for the existence and integrity of the converted dataset. If the converted dataset is built by muspy.FolderDataset.convert(), the .muspy.success file will be created as well. If the converted dataset is created manually, make sure to create the .muspy.success file in the folder {root}/_converted/ to prevent errors.
 - Notes - Two modes are available for this dataset. When the on-the-fly mode is enabled, a data sample is converted to a Music object on the fly when being indexed. When the on-the-fly mode is disabled, a data sample is loaded from the precomputed converted data.
 - See also - muspy.Dataset
- Base class for MusPy datasets. 
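For a manually prepared converted dataset, the marker file described in the Important note can be created with the standard library alone. The following is a minimal sketch: the helper names converted_marker and mark_converted are hypothetical, and the existence check only mimics what the note says converted_exists() relies on.

```python
import tempfile
from pathlib import Path


def converted_marker(root):
    """Return the path of the marker file described in the note above."""
    return Path(root) / "_converted" / ".muspy.success"


def mark_converted(root):
    """Create {root}/_converted/.muspy.success (hypothetical helper)."""
    marker = converted_marker(root)
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return marker


# Demonstrate on a temporary directory so that no real dataset is touched.
with tempfile.TemporaryDirectory() as tmp:
    before = converted_marker(tmp).is_file()
    mark_converted(tmp)
    after = converted_marker(tmp).is_file()

print(before, after)
# → False True
```

Only create the marker once the converted files are complete, since the note states that the file signals both the existence and the integrity of the converted dataset.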
 - property converted_dir
- Path to the root directory of the converted dataset. 
 - use_converted()[source]
- Disable on-the-fly mode and use converted data. - Returns
- Return type
- Object itself. 
 
 - on_the_fly()[source]
- Enable on-the-fly mode and convert the data on the fly. - Returns
- Return type
- Object itself. 
 
 - convert(kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)[source]
- Convert and save the Music objects. - The converted files are named by their index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] will be converted and saved to {i}.json. - Parameters
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data. 
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing. 
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted. 
- verbose (bool, default: True) – Whether to be verbose. 
- **kwargs – Keyword arguments to pass to muspy.save().
 
- Returns
- Return type
- Object itself. 
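The index-based naming scheme described for convert() can be sketched as a small path helper. The function converted_path and the source filenames below are made up for illustration, and the .yaml extension for kind='yaml' is an assumption; the description above only spells out the {i}.json case.

```python
from pathlib import Path


def converted_path(root, index, kind="json"):
    """Return where the index-th converted file would be stored (illustrative)."""
    if kind not in ("json", "yaml"):
        raise ValueError("kind must be 'json' or 'yaml'")
    # Assumption: the file extension equals the value of kind.
    return Path(root) / "_converted" / f"{index}.{kind}"


# Made-up source files standing in for the filenames attribute.
filenames = [Path("a/tune1.abc"), Path("b/tune2.abc")]

# Map each source file to the location of its converted counterpart.
mapping = {name: converted_path("data", i) for i, name in enumerate(filenames)}
for source, target in mapping.items():
    print(source, "->", target)
```

Because the converted files carry only an index, the filenames attribute is the way back from a converted file to its source.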
 
 
- class muspy.datasets.HaydnOp20Dataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Haydn Op.20 Dataset. 
- class muspy.datasets.HymnalDataset(root, download=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None)[source]
- Hymnal Dataset. 
- class muspy.datasets.HymnalTuneDataset(root, download=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None)[source]
- Hymnal Dataset (tune only). 
- class muspy.datasets.JSBChoralesDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Johann Sebastian Bach Chorales Dataset. 
- class muspy.datasets.LakhMIDIAlignedDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Lakh MIDI Dataset - aligned subset. 
- class muspy.datasets.LakhMIDIDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Lakh MIDI Dataset. 
- class muspy.datasets.LakhMIDIMatchedDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Lakh MIDI Dataset - matched subset. 
- class muspy.datasets.MAESTRODatasetV1(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- MAESTRO Dataset V1 (MIDI only). 
- class muspy.datasets.MAESTRODatasetV2(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- MAESTRO Dataset V2 (MIDI only). 
- class muspy.datasets.MAESTRODatasetV3(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- MAESTRO Dataset V3 (MIDI only). 
- class muspy.datasets.Music21Dataset(composer=None)[source]
- A class of datasets containing files in music21 corpus. - Parameters
- composer (str) – Name of a composer or a collection. Please refer to the music21 corpus reference page for a full list [1]. 
- extensions (list of str) – File extensions of desired files. 
 
 - References - [1] https://web.mit.edu/music21/doc/about/referenceCorpus.html
 - convert(root, kind='json', n_jobs=1, ignore_exceptions=True)[source]
- Convert and save the Music objects. - Parameters
- root (str or Path) – Root directory to save the data. 
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data. 
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing. 
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted. 
 
 
 
- class muspy.datasets.MusicDataset(root, kind=None)[source]
- Class for datasets of MusPy JSON/YAML files. - Parameters
- root (str or Path) – Root directory of the dataset. 
- kind ({'json', 'yaml'}, optional) – File formats to include in the dataset. Defaults to include both JSON and YAML files. 
 
 - root
- Root directory of the dataset. - Type
- Path 
 
 - filenames
- Path to the files, relative to root. - Type
- list of Path 
 
 - See also - muspy.Dataset
- Base class for MusPy datasets. 
 
- class muspy.datasets.MusicNetDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- MusicNet Dataset (MIDI only). 
- class muspy.datasets.NESMusicDatabase(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- NES Music Database. 
- class muspy.datasets.NottinghamDatabase(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Nottingham Database. 
- class muspy.datasets.RemoteABCFolderDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Base class for remote datasets storing ABC files in a folder. - See also - muspy.ABCFolderDataset
- Class for datasets storing ABC files in a folder. 
- muspy.RemoteDataset
- Base class for remote MusPy datasets. 
 
- class muspy.datasets.RemoteDataset(root, download_and_extract=False, overwrite=False, cleanup=False, verbose=True)[source]
- Base class for remote MusPy datasets. - This class extends muspy.Dataset to support remote datasets. To build a custom remote dataset, please refer to the documentation of muspy.Dataset for details. In addition, set the class attribute _sources to the URLs to the source files (see Notes). - Parameters
- root (str or Path) – Root directory of the dataset. 
- download_and_extract (bool, default: False) – Whether to download and extract the dataset. 
- overwrite (bool, default: False) – Whether to overwrite existing file(s). 
- cleanup (bool, default: False) – Whether to remove the source archive(s). 
- verbose (bool, default: True) – Whether to be verbose. 
- Raises
- RuntimeError – If download_and_extract is False but file {root}/.muspy.success does not exist (see below).
 - Important - muspy.Dataset.exists() depends solely on a special file named .muspy.success in directory {root}/_converted/. This file serves as an indicator for the existence and integrity of the dataset. It will automatically be created if the dataset is successfully downloaded and extracted by muspy.Dataset.download_and_extract(). If the dataset is downloaded manually, make sure to create the .muspy.success file in directory {root}/_converted/ to prevent errors.
 - Notes - The class attribute _sources is a dictionary storing the following information of each source file.
- filename (str): Name to save the file. 
- url (str): URL to the file. 
- archive (bool): Whether the file is an archive. 
- md5 (str, optional): Expected MD5 checksum of the file. 
- sha256 (str, optional): Expected SHA256 checksum of the file. 
 - Here is an example:

    _sources = {
        "example": {
            "filename": "example.tar.gz",
            "url": "https://www.example.com/example.tar.gz",
            "archive": True,
            "md5": None,
            "sha256": None,
        }
    }

 - See also - muspy.Dataset
- Base class for MusPy datasets. 
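To show how the optional md5 and sha256 fields of a _sources entry could be used, the sketch below checks a downloaded file against whichever checksums are set. The helpers file_digest and matches_checksums are hypothetical and built on hashlib; how MusPy itself verifies downloads is not described here, so treat this as one possible approach rather than the library's behaviour.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm):
    """Hash a file in chunks so that large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksums(path, source):
    """Check a file against the checksums of a _sources entry (hypothetical)."""
    for algorithm in ("md5", "sha256"):
        expected = source.get(algorithm)
        # A checksum set to None is treated as "not provided" and skipped.
        if expected is not None and file_digest(path, algorithm) != expected:
            return False
    return True


payload = b"not a real archive"  # stand-in for downloaded bytes
source = {
    "filename": "example.tar.gz",
    "url": "https://www.example.com/example.tar.gz",
    "archive": True,
    "md5": None,
    "sha256": hashlib.sha256(payload).hexdigest(),
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / source["filename"]
    path.write_bytes(payload)
    ok = matches_checksums(path, source)
    # A deliberately wrong MD5 value makes the check fail.
    bad = matches_checksums(path, {**source, "md5": "0" * 32})

print(ok, bad)
# → True False
```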
 
- class muspy.datasets.RemoteFolderDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Base class for remote datasets storing files in a folder. - Parameters
- download_and_extract (bool, default: False) – Whether to download and extract the dataset. 
- cleanup (bool, default: False) – Whether to remove the source archive(s). 
- convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, check whether converted data exists. If so, disable on-the-fly mode. If not, enable on-the-fly mode and issue a warning. 
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data. 
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing. 
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted. 
- use_converted (bool, optional) – Whether to disable on-the-fly mode and use the converted data. Defaults to True if converted data exists, otherwise False. 
 
 - See also - muspy.FolderDataset
- Class for datasets storing files in a folder. 
- muspy.RemoteDataset
- Base class for remote MusPy datasets. 
 
- class muspy.datasets.RemoteMusicDataset(root, download_and_extract=False, overwrite=False, cleanup=False, kind=None, verbose=True)[source]
- Base class for remote datasets of MusPy JSON/YAML files. - Parameters
- root (str or Path) – Root directory of the dataset. 
- download_and_extract (bool, default: False) – Whether to download and extract the dataset. 
- overwrite (bool, default: False) – Whether to overwrite existing file(s). 
- cleanup (bool, default: False) – Whether to remove the source archive(s). 
- kind ({'json', 'yaml'}, optional) – File formats to include in the dataset. Defaults to include both JSON and YAML files. 
- verbose (bool, default: True) – Whether to be verbose. 
 
 - root
- Root directory of the dataset. - Type
- Path 
 
 - filenames
- Path to the files, relative to root. - Type
- list of Path 
 
 - See also - muspy.MusicDataset
- Class for datasets of MusPy JSON/YAML files. 
- muspy.RemoteDataset
- Base class for remote MusPy datasets. 
 
- class muspy.datasets.WikifoniaDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)[source]
- Wikifonia dataset.