Base Dataset Classes
Here are the two base classes for MusPy datasets.
- class muspy.Dataset
- Base class for MusPy datasets.
- To build a custom dataset, inherit from this class and override the methods __getitem__ and __len__ as well as the class attribute _info. __getitem__ should return the i-th data sample as a muspy.Music object. __len__ should return the size of the dataset. _info should be a muspy.DatasetInfo instance storing the dataset information.
 - classmethod info()
- Return the dataset information.
 - classmethod citation()
- Print the citation information.
 - save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)
- Save all the music objects to a directory.
- Parameters
- root (str or Path) – Root directory to save the data. 
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data. 
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing. 
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted. 
- verbose (bool, default: True) – Whether to be verbose. 
- **kwargs – Keyword arguments to pass to muspy.save().
 
 
 - split(filename=None, splits=None, random_state=None)
- Split the dataset according to the given ratios and return the resulting splits.
- Parameters
- filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If None or the file does not exist, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
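The way the splits argument maps to train, test and validation subsets can be illustrated with a small standalone sketch. This is not MusPy's implementation: the function name sketch_split, the "full" key, and the choice to let the last split absorb any rounding remainder are assumptions made for illustration, a single float is assumed to be the train ratio, and the standard-library random.Random stands in for numpy.random.RandomState.

```python
import random
from typing import Dict, List, Optional, Sequence, Union


def sketch_split(
    n_samples: int,
    splits: Optional[Union[float, Sequence[float]]] = None,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Partition range(n_samples) into named index lists (illustrative only)."""
    indices = list(range(n_samples))
    if splits is None:
        # No ratios given: keep the dataset as a whole ("full" is a made-up key).
        return {"full": indices}
    if isinstance(splits, float):
        # Assumption: a single float is the train ratio, the rest is test.
        splits = [splits, 1.0 - splits]
    if len(splits) not in (2, 3):
        raise ValueError("splits must be a float or a list of 2 or 3 floats")

    names = ["train", "test", "validation"][: len(splits)]
    # A seeded generator makes the shuffle, and so the split, reproducible.
    random.Random(seed).shuffle(indices)

    result: Dict[str, List[int]] = {}
    start = 0
    for i, (name, ratio) in enumerate(zip(names, splits)):
        if i == len(names) - 1:
            end = n_samples  # last split takes whatever is left
        else:
            end = start + int(round(ratio * n_samples))
        result[name] = sorted(indices[start:end])
        start = end
    return result
```

Calling sketch_split(10, [0.6, 0.2, 0.2], seed=42) twice yields the same three index lists, which is the role random_state plays in the method above.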
 
 
 - to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
- Return the dataset as a PyTorch dataset.
- Parameters
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If None or the file does not exist, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
 
- Returns
- Converted PyTorch dataset(s). 
- Return type
- torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
 
 - to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
- Return the dataset as a TensorFlow dataset.
- Parameters
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If None or the file does not exist, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
 
- Returns
- tensorflow.data.Dataset or Dict of tensorflow.data.Dataset – Converted TensorFlow dataset(s).
 
 
 
- class muspy.RemoteDataset(root, download_and_extract=False, overwrite=False, cleanup=False, verbose=True)
- Base class for remote MusPy datasets.
- This class extends muspy.Dataset to support remote datasets. To build a custom remote dataset, please refer to the documentation of muspy.Dataset for details. In addition, set the class attribute _sources to the URLs of the source files (see Notes).
 - root
- Root directory of the dataset.
- Type
- str or Path
 
 - Parameters
- Raises
- RuntimeError – If download_and_extract is False but file {root}/.muspy.success does not exist (see below).
 - Important
- muspy.Dataset.exists() depends solely on a special file named .muspy.success in directory {root}/_converted/. This file serves as an indicator of the existence and integrity of the dataset. It is created automatically if the dataset is successfully downloaded and extracted by muspy.Dataset.download_and_extract(). If the dataset is downloaded manually, make sure to create the .muspy.success file in directory {root}/_converted/ to prevent errors.
 - Notes
- The class attribute _sources is a dictionary storing the following information for each source file.
- filename (str): Name to save the file.
- url (str): URL to the file. 
- archive (bool): Whether the file is an archive. 
- md5 (str, optional): Expected MD5 checksum of the file. 
- sha256 (str, optional): Expected SHA256 checksum of the file. 
 - Here is an example:

    _sources = {
        "example": {
            "filename": "example.tar.gz",
            "url": "https://www.example.com/example.tar.gz",
            "archive": True,
            "md5": None,
            "sha256": None,
        }
    }

 - See also
- muspy.Dataset
- Base class for MusPy datasets. 
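The md5 and sha256 fields of a _sources entry hold expected checksums for a downloaded file. The following standalone sketch shows one way such a check could be done with the standard library. It is not MusPy's own code: the helper names file_checksum and matches_source_entry are hypothetical, and treating a None or missing checksum as "skip this check" is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Union


def file_checksum(
    path: Union[str, Path], algorithm: str = "md5", chunk_size: int = 1 << 20
) -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_source_entry(path: Union[str, Path], entry: dict) -> bool:
    """Check a file against the optional 'md5' and 'sha256' fields of an entry."""
    for algorithm in ("md5", "sha256"):
        expected = entry.get(algorithm)
        # Assumption: a missing or None checksum means "do not verify".
        if expected is None:
            continue
        if file_checksum(path, algorithm) != expected.lower():
            return False
    return True
```

With the example entry above, where both checksums are None, matches_source_entry accepts any file. Filling in at least one checksum makes the check meaningful.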
 - exists()
- Return True if the dataset exists, otherwise False. 
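For a manually downloaded dataset, the .muspy.success marker file that exists() relies on has to be created by hand. A minimal sketch using only the standard library follows. The helper name create_success_marker is hypothetical, and the directory is passed in explicitly because this section names both {root}/ and {root}/_converted/ as the marker location.

```python
from pathlib import Path
from typing import Union


def create_success_marker(directory: Union[str, Path]) -> Path:
    """Create an empty .muspy.success file in the given directory."""
    marker = Path(directory) / ".muspy.success"
    # Create the directory first in case the manual download did not.
    marker.parent.mkdir(parents=True, exist_ok=True)
    # touch() leaves an existing marker untouched, so repeated calls are safe.
    marker.touch(exist_ok=True)
    return marker
```

Only create the marker once the files are known to be complete, since it is treated as an indicator of the dataset's integrity.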
 - source_exists()
- Return True if all the sources exist, otherwise False. 
 - download(overwrite=False, verbose=True)
- Download the dataset source(s). 
 - extract(cleanup=False, verbose=True)
- Extract the downloaded archive(s). 
 - download_and_extract(overwrite=False, cleanup=False, verbose=True)
- Download source datasets and extract the downloaded archives. 
 - classmethod citation()
- Print the citation information.
 - classmethod info()
- Return the dataset information.
 - save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)
- Save all the music objects to a directory.
- Parameters
- root (str or Path) – Root directory to save the data. 
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data. 
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing. 
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted. 
- verbose (bool, default: True) – Whether to be verbose. 
- **kwargs – Keyword arguments to pass to muspy.save().
 
 
 - split(filename=None, splits=None, random_state=None)
- Split the dataset according to the given ratios and return the resulting splits.
- Parameters
- filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If None or the file does not exist, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
 
 
 - to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
- Return the dataset as a PyTorch dataset.
- Parameters
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If None or the file does not exist, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
 
- Returns
- Converted PyTorch dataset(s). 
- Return type
- torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
 
 - to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
- Return the dataset as a TensorFlow dataset.
- Parameters
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If None or the file does not exist, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
 
- Returns
- tensorflow.data.Dataset or Dict of tensorflow.data.Dataset – Converted TensorFlow dataset(s).