Base Dataset Classes
Here are the two base classes for MusPy datasets.
- class muspy.Dataset
Base class for MusPy datasets.
To build a custom dataset, inherit from this class, override the methods __getitem__ and __len__, and set the class attribute _info. __getitem__ should return the i-th data sample as a muspy.Music object, __len__ should return the size of the dataset, and _info should be a muspy.DatasetInfo instance storing the dataset information.
- classmethod info()
Return the dataset information.
- classmethod citation()
Print the citation information.
- save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)
Save all the music objects to a directory.
- Parameters
root (str or Path) – Root directory to save the data.
kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
verbose (bool, default: True) – Whether to be verbose.
**kwargs – Keyword arguments to pass to muspy.save().
- split(filename=None, splits=None, random_state=None)
Return the train-test(-validation) split of the dataset.
- Parameters
filename (str or Path, optional) – Path to a split file. If given and the file exists, the split is read from it; if given and the file does not exist, the newly created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
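The rules for splits and random_state above can be made concrete with a small stand-alone sketch. make_split is a hypothetical helper written for illustration, not MusPy's implementation: the split names, the JSON file format, and rounding the ratio boundaries (with any remainder going to the last split) are assumptions, and the splits=None case (whole dataset) is not covered.

```python
import json
from pathlib import Path

import numpy as np


def make_split(n_samples, splits, random_state=None, filename=None):
    """Hypothetical helper mimicking the documented semantics of split()."""
    # Reuse an existing split file if one is given and present.
    if filename is not None and Path(filename).is_file():
        with open(filename, encoding="utf-8") as f:
            return json.load(f)
    # A single float r means a two-way split with ratios (r, 1 - r).
    if isinstance(splits, float):
        splits = [splits, 1.0 - splits]
    if len(splits) not in (2, 3):
        raise ValueError("splits must contain two or three ratios")
    # Accept an int/array_like seed or a ready-made RandomState.
    if not isinstance(random_state, np.random.RandomState):
        random_state = np.random.RandomState(random_state)
    indices = random_state.permutation(n_samples)
    # Cumulative ratio boundaries, scaled to sample counts.
    bounds = np.round(np.cumsum([0.0] + list(splits)) * n_samples).astype(int)
    bounds[-1] = n_samples  # the last split absorbs any rounding remainder
    names = ("train", "test", "validation")
    result = {
        name: indices[start:end].tolist()
        for name, start, end in zip(names, bounds[:-1], bounds[1:])
    }
    # Save the new split so later calls with the same filename reuse it.
    if filename is not None:
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(result, f)
    return result


print({k: len(v) for k, v in make_split(10, [0.8, 0.1, 0.1], random_state=0).items()})
```

Passing the same integer seed reproduces the same split, which is the point of exposing random_state.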
- to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
Return the dataset as a PyTorch dataset.
- Parameters
factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to a split file. If given and the file exists, the split is read from it; if given and the file does not exist, the newly created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- Returns
Converted PyTorch dataset(s).
- Return type
torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
- to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
Return the dataset as a TensorFlow dataset.
- Parameters
factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to a split file. If given and the file exists, the split is read from it; if given and the file does not exist, the newly created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- Returns
Converted TensorFlow dataset(s).
- Return type
tensorflow.data.Dataset or Dict of tensorflow.data.Dataset
- class muspy.RemoteDataset(root, download_and_extract=False, overwrite=False, cleanup=False, verbose=True)
Base class for remote MusPy datasets.
This class extends muspy.Dataset to support remote datasets. To build a custom remote dataset, refer to the documentation of muspy.Dataset for details. In addition, set the class attribute _sources to describe the source files, including their URLs (see Notes).
- root
Root directory of the dataset.
- Type
str or Path
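- Parameters
root (str or Path) – Root directory of the dataset.
download_and_extract (bool, default: False) – Whether to download and extract the dataset.
overwrite (bool, default: False) – Whether to overwrite existing file(s).
cleanup (bool, default: False) – Whether to remove the source archive(s).
verbose (bool, default: True) – Whether to be verbose.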
- Raises
RuntimeError – If download_and_extract is False but file {root}/.muspy.success does not exist (see below).
Important
muspy.RemoteDataset.exists() depends solely on a special file named .muspy.success in directory {root}/_converted/. This file serves as an indicator for the existence and integrity of the dataset. It is created automatically if the dataset is successfully downloaded and extracted by muspy.RemoteDataset.download_and_extract(). If the dataset is downloaded manually, make sure to create the .muspy.success file in directory {root}/_converted/ to prevent errors.
Notes
The class attribute _sources is a dictionary mapping each source name to a dictionary that stores the following information about the source file:
filename (str): Name to save the file.
url (str): URL to the file.
archive (bool): Whether the file is an archive.
md5 (str, optional): Expected MD5 checksum of the file.
sha256 (str, optional): Expected SHA256 checksum of the file.
Here is an example:

    _sources = {
        "example": {
            "filename": "example.tar.gz",
            "url": "https://www.example.com/example.tar.gz",
            "archive": True,
            "md5": None,
            "sha256": None,
        }
    }
See also
muspy.Dataset
Base class for MusPy datasets.
- exists()
Return True if the dataset exists, otherwise False.
- source_exists()
Return True if all the sources exist, otherwise False.
- download(overwrite=False, verbose=True)
Download the dataset source(s).
- extract(cleanup=False, verbose=True)
Extract the downloaded archive(s).
- download_and_extract(overwrite=False, cleanup=False, verbose=True)
Download source datasets and extract the downloaded archives.
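For the manual-download case in the Important note, the marker file can be created with pathlib. mark_dataset_ready is a hypothetical helper, not part of MusPy. Because the Raises entry names {root}/.muspy.success while the Important note names {root}/_converted/, the helper takes the subdirectory as a parameter; which location your MusPy version actually checks is an assumption to verify.

```python
import tempfile
from pathlib import Path


def mark_dataset_ready(root, subdir="_converted"):
    """Hypothetical helper: create the .muspy.success marker after a manual download.

    The default subdir follows the Important note; pass subdir="" to place the
    marker directly in {root}, as the Raises entry describes.
    """
    marker_dir = Path(root) / subdir if subdir else Path(root)
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / ".muspy.success"
    marker.touch()  # an empty file is enough to act as the indicator
    return marker


# Usage on a throwaway directory standing in for a manually downloaded root.
with tempfile.TemporaryDirectory() as tmp_dir:
    marker = mark_dataset_ready(tmp_dir)
    created = marker.is_file()
```

Only create the marker once the files are really in place, since it tells MusPy the dataset is complete.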
Inherited from muspy.Dataset: citation(), info(), save(), split(), to_pytorch_dataset() and to_tensorflow_dataset(); see muspy.Dataset above for their documentation.