Local Dataset Classes
Here are the classes for local datasets.
- class muspy.FolderDataset(root, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None)
Class for datasets storing files in a folder.
This class extends
muspy.Dataset
to support folder datasets. To build a custom folder dataset, please refer to the documentation of muspy.Dataset for details. In addition, set the class attribute _extension to the file extension to look for when building the dataset, and set read to a callable that takes the filename of a source file as input and returns the converted Music object.
- root
Root directory of the dataset.
- Type
str or Path
- Parameters
convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, check whether converted data exist. If so, disable on-the-fly mode; if not, enable on-the-fly mode and issue a warning.
kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
use_converted (bool, optional) – Whether to disable on-the-fly mode and use the converted data. Defaults to True if converted data exist, otherwise False.
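The subclassing recipe above (set _extension, implement read) can be illustrated with a small pure-Python sketch. MiniFolderDataset and TextLinesDataset are hypothetical classes written for this example, not part of MusPy: the base class collects raw files by _extension and calls read only when an item is indexed (as in on-the-fly mode), and read returns a list of lines instead of a muspy.Music object so the sketch runs without MusPy installed.

```python
from pathlib import Path
from typing import Any, List, Union


class MiniFolderDataset:
    """Hypothetical stand-in for muspy.FolderDataset (not MusPy's code)."""

    _extension = ""  # subclasses set this, e.g. "abc"

    def __init__(self, root: Union[str, Path]):
        self.root = Path(root)
        # Collect raw files with the configured extension; sort for stable indices
        self.raw_filenames: List[Path] = sorted(
            self.root.rglob(f"*.{self._extension}")
        )

    def read(self, filename: Path) -> Any:
        raise NotImplementedError  # subclasses convert one source file here

    def __len__(self) -> int:
        return len(self.raw_filenames)

    def __getitem__(self, index: int) -> Any:
        # On-the-fly: the source file is converted only when it is indexed
        return self.read(self.raw_filenames[index])


class TextLinesDataset(MiniFolderDataset):
    """Hypothetical subclass: each *.txt file is 'converted' to its lines."""

    _extension = "txt"

    def read(self, filename: Path) -> List[str]:
        return filename.read_text().splitlines()
```

With MusPy itself, the subclass would extend muspy.FolderDataset instead and return a Music object from read.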
Important
muspy.FolderDataset.converted_exists() depends solely on a special file named .muspy.success in the folder {root}/_converted/, which serves as an indicator of the existence and integrity of the converted dataset. If the converted dataset is built by muspy.FolderDataset.convert(), the .muspy.success file is created as well. If the converted dataset is created manually, make sure to create the .muspy.success file in the folder {root}/_converted/ to prevent errors.
Notes
Two modes are available for this dataset. When the on-the-fly mode is enabled, a data sample is converted to a Music object on the fly when it is indexed. When the on-the-fly mode is disabled, a data sample is loaded from the precomputed converted data.
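The rules for choosing between the two modes, together with the .muspy.success marker described in the Important note, can be sketched as follows. This is a hedged reading of the documented behavior, not MusPy's implementation: converted_exists mirrors the documented check on the marker file, while mark_converted and resolve_mode are hypothetical helpers. Creating an empty marker file is an assumption; the note only requires that the file exist.

```python
import warnings
from pathlib import Path
from typing import Optional, Union

MARKER = ".muspy.success"


def converted_exists(root: Union[str, Path]) -> bool:
    # Only the marker file is checked, as the note above describes
    return (Path(root) / "_converted" / MARKER).is_file()


def mark_converted(root: Union[str, Path]) -> None:
    # Hypothetical helper for a manually built converted dataset
    converted_dir = Path(root) / "_converted"
    converted_dir.mkdir(parents=True, exist_ok=True)
    (converted_dir / MARKER).touch()  # assumption: an empty file suffices


def resolve_mode(
    root: Union[str, Path], use_converted: Optional[bool] = None
) -> str:
    """Return "converted" or "on-the-fly" (hypothetical helper)."""
    if use_converted is None:
        # Default: use converted data only if the marker file exists
        use_converted = converted_exists(root)
        if not use_converted:
            warnings.warn("Converted data not found; using on-the-fly mode.")
    return "converted" if use_converted else "on-the-fly"
```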
See also
muspy.Dataset
Base class for MusPy datasets.
- property converted_dir
Path to the root directory of the converted dataset.
- read(filename)
Read a file into a Music object.
- load(filename)
Load a file into a Music object.
- exists()
Return True if the dataset exists, otherwise False.
- converted_exists()
Return True if the saved dataset exists, otherwise False.
- get_converted_filenames()
Return a list of converted filenames.
- use_converted()
Disable on-the-fly mode and use converted data.
- Returns
- Return type
Object itself.
- get_raw_filenames()
Return a list of raw filenames.
- on_the_fly()
Enable on-the-fly mode and convert the data on the fly.
- Returns
- Return type
Object itself.
- convert(kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)
Convert and save the Music objects.
The converted files are named by their indices and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] is converted and saved to {i}.json.
- Parameters
kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
verbose (bool, default: True) – Whether to be verbose.
**kwargs – Keyword arguments to pass to muspy.save().
- Returns
- Return type
Object itself.
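The index-based naming used by convert() can be expressed as a mapping from raw filenames to converted paths. converted_paths is a hypothetical helper for illustration only; it assumes the file extension simply follows kind and ignores how failed conversions are handled when ignore_exceptions=True.

```python
from pathlib import Path
from typing import Dict, List, Union


def converted_paths(
    root: Union[str, Path], filenames: List[Path], kind: str = "json"
) -> Dict[Path, Path]:
    """Map each raw file to its index-based converted path (hypothetical)."""
    if kind not in ("json", "yaml"):
        raise ValueError("kind must be 'json' or 'yaml'")
    converted_dir = Path(root) / "_converted"
    # filenames[i] -> root/_converted/{i}.{kind}
    return {
        filename: converted_dir / f"{i}.{kind}"
        for i, filename in enumerate(filenames)
    }
```

Because the converted name is just the position in filenames, keeping that list in a fixed order is what lets {i}.json be traced back to its source file.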
- classmethod citation()
Print the citation information.
- classmethod info()
Return the dataset information.
- save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)
Save all the music objects to a directory.
- Parameters
root (str or Path) – Root directory to save the data.
kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
verbose (bool, default: True) – Whether to be verbose.
**kwargs – Keyword arguments to pass to muspy.save().
- split(filename=None, splits=None, random_state=None)
Split the dataset into train, test and (optionally) validation subsets.
- Parameters
filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
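The ratio semantics of splits can be illustrated by shuffling the sample indices and cutting the permutation at the cumulative ratios. split_indices is a hypothetical sketch, not MusPy's code: it uses Python's random.Random in place of numpy.random.RandomState (so the indices will differ from MusPy's for the same seed), assumes the ratios sum to 1, uses a made-up "full" key for the unsplit case, and does not read or write a split file.

```python
import random
from typing import Dict, List, Optional, Sequence, Union

SPLIT_NAMES = ("train", "test", "validation")


def split_indices(
    n_samples: int,
    splits: Optional[Union[float, Sequence[float]]] = None,
    seed: Optional[int] = None,
) -> Dict[str, List[int]]:
    """Shuffle indices and cut them by ratio (hypothetical sketch)."""
    indices = list(range(n_samples))
    if splits is None:
        return {"full": indices}  # hypothetical key for the unsplit dataset
    if isinstance(splits, float):
        splits = [splits, 1.0 - splits]  # a float means train/test
    if len(splits) not in (2, 3):
        raise ValueError("splits must have two or three ratios")
    # Stand-in for numpy.random.RandomState(seed).permutation(n_samples)
    random.Random(seed).shuffle(indices)
    result: Dict[str, List[int]] = {}
    start, cumulative = 0, 0.0
    for name, ratio in zip(SPLIT_NAMES, splits):
        cumulative += ratio  # ratios are assumed to sum to 1
        end = round(n_samples * cumulative)
        result[name] = indices[start:end]
        start = end
    return result
```

Passing the same seed reproduces the same split, which is the role random_state plays in the documented method.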
- to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
Return the dataset as a PyTorch dataset.
- Parameters
factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- Returns
Converted PyTorch dataset(s).
- Return type
torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
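The factory parameter is a plain function from one Music object to an array or tensor, applied to each sample. The sketch below mimics that contract with a hypothetical map-style wrapper that implements __len__ and __getitem__ (the protocol map-style torch.utils.data.Dataset subclasses follow) and returns either one dataset or a dict of datasets keyed by split name, matching the documented return type. It imports neither PyTorch nor MusPy; the samples are plain pitch lists and the factory is a made-up transpose-by-one function.

```python
from typing import Callable, Dict, List, Optional, Sequence, Union


class FactoryDataset:
    """Hypothetical map-style dataset that applies `factory` on access."""

    def __init__(
        self,
        items: Sequence,
        factory: Callable,
        indices: Optional[List[int]] = None,
    ):
        self.items = items
        self.factory = factory
        self.indices = list(range(len(items))) if indices is None else indices

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int):
        # Each sample is transformed lazily, one at a time
        return self.factory(self.items[self.indices[i]])


def to_dataset(
    items: Sequence,
    factory: Callable,
    split_indices: Optional[Dict[str, List[int]]] = None,
) -> Union[FactoryDataset, Dict[str, FactoryDataset]]:
    # No splits: one dataset; otherwise one dataset per split name
    if split_indices is None:
        return FactoryDataset(items, factory)
    return {
        name: FactoryDataset(items, factory, idx)
        for name, idx in split_indices.items()
    }
```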
- to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
Return the dataset as a TensorFlow dataset.
- Parameters
factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- Returns
Converted TensorFlow dataset(s).
- Return type
tensorflow.data.Dataset or Dict of tensorflow.data.Dataset
- class muspy.MusicDataset(root, kind=None)
Class for datasets of MusPy JSON/YAML files.
- Parameters
root (str or Path) – Root directory of the dataset.
kind ({'json', 'yaml'}, optional) – File formats to include in the dataset. Defaults to including both JSON and YAML files.
- root
Root directory of the dataset.
- Type
Path
- filenames
Paths to the files, relative to root.
- Type
list of Path
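How kind restricts the files collected by MusicDataset can be sketched with pathlib. music_filenames is a hypothetical helper, not MusPy's code; it returns paths relative to root, as the filenames attribute is documented to, and it deliberately matches only the .json and .yaml suffixes (whether MusPy also accepts other suffixes such as .yml is not covered here).

```python
from pathlib import Path
from typing import List, Optional, Union

SUFFIXES = {"json": (".json",), "yaml": (".yaml",)}


def music_filenames(
    root: Union[str, Path], kind: Optional[str] = None
) -> List[Path]:
    """Collect MusPy JSON/YAML files relative to root (hypothetical)."""
    root = Path(root)
    if kind is None:
        suffixes = SUFFIXES["json"] + SUFFIXES["yaml"]  # both formats
    elif kind in SUFFIXES:
        suffixes = SUFFIXES[kind]
    else:
        raise ValueError("kind must be 'json', 'yaml' or None")
    # Search recursively and keep paths relative to the dataset root
    return sorted(
        path.relative_to(root)
        for path in root.rglob("*")
        if path.is_file() and path.suffix in suffixes
    )
```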
See also
muspy.Dataset
Base class for MusPy datasets.
- classmethod citation()
Print the citation information.
- classmethod info()
Return the dataset information.
- save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)
Save all the music objects to a directory.
- Parameters
root (str or Path) – Root directory to save the data.
kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
verbose (bool, default: True) – Whether to be verbose.
**kwargs – Keyword arguments to pass to muspy.save().
- split(filename=None, splits=None, random_state=None)
Split the dataset into train, test and (optionally) validation subsets.
- Parameters
filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
Return the dataset as a PyTorch dataset.
- Parameters
factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- Returns
Converted PyTorch dataset(s).
- Return type
torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
- to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
Return the dataset as a TensorFlow dataset.
- Parameters
factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- Returns
Converted TensorFlow dataset(s).
- Return type
tensorflow.data.Dataset or Dict of tensorflow.data.Dataset
- class muspy.ABCFolderDataset(root, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None)
Class for datasets storing ABC files in a folder.
See also
muspy.FolderDataset
Class for datasets storing files in a folder.
- read(filename)
Read a file into a Music object.
- on_the_fly()
Enable on-the-fly mode and convert the data on the fly.
- Returns
- Return type
Object itself.
- classmethod citation()
Print the citation information.
- convert(kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)
Convert and save the Music objects.
The converted files are named by their indices and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] is converted and saved to {i}.json.
- Parameters
kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
verbose (bool, default: True) – Whether to be verbose.
**kwargs – Keyword arguments to pass to muspy.save().
- Returns
- Return type
Object itself.
- property converted_dir
Path to the root directory of the converted dataset.
- converted_exists()
Return True if the saved dataset exists, otherwise False.
- exists()
Return True if the dataset exists, otherwise False.
- get_converted_filenames()
Return a list of converted filenames.
- get_raw_filenames()
Return a list of raw filenames.
- classmethod info()
Return the dataset information.
- load(filename)
Load a file into a Music object.
- save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)
Save all the music objects to a directory.
- Parameters
root (str or Path) – Root directory to save the data.
kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
verbose (bool, default: True) – Whether to be verbose.
**kwargs – Keyword arguments to pass to muspy.save().
- split(filename=None, splits=None, random_state=None)
Split the dataset into train, test and (optionally) validation subsets.
- Parameters
filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
Return the dataset as a PyTorch dataset.
- Parameters
factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- Returns
Converted PyTorch dataset(s).
- Return type
torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
- to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)
Return the dataset as a TensorFlow dataset.
- Parameters
factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it; if given and the file does not exist, the created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
- Returns
Converted TensorFlow dataset(s).
- Return type
tensorflow.data.Dataset or Dict of tensorflow.data.Dataset
- use_converted()
Disable on-the-fly mode and use converted data.
- Returns
- Return type
Object itself.