Dataset
A Dataset is analogous to a file in an operating system and is
contained within a Group.
A Dataset is essentially a numpy.ndarray with Metadata,
and it can be accessed in read-only mode.
Since a Dataset can be thought of as a numpy.ndarray, the attributes of
a numpy.ndarray are also valid for a Dataset. For example, suppose
my_dataset is a Dataset:
>>> my_dataset
<Dataset '/my_dataset' shape=(5,) dtype='|V16' (2 metadata)>
>>> my_dataset.data
array([(0.23, 1.27), (1.86, 2.74), (3.44, 2.91), (5.91, 1.83), (8.73, 0.74)],
      dtype=[('x', '<f8'), ('y', '<f8')])
You can get the numpy.ndarray.shape of the Dataset using
>>> my_dataset.shape
(5,)
or convert the data in the Dataset to a Python list
using numpy.ndarray.tolist()
>>> my_dataset.tolist()
[(0.23, 1.27), (1.86, 2.74), (3.44, 2.91), (5.91, 1.83), (8.73, 0.74)]
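To make "a numpy.ndarray with Metadata" concrete, here is a minimal sketch of how such a wrapper can forward ndarray attributes (shape, tolist, ...) to the array it holds while staying read-only. The class name ArrayWithMetadata is hypothetical and this is not how msl-io implements Dataset; it only illustrates the idea under that assumption.

```python
import numpy as np


class ArrayWithMetadata:
    """Hypothetical stand-in for a Dataset: a read-only ndarray plus a metadata dict."""

    def __init__(self, data, **metadata):
        self._data = np.array(data)        # copy, so the caller's array is not affected
        self._data.setflags(write=False)   # mimic read-only access
        self.metadata = dict(metadata)

    @property
    def data(self):
        return self._data

    def __getattr__(self, name):
        # Only called when normal lookup fails, so ndarray attributes such as
        # shape, dtype, tolist and max are forwarded to the wrapped array.
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(self._data, name)

    def __getitem__(self, key):
        # Slicing/indexing returns a plain numpy object without the metadata.
        return self._data[key]


dtype = [('x', '<f8'), ('y', '<f8')]
d = ArrayWithMetadata(np.array([(0.23, 1.27), (1.86, 2.74)], dtype=dtype), temperature=20.13)
print(d.shape)     # (2,)
print(d.tolist())  # [(0.23, 1.27), (1.86, 2.74)]
```

Forwarding through __getattr__ keeps the wrapper small: anything the wrapper does not define itself is looked up on the underlying array.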
To access the Metadata of a Dataset,
use the metadata attribute
>>> my_dataset.metadata
<Metadata '/my_dataset' {'temperature': 20.13, 'humidity': 45.31}>
You can access the values of the Metadata as attributes
>>> my_dataset.metadata.temperature
20.13
or as keys
>>> my_dataset.metadata['humidity']
45.31
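Reading the same value either as an attribute or as a key can be sketched with a dict subclass that falls back to key lookup in __getattr__. The AttrDict class below is hypothetical and is not msl-io's Metadata class; it only shows one common way to get this behaviour.

```python
class AttrDict(dict):
    """Hypothetical dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            # Re-raise as AttributeError so hasattr()/getattr() behave normally
            raise AttributeError(name) from None


metadata = AttrDict(temperature=20.13, humidity=45.31)
print(metadata.temperature)   # 20.13
print(metadata['humidity'])   # 45.31
```

Because AttrDict is still a dict, it can also be unpacked with ** into keyword arguments.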
Depending on the numpy.dtype that was used to create the underlying
numpy.ndarray for the Dataset, the field names
can also be accessed as attributes. For example, you can access the fields
in my_dataset as keys
>>> my_dataset['x']
array([0.23, 1.86, 3.44, 5.91, 8.73])
or as attributes
>>> my_dataset.x
array([0.23, 1.86, 3.44, 5.91, 8.73])
Note that the returned object is a numpy.ndarray and therefore does not
contain any Metadata.
See Accessing Keys as Class Attributes for more information.
You can also chain multiple attribute calls together. For example, to get the maximum x value in my_dataset, you can use
>>> my_dataset.x.max()
8.73
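For comparison, this is how plain numpy behaves with the same structured dtype: fields are always available as keys, and a numpy.recarray view additionally exposes them as attributes. This only illustrates numpy itself; msl-io's own attribute mechanism may work differently (see Accessing Keys as Class Attributes).

```python
import numpy as np

data = np.array(
    [(0.23, 1.27), (1.86, 2.74), (3.44, 2.91), (5.91, 1.83), (8.73, 0.74)],
    dtype=[('x', '<f8'), ('y', '<f8')],
)

x_by_key = data['x']              # field access by key works on any structured array
rec = data.view(np.recarray)      # a recarray view also exposes fields as attributes
x_by_attr = rec.x

print(data.dtype.names)                     # ('x', 'y')
print(x_by_key.max())                       # 8.73
print(np.array_equal(x_by_key, x_by_attr))  # True
```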
Slicing and Indexing
Slicing and indexing a Dataset are valid operations, but the returned
object is a numpy.ndarray (or a numpy scalar, such as numpy.void, when a
single element is indexed), which does not contain any Metadata.
Consider my_dataset from above. You can slice it
>>> my_dataset[::2]
array([(0.23, 1.27), (3.44, 2.91), (8.73, 0.74)],
      dtype=[('x', '<f8'), ('y', '<f8')])
or index it
>>> my_dataset[2]
(3.44, 2.91)
Since a numpy.ndarray is returned, you are responsible for keeping
track of the Metadata in slicing and indexing operations.
For example, with root being the top-level Group that contains my_dataset,
>>> my_subset = root.create_dataset('my_subset', data=my_dataset[::2], **my_dataset.metadata)
>>> my_subset
<Dataset '/my_subset' shape=(3,) dtype='|V16' (2 metadata)>
>>> my_subset.data
array([(0.23, 1.27), (3.44, 2.91), (8.73, 0.74)],
      dtype=[('x', '<f8'), ('y', '<f8')])
>>> my_subset.metadata
<Metadata '/my_subset' {'temperature': 20.13, 'humidity': 45.31}>
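The bookkeeping above can be sketched with plain numpy and a dict: the slice is an ordinary numpy.ndarray, a single indexed element of a structured array is a numpy.void scalar, and the metadata has to be copied by hand. The take_subset helper is hypothetical and is not part of msl-io.

```python
import numpy as np

data = np.array(
    [(0.23, 1.27), (1.86, 2.74), (3.44, 2.91), (5.91, 1.83), (8.73, 0.74)],
    dtype=[('x', '<f8'), ('y', '<f8')],
)
metadata = {'temperature': 20.13, 'humidity': 45.31}


def take_subset(data, metadata, key):
    """Hypothetical helper: index `data` and return a copy of `metadata` alongside it."""
    subset = data[key]
    # dict(...) makes a shallow copy, so editing the subset's metadata
    # does not modify the original metadata
    return subset, dict(metadata)


subset, subset_meta = take_subset(data, metadata, slice(None, None, 2))
print(type(subset).__name__, subset.shape)   # ndarray (3,)
print(type(data[2]).__name__)                # void
print(subset_meta == metadata)               # True
```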
Arithmetic Operations
Arithmetic operations are valid with a Dataset; however, the returned
object is a numpy.ndarray, so none of the Metadata of the Datasets
that are involved in the operation is included in the returned object.
For example, suppose you have two Datasets that
contain the following information:
>>> dset1
<Dataset '/dset1' shape=(3,) dtype='<f8' (1 metadata)>
>>> dset1.data
array([1., 2., 3.])
>>> dset1.metadata
<Metadata '/dset1' {'temperature': 20.3}>
>>> dset2
<Dataset '/dset2' shape=(3,) dtype='<f8' (1 metadata)>
>>> dset2.data
array([4., 5., 6.])
>>> dset2.metadata
<Metadata '/dset2' {'temperature': 21.7}>
You can add the Datasets directly, but the temperature
values in the Metadata are not included in the returned object:
>>> dset3 = dset1 + dset2
>>> dset3
array([5., 7., 9.])
>>> dset3.metadata
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'metadata'
You are responsible for keeping track of the Metadata in arithmetic operations. For example,
>>> temperatures = {'t1': dset1.metadata.temperature, 't2': dset2.metadata.temperature}
>>> dset3 = root.create_dataset('dset3', data=dset1+dset2, temperatures=temperatures)
>>> dset3
<Dataset '/dset3' shape=(3,) dtype='<f8' (1 metadata)>
>>> dset3.data
array([5., 7., 9.])
>>> dset3.metadata
<Metadata '/dset3' {'temperatures': {'t1': 20.3, 't2': 21.7}}>
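The same pattern, reduced to plain numpy and dicts: the sum is an ordinary array with no metadata attribute, and the temperatures are collected into a new nested dict by hand. The add_with_temperatures helper is hypothetical and only mirrors the manual steps shown above.

```python
import numpy as np


def add_with_temperatures(a, a_meta, b, b_meta):
    """Hypothetical helper: add two arrays and collect both temperatures."""
    result = np.asarray(a, dtype=float) + np.asarray(b, dtype=float)  # plain ndarray
    temperatures = {'t1': a_meta['temperature'], 't2': b_meta['temperature']}
    return result, {'temperatures': temperatures}


data, meta = add_with_temperatures([1., 2., 3.], {'temperature': 20.3},
                                   [4., 5., 6.], {'temperature': 21.7})
print(data)   # [5. 7. 9.]
print(meta)   # {'temperatures': {'t1': 20.3, 't2': 21.7}}
```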
A Dataset for Logging Records
The DatasetLogging class is a custom Dataset
that is also a Handler, so it automatically appends logging records
to the Dataset. See create_dataset_logging() for
more details.
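The Handler mechanism that DatasetLogging relies on can be sketched with the standard library alone: subclass logging.Handler and override emit() so that each record is appended somewhere. The RecordListHandler class below is hypothetical and appends tuples to a list; it is not msl-io's DatasetLogging, which stores the records in the Dataset itself.

```python
import logging


class RecordListHandler(logging.Handler):
    """Hypothetical handler that appends each record as a tuple to a list."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level=level)
        self.rows = []

    def emit(self, record):
        try:
            # Keep a few LogRecord attributes as one "row"
            self.rows.append((record.levelname, record.name, record.getMessage()))
        except Exception:
            self.handleError(record)


logger = logging.getLogger('example.dataset_logging')
logger.setLevel(logging.INFO)
handler = RecordListHandler()
logger.addHandler(handler)

logger.info('voltage=%.2f', 1.23)
logger.debug('not recorded: below the INFO level')
print(handler.rows)   # [('INFO', 'example.dataset_logging', 'voltage=1.23')]
logger.removeHandler(handler)
```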
When a file is read(), an object that was once a
DatasetLogging is loaded as a Dataset.
To convert the Dataset back into a
DatasetLogging object, so that logging records are once
again appended to it, call the require_dataset_logging() method
with the name argument set to the name of the Dataset.