Dataset
A Dataset is analogous to a file in an operating system and is contained within a Group.
A Dataset is essentially a numpy.ndarray with Metadata, and it can be accessed in read-only mode.
Since a Dataset can be thought of as a numpy.ndarray, the attributes of a numpy.ndarray are also valid for a Dataset. For example, suppose my_dataset is a Dataset
>>> my_dataset
<Dataset '/my_dataset' shape=(5,) dtype='|V16' (2 metadata)>
>>> my_dataset.data
array([(0.23, 1.27), (1.86, 2.74), (3.44, 2.91), (5.91, 1.83), (8.73, 0.74)],
dtype=[('x', '<f8'), ('y', '<f8')])
You can get the numpy.ndarray.shape using
>>> my_dataset.shape
(5,)
or convert the data in the Dataset to a Python list using numpy.ndarray.tolist()
>>> my_dataset.tolist()
[(0.23, 1.27), (1.86, 2.74), (3.44, 2.91), (5.91, 1.83), (8.73, 0.74)]
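The behaviour above can be reproduced with NumPy alone. The following is a minimal sketch that builds a plain structured numpy.ndarray holding the same values as my_dataset; it does not use the Dataset class, and the variable names are chosen only for illustration. It also suggests where the '|V16' in the Dataset representation comes from: two 8-byte float fields make a 16-byte record, which NumPy reports as '|V16' in dtype.str.

```python
import numpy as np

# Two little-endian 8-byte float fields per row, as in my_dataset above
dtype = np.dtype([('x', '<f8'), ('y', '<f8')])
data = np.array(
    [(0.23, 1.27), (1.86, 2.74), (3.44, 2.91), (5.91, 1.83), (8.73, 0.74)],
    dtype=dtype,
)

print(data.shape)        # one-dimensional array with 5 records
print(data.dtype.str)    # type string for a 16-byte structured record
print(data.tolist()[0])  # each record becomes a tuple of Python floats
```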
To access the Metadata of a Dataset, use the metadata attribute
>>> my_dataset.metadata
<Metadata '/my_dataset' {'temperature': 20.13, 'humidity': 45.31}>
You can access values of the Metadata as attributes
>>> my_dataset.metadata.temperature
20.13
or as keys
>>> my_dataset.metadata['humidity']
45.31
Depending on the numpy.dtype that was used to create the underlying numpy.ndarray for the Dataset, the field names can also be accessed as attributes. For example, you can access the fields in my_dataset as keys
>>> my_dataset['x']
array([0.23, 1.86, 3.44, 5.91, 8.73])
or as attributes
>>> my_dataset.x
array([0.23, 1.86, 3.44, 5.91, 8.73])
Note that the returned object is a numpy.ndarray and therefore does not contain any Metadata.
See Accessing Keys as Class Attributes for more information.
You can also chain multiple attribute calls together. For example, to get the maximum x value in my_dataset you can use
>>> my_dataset.x.max()
8.73
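For comparison, a plain structured numpy.ndarray supports field access by key only; attribute access to fields is something NumPy offers through numpy.recarray. The sketch below uses only NumPy and is meant as an analogy for the key and attribute access shown above, not as a description of how Dataset implements it.

```python
import numpy as np

data = np.array(
    [(0.23, 1.27), (1.86, 2.74), (3.44, 2.91), (5.91, 1.83), (8.73, 0.74)],
    dtype=[('x', '<f8'), ('y', '<f8')],
)

# Key access works on any structured array
x = data['x']

# Attribute access needs a record-array view of the same memory
rec = data.view(np.recarray)

print(x.max())
print(rec.x.max())
```

Both calls operate on the same underlying buffer, so no data is copied when the view is created.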
Slicing and Indexing
Slicing and indexing a Dataset is a valid operation, but it returns a numpy.ndarray, which does not contain any Metadata.
Consider my_dataset from above. One can slice it
>>> my_dataset[::2]
array([(0.23, 1.27), (3.44, 2.91), (8.73, 0.74)],
dtype=[('x', '<f8'), ('y', '<f8')])
or index it
>>> my_dataset[2]
(3.44, 2.91)
Since a numpy.ndarray is returned, you are responsible for keeping track of the Metadata in slicing and indexing operations. For example,
>>> my_subset = root.create_dataset('my_subset', data=my_dataset[::2], **my_dataset.metadata)
>>> my_subset
<Dataset '/my_subset' shape=(3,) dtype='|V16' (2 metadata)>
>>> my_subset.data
array([(0.23, 1.27), (3.44, 2.91), (8.73, 0.74)],
dtype=[('x', '<f8'), ('y', '<f8')])
>>> my_subset.metadata
<Metadata '/my_subset' {'temperature': 20.13, 'humidity': 45.31}>
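If this pattern is needed often, it can be wrapped in a small helper. The function below is a hypothetical convenience, not part of the library; it assumes only what the example above shows, namely that the parent object has a create_dataset() method accepting data and metadata keyword arguments, and that the metadata of a Dataset can be unpacked with `**`.

```python
def create_subset(parent, name, dataset, index):
    """Store dataset[index] in parent under name, copying the metadata.

    Hypothetical helper: parent is any object with a create_dataset()
    method (for example the root object used above) and dataset is the
    Dataset to slice or index.
    """
    # dataset[index] is a plain numpy.ndarray without metadata,
    # so the metadata is forwarded explicitly as keyword arguments
    return parent.create_dataset(name, data=dataset[index], **dataset.metadata)


# Intended usage, equivalent to the example above:
# my_subset = create_subset(root, 'my_subset', my_dataset, slice(None, None, 2))
```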
Arithmetic Operations
Arithmetic operations are valid with a Dataset; however, the returned object will be a numpy.ndarray, so the Metadata of the Datasets involved in the operation is not included in the returned object.
For example, suppose you have two Datasets that contain the following information
>>> dset1
<Dataset '/dset1' shape=(3,) dtype='<f8' (1 metadata)>
>>> dset1.data
array([1., 2., 3.])
>>> dset1.metadata
<Metadata '/dset1' {'temperature': 20.3}>
>>> dset2
<Dataset '/dset2' shape=(3,) dtype='<f8' (1 metadata)>
>>> dset2.data
array([4., 5., 6.])
>>> dset2.metadata
<Metadata '/dset2' {'temperature': 21.7}>
You can directly add the Datasets, but the temperature values in the Metadata are not included in the returned object
>>> dset3 = dset1 + dset2
>>> dset3
array([5., 7., 9.])
>>> dset3.metadata
Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'metadata'
You are responsible for keeping track of the Metadata in arithmetic operations, for example,
>>> temperatures = {'t1': dset1.metadata.temperature, 't2': dset2.metadata.temperature}
>>> dset3 = root.create_dataset('dset3', data=dset1+dset2, temperatures=temperatures)
>>> dset3
<Dataset '/dset3' shape=(3,) dtype='<f8' (1 metadata)>
>>> dset3.data
array([5., 7., 9.])
>>> dset3.metadata
<Metadata '/dset3' {'temperatures': {'t1': 20.3, 't2': 21.7}}>
A Dataset for Logging Records
The DatasetLogging class is a custom Dataset that is also a Handler, which automatically appends logging records to the Dataset. See create_dataset_logging() for more details.
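To illustrate what "a container that is also a Handler" means, here is a minimal sketch using only the standard-library logging module. The ListHandler class, its attributes parameter and the logger name are hypothetical and chosen for illustration; this is not the DatasetLogging implementation, which stores the records in a Dataset rather than in a list.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in that appends selected record attributes to a list."""

    def __init__(self, attributes=('levelname', 'name', 'message'), level=logging.INFO):
        super().__init__(level=level)
        self.attributes = attributes
        self.records = []

    def emit(self, record):
        # getMessage() merges record.msg with record.args
        record.message = record.getMessage()
        self.records.append(tuple(getattr(record, a) for a in self.attributes))


logger = logging.getLogger('dataset_logging_sketch')
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the example output away from the root logger

handler = ListHandler()
logger.addHandler(handler)

logger.debug('below the handler level, so not stored')
logger.info('temperature is %.1f', 20.3)

logger.removeHandler(handler)
print(handler.records)
```

Once the handler is removed from the logger, no further records are appended, which loosely mirrors why a Dataset that was read from a file must be converted back before it collects records again.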
When a file is read(), an object that was once a DatasetLogging is loaded as a Dataset. If you want to convert the Dataset back to a DatasetLogging object, so that logging records are once again appended to it, call the require_dataset_logging() method with the name argument equal to the value of name for the Dataset.