PyTables User's Guide
 

Chapter 3: Some tutorials

All of man's unhappiness comes from one single thing: not knowing how to stay quietly in a room.
—Blaise Pascal

This chapter begins with a series of simple yet comprehensive sections, written in a tutorial style, that will let you understand the main features that PyTables provides. If along the way you want more information on a specific instance variable, global function or method, look at the doc strings or go to the library reference in chapter 4. If you are reading this in PDF or HTML format, there should be a hyperlink to the reference near each newly introduced entity.

Please note that throughout this document the terms column and field are used interchangeably, and the same goes for the terms row and record.

3.1 Getting started

In this section, we will see how to define our own records in Python and save collections of them (i.e. a table) in a file. Then, we will select some data from the table using Python cuts, and create numarray arrays to keep this selection as separate objects in the tree.

In examples/tutorial1-1.py you will find the working version of all the code in this section. Nonetheless, this tutorial series has been written to allow you to reproduce it in a Python interactive console. You are encouraged to take advantage of that by testing in parallel and inspecting the created objects (variables, docs, child objects, etc.) along the way!

3.1.1 Importing tables objects

Before doing anything you need to import the public objects in the tables package. You normally do that by issuing:

>>> import tables
>>>
	  

This is the recommended way to import tables if you don't want to pollute your namespace too much. However, PyTables has a very small set of first-level primitives, so you may consider using this alternative:

>>> from tables import *
>>>
	  

which exports the following objects into your application's namespace: openFile, isHDF5, isPyTablesFile and IsDescription. This is a rather small number of objects, so for convenience we will use this second form to access them.

If you are going to deal with numarray or Numeric arrays (and normally, you will), you also need to import some objects from them. You can do that in the usual way. So, to access PyTables functionality, you should normally start your programs with:

>>> import tables        # but in this tutorial we use "from tables import *"
>>> from numarray import *  # or "from Numeric import *"
>>>
	  

3.1.2 Declaring a Column Descriptor

Now, imagine that we have a particle detector and we want to create a table object to save the data that comes from it. First you need to define that table: how many columns it has, which kind of object each element of a column is, and so on.

Our detector has a TDC (Time to Digital Converter) counter with a dynamic range of 8 bits and an ADC (Analog to Digital Converter) with a range of 16 bits. For these values, we will define two fields in our record object, called TDCcount and ADCcount. We also want to save the grid position at which the particle has been detected, so we will add two new fields called grid_i and grid_j. Our instrumentation can also obtain the pressure and energy of this particle, which we want to add in the same way. The resolution of the pressure gauge allows us to use a single-precision float, which is enough to save the pressure information, while energy needs a double-precision float. Finally, to track this particle we want to assign it a name that tells us the kind of particle, and a numeric identifier that is unique for each particle. So we will add a couple of fields: name will be a string of up to 16 characters and, because we want to deal with a really huge number of particles, idnumber will be a 64-bit integer.
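The field widths chosen above follow from the value ranges involved. The following is a small arithmetic check written in Python 3 (it does not use PyTables); the variable names are only illustrative:

```python
# Largest values representable in the integer widths chosen above.
tdc_max = (1 << 8) - 1        # 8-bit unsigned integer  -> 255
adc_max = (1 << 16) - 1       # 16-bit unsigned integer -> 65535
int32_max = (1 << 31) - 1     # 32-bit signed integer
int64_max = (1 << 63) - 1     # 64-bit signed integer

# An identifier such as 9 * 2**34 (the largest one written later in this
# tutorial) does not fit in 32 bits, but fits comfortably in 64 bits.
idnumber = 9 * (2 ** 34)

print(tdc_max, adc_max)
print(int32_max < idnumber <= int64_max)
```

This is also why the filling loop later in this section reduces the counters modulo 256 and modulo 1 << 16: the values then always stay inside the range of their columns.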

With all of that, we can declare a new Particle class that will keep all this info:

>>> class Particle(IsDescription):
...     name      = StringCol(16)   # 16-character String
...     idnumber  = Int64Col()      # Signed 64-bit integer
...     ADCcount  = UInt16Col()     # Unsigned short integer
...     TDCcount  = UInt8Col()      # unsigned byte
...     grid_i    = Int32Col()      # integer
...     grid_j    = IntCol()        # integer (equivalent to Int32Col)
...     pressure  = Float32Col()    # float  (single-precision)
...     energy    = FloatCol()      # double (double-precision)
...
>>>
	  

This class definition is quite self-explanatory. Basically, you declare a class variable for each field you need, and as its value you assign an instance of a subclass of the Col class, which describes the kind of column (the data type, the length, the shape, ...). See section 4.3 for a complete description of these subclasses. See also appendix A for a list of data types supported in Col constructors.
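To get a feeling for what such a record looks like in memory, here is a rough sketch using the standard struct module of Python 3. It is not how PyTables builds its records; it only assumes that the fields are packed without padding, in the alphabetical order in which the table description is printed later in this chapter. The FIELDS list and the format codes are choices made for this illustration:

```python
import struct

# (field name, struct code) pairs, in alphabetical order of the field names.
FIELDS = [
    ("ADCcount", "H"),    # unsigned 16-bit integer
    ("TDCcount", "B"),    # unsigned 8-bit integer
    ("energy",   "d"),    # double-precision float
    ("grid_i",   "i"),    # signed 32-bit integer
    ("grid_j",   "i"),    # signed 32-bit integer
    ("idnumber", "q"),    # signed 64-bit integer
    ("name",     "16s"),  # 16-byte string
    ("pressure", "f"),    # single-precision float
]

# "<" selects little-endian standard sizes with no padding between fields.
row_format = "<" + "".join(code for _, code in FIELDS)
row_size = struct.calcsize(row_format)

# Byte offset of each field inside one packed row.
offsets = {}
position = 0
for field, code in FIELDS:
    offsets[field] = position
    position += struct.calcsize("<" + code)

print(row_size)
print(offsets)
```

Under these assumptions one row takes 47 bytes, which agrees with the struct size that h5ls reports for the readout table in section 3.2.2.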

From now on, we can use the Particle class as a descriptor for our detector data table. We will see how to pass this object to the Table constructor. But first, we must create a file where all the actual data pushed into Table will be saved.

3.1.3 Creating a PyTables file from scratch

To create a PyTables file, use the first-level openFile function (see ??):

>>> h5file = openFile("tutorial1.h5", mode = "w", title = "Test file")
	  

This openFile function (see ??) is one of the objects imported by "from tables import *", remember? Here, we are saying that we want to create a new file called "tutorial1.h5" in "w"rite mode, with a descriptive title string ("Test file"). This function tries to open the file and, if successful, returns a File instance (see 4.4) which hosts the root of the object tree in its root attribute.

3.1.4 Creating a new group

Now, to better organize our data, we will create a group hanging from the root called detector. We will use this group to save our particle data there.

>>> group = h5file.createGroup("/", 'detector', 'Detector information')
>>>
	  

Here, we have taken the File instance h5file and invoked its createGroup method (see 4.4.2), saying that we want to create a new group called detector hanging from "/", which is another way to refer to the h5file.root object we mentioned before. This creates a new Group instance (see 4.5) that is assigned to the group variable.

3.1.5 Creating a new table

Let's now create the Table object (see 4.7) hanging from the newly created group. We do that by calling the createTable method (see ??) of the h5file object:

>>> table = h5file.createTable(group, 'readout', Particle, "Readout example")
>>>
	  

Look at how we asked to create the Table instance hanging from group, with the name "readout". We have passed Particle, the class that we declared before, as the description parameter, and finally we have used "Readout example" as the Table title. With all this information, a new Table instance is created and assigned to the table variable.

If you are curious about what the object tree looks like at this moment, simply print the File instance, h5file, and look at the output:

>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:00:13 2003'
/ (Group) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'

>>>
	  

As you can see, a dump of the object tree is shown, and it is very easy to visualize the Group and Table objects we have just created. If you want more information, just type the name of the File instance:

>>> h5file
File(filename='tutorial1.h5', title='Test file', mode='w', trMap={}, rootUEP='/')
/ (Group) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
  description := {
    "ADCcount": Col('UInt16', shape=1, itemsize=2, dflt=0),
    "TDCcount": Col('UInt8', shape=1, itemsize=1, dflt=0),
    "energy": Col('Float64', shape=1, itemsize=8, dflt=0.0),
    "grid_i": Col('Int32', shape=1, itemsize=4, dflt=0),
    "grid_j": Col('Int32', shape=1, itemsize=4, dflt=0),
    "idnumber": Col('Int64', shape=1, itemsize=8, dflt=0),
    "name": Col('CharType', shape=1, itemsize=16, dflt=None),
    "pressure": Col('Float32', shape=1, itemsize=4, dflt=0.0) }
  byteorder := little

>>>
	  

Here, more detailed information is printed about each object in the tree. Notice how Particle, our table descriptor class, is printed as part of the readout table description. In general, you can obtain a lot of information about objects and their children just by printing them. That introspection capability is very useful, so I recommend that you use it extensively.

Now it is time to fill this table with some values. But first, we are going to get a pointer to the Row instance of this table:

>>> particle = table.row
>>>
	  

The row attribute of table points to the Row instance (see 4.8) that will be used to write data rows into the table. We fill it by assigning the values for each row as if it were a dictionary (although it is actually an extension class), using the column names as keys.

Here is how the filling process works:

>>> particle = table.row
>>> for i in xrange(10):
...     particle['name']  = 'Particle: %6d' % (i)
...     particle['TDCcount'] = i % 256
...     particle['ADCcount'] = (i * 256) % (1 << 16)
...     particle['grid_i'] = i
...     particle['grid_j'] = 10 - i
...     particle['pressure'] = float(i*i)
...     particle['energy'] = float(particle['pressure'] ** 4)
...     particle['idnumber'] = i * (2 ** 34)
...     particle.append()
...
>>>
	  

This code should be easy to understand. The lines inside the loop just assign values to the different columns in the particle Row instance (see 4.8) and then a call to its append() method is made to put this information in the table I/O buffer.

After we have pushed all our data, we should flush the I/O buffer for the table if we want to consolidate all this data on disk. We can achieve that by calling the table.flush() method.

>>> table.flush()
>>>
	  

3.1.6 Reading (and selecting) data in table

OK. We now have our data on disk, but for this data to be useful we need to access it and select the values we are interested in from specific columns. That is easy to do:

>>> table = h5file.root.detector.readout
>>> pressure = [ x['pressure'] for x in table.iterrows()
...              if x['TDCcount']>3 and 20<=x['pressure']<50 ]
>>> pressure
[25.0, 36.0, 49.0]
>>>
	  

The first line is only to declare a convenient shortcut to the readout table which is a bit deeper on the object tree. As you can see, we have used the natural naming schema to access it. We could also have used the h5file.getNode() method instead, and we will certainly do that later on.

You will recognize the next two lines as a Python list comprehension. It loops over the rows in table as they are provided by the table.iterrows() iterator (see ??), which returns values until the data in table is exhausted. These rows are filtered using the expression x['TDCcount'] > 3 and 20 <= x['pressure'] < 50, and the pressure field of the records that satisfy it is selected to form the final list, which is assigned to the pressure variable.

We could indeed have used a normal for loop to do that, but I find comprehension syntax to be more compact and elegant.
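For comparison, here are both forms side by side. This sketch is written in Python 3 and uses plain dictionaries as stand-ins for the table rows (built with the same formulas used to fill the readout table), so it runs without PyTables; the names rows and pressure_loop are only illustrative:

```python
# Stand-in rows: only the two fields needed for the cut.
rows = [{"TDCcount": i % 256, "pressure": float(i * i)} for i in range(10)]

# List-comprehension form, as in the tutorial.
pressure = [x["pressure"] for x in rows
            if x["TDCcount"] > 3 and 20 <= x["pressure"] < 50]

# Equivalent explicit loop.
pressure_loop = []
for x in rows:
    if x["TDCcount"] > 3 and 20 <= x["pressure"] < 50:
        pressure_loop.append(x["pressure"])

print(pressure)
print(pressure == pressure_loop)
```

Both forms select the same three values, 25.0, 36.0 and 49.0.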

Let's select the names for the same set of cuts:

>>> names=[ x['name'] for x in table if x['TDCcount']>3 and 20<=x['pressure']<50 ]
>>> names
['Particle:      5', 'Particle:      6', 'Particle:      7']
>>>
	  

Note how we have omitted the iterrows() call in the list comprehension. This is because the Table class has an implementation of the special method called __iter__(), so that it implements the iterator protocol over all the rows in the table. In fact, iterrows() internally calls this special __iter__() method. This way of accessing all the rows in a table turns out to be very convenient, especially for interactive use.

OK, that's enough for selections. The next section will show you how to save these selections to the file.
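The iterator protocol mentioned here can be sketched with a toy class in Python 3. MiniTable is a hypothetical stand-in written for this illustration, not the PyTables Table class; it only shows how an iterrows() method can delegate to __iter__():

```python
class MiniTable:
    """Hypothetical stand-in for a table; not the PyTables Table class."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        # Yield one row at a time, like iterating over a table.
        for row in self._rows:
            yield row

    def iterrows(self):
        # Delegate to __iter__, as the text describes for iterrows().
        return iter(self)


table = MiniTable({"name": "Particle: %6d" % i, "TDCcount": i}
                  for i in range(10))

names_a = [x["name"] for x in table.iterrows() if x["TDCcount"] > 6]
names_b = [x["name"] for x in table if x["TDCcount"] > 6]
print(names_a == names_b)
```

Because the for clause of a comprehension calls iter() on its argument, both spellings visit exactly the same rows.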

3.1.7 Creating new array objects

In order to separate the selected data from the detector data, we will create a new group, called columns, hanging from the root group:

>>> gcolumns = h5file.createGroup(h5file.root, "columns", "Pressure and Name")
>>>
	  

Note that this time we have specified the first parameter in a natural naming fashion (h5file.root) instead of using an absolute path string ("/").

Now, create one Array object:

>>> h5file.createArray(gcolumns, 'pressure', array(pressure),
...                     "Pressure column selection")
/columns/pressure (Array(3,)) 'Pressure column selection'
  type = Float64
  itemsize = 8
  flavor = 'NumArray'
  byteorder = 'little'
>>>
	  

We already know the first two parameters of the createArray method (see ??), which are the same as the first two of createTable: the parent group where the Array will be created and the Array instance name. You can figure out that the fourth parameter is the title. In the third position we have the object we want to save on disk. In this case, it is a numarray array built from the selection list we created before.

Now, we are going to save the other selection. In this case it's a list of strings, and we want to save this object as is, with no further conversion. Look at how this can be done:

>>> h5file.createArray(gcolumns, 'name', names, "Name column selection")
/columns/name (Array(3,)) 'Name column selection'
  type = 'CharType'
  itemsize = 16
  flavor = 'List'
  byteorder = 'little'
>>>
	  

As you can see, createArray() accepts names (which is a regular Python list) as the object parameter. Actually, it accepts a variety of other regular objects (see ??). We will check later on that we can retrieve exactly the same object from disk.

Note that in these examples, the createArray method returns an Array instance that is not assigned to any variable. Don't worry, this was intentional, because I wanted to show you the kind of object we have created by displaying its representation. The Array objects have indeed been attached to the object tree and saved on disk, as you can see if you print the complete object tree:

>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:00:13 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'

>>>	  
	  

3.1.8 Closing the file and looking at its content

To finish this first tutorial, we use the close method of the h5file File instance to close the file before exiting Python:

>>> h5file.close()
>>> ^D
	  

With all that, you have created your first PyTables file, with a table and two arrays. That was easy, admit it. Now you can have a look at it with some generic HDF5 tool, like h5dump or h5ls. Here is the result of passing the tutorial1.h5 file to h5ls:

$ h5ls -rd tutorial1.h5
/columns                 Group
/columns/name            Dataset {3}
    Data:
        (0) "Particle:      5", "Particle:      6", "Particle:      7"
/columns/pressure        Dataset {3}
    Data:
        (0) 25, 36, 49
/detector                Group
/detector/readout        Dataset {10/Inf}
    Data:
        (0) {0, 0, 0, 0, 10, 0, "Particle:      0", 0},
        (1) {256, 1, 1, 1, 9, 17179869184, "Particle:      1", 1},
        (2) {512, 2, 256, 2, 8, 34359738368, "Particle:      2", 4},
        (3) {768, 3, 6561, 3, 7, 51539607552, "Particle:      3", 9},
        (4) {1024, 4, 65536, 4, 6, 68719476736, "Particle:      4", 16},
        (5) {1280, 5, 390625, 5, 5, 85899345920, "Particle:      5", 25},
        (6) {1536, 6, 1679616, 6, 4, 103079215104, "Particle:      6", 36},
        (7) {1792, 7, 5764801, 7, 3, 120259084288, "Particle:      7", 49},
        (8) {2048, 8, 16777216, 8, 2, 137438953472, "Particle:      8", 64},
        (9) {2304, 9, 43046721, 9, 1, 154618822656, "Particle:      9", 81}
	  

or, using the "dumpFile.py" PyTables utility (located in examples/ directory):

$ python dumpFile.py tutorial1.h5
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:40:51 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'

	  

You can pass the -v or -d options to dumpFile.py if you want more verbosity. Try them out!

3.2 Browsing the object tree and more

In this section, we will learn how to browse the tree while retrieving meta-information about the actual data, and will finish by appending some rows to the existing table to show how table objects can be enlarged.

In examples/tutorial1-2.py you will find the working version of all the code in this section. As before, you are encouraged to use a Python shell and inspect the object tree along the way.

3.2.1 Traversing the object tree

First of all, let's open the file we created in the previous tutorial section, as we will take it as the basis for this section:

>>> h5file = openFile("tutorial1.h5", "a")
	  

This time, we have opened the file in "a"ppend mode. We are using this mode because we want to add more information to the file.

PyTables, following the Python tradition, offers powerful introspection capabilities, i.e. you can easily ask for information about any component of the object tree, as well as traverse the tree searching for something.

To start with, you can get a first glance at the object tree by simply printing the existing File instance:

>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:40:51 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'

>>>
	  

That's right, it seems that all our objects are there. Now, let's make use of the File iterator to see how to list all the nodes in the object tree:

>>> for node in h5file:
...   print node
...
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/detector (Group) 'Detector information'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector/readout (Table(10,)) 'Readout example'
>>>
	  

We can use the walkGroups method (see ??) of the File class to list only the groups in the tree:

>>> for group in h5file.walkGroups("/"):
...   print group
...
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/detector (Group) 'Detector information'
>>>
	  

Note that walkGroups() actually returns an iterator, not a list of objects. Combining this iterator with the listNodes() method, we can do very powerful things. Let's see an example listing all the arrays in the tree:

>>> for group in h5file.walkGroups("/"):
...     for array in h5file.listNodes(group, classname = 'Array'):
...         print array
...
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
	  

listNodes() (see ??) returns a list containing all the nodes hanging from a specific Group. If the classname keyword is specified, the method filters out all instances which are not descendants of that class. We have asked it to return only Array instances.
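The combination of a group-walking iterator with a per-group listing can be sketched without PyTables. In the following Python 3 example the tree is a nested dictionary, and walk_groups and list_nodes are hypothetical helpers written for this illustration. They are simplified: list_nodes only returns leaves and matches the class name exactly, and the visiting order is not meant to reproduce the one used by PyTables:

```python
# Hypothetical miniature object tree: groups are dicts, leaves are
# (class name, title) tuples. It mirrors the layout of the tutorial file.
tree = {
    "columns": {
        "name": ("Array", "Name column selection"),
        "pressure": ("Array", "Pressure column selection"),
    },
    "detector": {
        "readout": ("Table", "Readout example"),
    },
}


def walk_groups(group, path="/"):
    """Yield (path, group) for the given group and all groups below it."""
    yield path, group
    for name in sorted(group):
        child = group[name]
        if isinstance(child, dict):
            child_path = path.rstrip("/") + "/" + name
            yield from walk_groups(child, child_path)


def list_nodes(group, path, classname=None):
    """Return paths of the leaves directly under group, filtered by class."""
    found = []
    for name in sorted(group):
        child = group[name]
        if isinstance(child, dict):
            continue  # sub-groups are skipped in this simplified sketch
        if classname is None or child[0] == classname:
            found.append(path.rstrip("/") + "/" + name)
    return found


groups = [path for path, _ in walk_groups(tree)]
arrays = [leaf
          for path, group in walk_groups(tree)
          for leaf in list_nodes(group, path, classname="Array")]
print(groups)
print(arrays)
```

Since walk_groups is a generator, groups are produced one at a time as the loop asks for them, which is the same idea as walkGroups() returning an iterator rather than a list.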

We can combine both calls by using the __call__(where, classname) special method of File (see ??), i.e.:

>>> for array in h5file("/", "Array"):
...   print array
...
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
>>>
	  

which is a nice shortcut for doing interactive work.

As a final example, we will list all the Leaf instances, i.e. Table and Array instances (see 4.6 for detailed information on the Leaf class), in the /detector group. Check that only one instance of the Table class (i.e. readout) is selected in this group, as it should be:

>>> for leaf in h5file.root.detector('Leaf'):
...   print leaf
...
/detector/readout (Table(10,)) 'Readout example'
>>> 
	  

where we have used a call to the Group.__call__(classname, recursive) special method (??), combined with a natural naming path specification.

Of course you can do more sophisticated node selections using these powerful methods, but first, we need to learn a bit about some important instance variables of PyTables objects.

3.2.2 Setting and getting user attributes

PyTables provides an easy and concise way to complement the meaning of your node objects in the tree by using the AttributeSet class (see section 4.10). You can access this object through the standard attribute attrs in Leaf nodes and _v_attrs in Group nodes.

For example, let's imagine that we want to save the date indicating when the data in /detector/readout table has been acquired, as well as the temperature during the gathering process. That is easy:

>>> table = h5file.root.detector.readout
>>> table.attrs.gath_date = "Wed, 06/12/2003 18:33"
>>> table.attrs.temperature = 18.4
>>> table.attrs.temp_scale = "Celsius"
>>>
	  

Now, set a somewhat more complex attribute in the /detector group:

>>> detector = h5file.root.detector
>>> detector._v_attrs.stuff = [5, (2.3, 4.5), "Integer and tuple"]
>>>
	  

Note how the AttributeSet instance is accessed with _v_attrs because detector is a Group node. In general, you can save any standard Python data structure as an attribute node, but see section 4.10 for a more detailed explanation of how these are serialized on disk.

Now, getting the attributes is equally easy:

>>> table.attrs.gath_date
'Wed, 06/12/2003 18:33'
>>> table.attrs.temperature
18.399999999999999
>>> table.attrs.temp_scale
'Celsius'
>>> detector._v_attrs.stuff
[5, (2.2999999999999998, 4.5), 'Integer and tuple']
>>>
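The long decimal expansions shown for 18.4 and 2.3 are not a sign of data corruption. Older Python 2 interpreters displayed floats with 17 significant digits, which exposes the nearest binary approximation of the value. The following Python 3 snippet reproduces that formatting with the "%.17g" format; the variable names are only illustrative:

```python
# Format the two values with 17 significant digits, as older interpreters did.
old_style_temperature = "%.17g" % 18.4
old_style_component = "%.17g" % 2.3

print(old_style_temperature)
print(old_style_component)

# The values themselves are unchanged: converting the long form back to a
# float gives exactly the same number as the short literal.
print(float(old_style_temperature) == 18.4)
```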
	  

You can probably guess how to delete attributes:

>>> del table.attrs.gath_date
	  

If you want to have a look at the current attribute set of /detector/readout, you can print its representation (try also hitting the TAB key twice if you are on a Unix Python console with the rlcompleter module active):

>>> table.attrs
/detector/readout (AttributeSet), 14 attributes:
   [CLASS := 'TABLE',
    FIELD_0_NAME := 'ADCcount',
    FIELD_1_NAME := 'TDCcount',
    FIELD_2_NAME := 'energy',
    FIELD_3_NAME := 'grid_i',
    FIELD_4_NAME := 'grid_j',
    FIELD_5_NAME := 'idnumber',
    FIELD_6_NAME := 'name',
    FIELD_7_NAME := 'pressure',
    NROWS := 10,
    TITLE := 'Readout example',
    VERSION := '2.0',
    temp_scale := 'Celsius',
    temperature := 18.399999999999999]
>>>
	  

You can get a list of only the user or system attributes with the _f_list() method.

>>> print table.attrs._f_list("user")
['temp_scale', 'temperature']
>>> print table.attrs._f_list("sys")
['CLASS', 'FIELD_0_NAME', 'FIELD_1_NAME', 'FIELD_2_NAME', 'FIELD_3_NAME',
 'FIELD_4_NAME', 'FIELD_5_NAME', 'FIELD_6_NAME', 'FIELD_7_NAME', 'NROWS',
 'TITLE', 'VERSION']
>>>
	  

And rename attributes:

>>> table.attrs._f_rename("temp_scale","tempScale")
>>> print table.attrs._f_list()
['tempScale', 'temperature']
>>>
	  

However, you can't set, delete or rename read-only attributes:

>>> table.attrs._f_rename("VERSION", "version")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/falted/PyTables/pytables-0.7/tables/AttributeSet.py", line 249, in _f_rename
    raise RuntimeError, \
RuntimeError: Read-only attribute ('VERSION') cannot be renamed
>>>
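The distinction between user and read-only (system) attributes can be sketched with a toy class in Python 3. MiniAttributeSet, its SYSTEM tuple and its set, list and rename methods are all hypothetical names invented for this illustration; they are not the PyTables AttributeSet implementation:

```python
class MiniAttributeSet:
    """Hypothetical stand-in for an attribute set; not PyTables code."""

    # Names treated as system (read-only) attributes in this sketch.
    SYSTEM = ("CLASS", "NROWS", "TITLE", "VERSION")

    def __init__(self):
        self._values = {"CLASS": "TABLE", "NROWS": 10,
                        "TITLE": "Readout example", "VERSION": "2.0"}

    def set(self, name, value):
        if name in self.SYSTEM:
            raise RuntimeError("Read-only attribute (%r) cannot be set" % name)
        self._values[name] = value

    def list(self, kind="user"):
        # kind is "user", "sys" or "all"; names are returned sorted.
        names = sorted(self._values)
        if kind == "user":
            return [n for n in names if n not in self.SYSTEM]
        if kind == "sys":
            return [n for n in names if n in self.SYSTEM]
        return names

    def rename(self, old, new):
        if old in self.SYSTEM:
            raise RuntimeError(
                "Read-only attribute (%r) cannot be renamed" % old)
        self._values[new] = self._values.pop(old)


attrs = MiniAttributeSet()
attrs.set("temp_scale", "Celsius")
attrs.set("temperature", 18.4)
attrs.rename("temp_scale", "tempScale")
print(attrs.list("user"))

try:
    attrs.rename("VERSION", "version")
except RuntimeError as error:
    print(error)
```

The design point is simply that every mutating operation checks the name against the protected set first, so user attributes stay freely editable while system ones are guarded.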
	  

After your session, you can check that the /detector/readout attributes on disk look like this:

$ h5ls -vr tutorial1.h5/detector/readout
Opened "tutorial1.h5" with sec2 driver.
/detector/readout        Dataset {10/Inf}
    Attribute: CLASS     scalar
        Type:      6-byte null-terminated ASCII string
        Data:  "TABLE"
    Attribute: VERSION   scalar
        Type:      4-byte null-terminated ASCII string
        Data:  "2.0"
    Attribute: TITLE     scalar
        Type:      16-byte null-terminated ASCII string
        Data:  "Readout example"
    Attribute: FIELD_0_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "ADCcount"
    Attribute: FIELD_1_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "TDCcount"
    Attribute: FIELD_2_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "energy"
    Attribute: FIELD_3_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "grid_i"
    Attribute: FIELD_4_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "grid_j"
    Attribute: FIELD_5_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "idnumber"
    Attribute: FIELD_6_NAME scalar
        Type:      5-byte null-terminated ASCII string
        Data:  "name"
    Attribute: FIELD_7_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "pressure"
    Attribute: tempScale scalar
        Type:      8-byte null-terminated ASCII string
        Data:  "Celsius"
    Attribute: temperature {1}
        Type:      native double
        Data:  18.4
    Attribute: NROWS     {1}
        Type:      native int
        Data:  10
    Location:  0:1:0:1952
    Links:     1
    Modified:  2003-07-24 13:59:19 CEST
    Chunks:    {2048} 96256 bytes
    Storage:   470 logical bytes, 96256 allocated bytes, 0.49% utilization
    Type:      struct {
                   "ADCcount"         +0    native unsigned short
                   "TDCcount"         +2    native unsigned char
                   "energy"           +3    native double
                   "grid_i"           +11   native int
                   "grid_j"           +15   native int
                   "idnumber"         +19   native long long
                   "name"             +27   16-byte null-terminated ASCII string
                   "pressure"         +43   native float
               } 47 bytes
 

	  

As you can see, the use of attributes can be a good mechanism to add persistent (meta) information to your actual data. Be sure to use them extensively.
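The storage figures in the h5ls listing above can be cross-checked with a little arithmetic. The numbers in this Python 3 snippet are copied from that listing; the interpretation that exactly one chunk of 2048 rows has been allocated is an assumption made for this check:

```python
row_size = 47        # bytes per row, from the struct size in the listing
nrows = 10           # rows written so far
chunk_rows = 2048    # rows per chunk, from "Chunks: {2048}"

logical_bytes = nrows * row_size
allocated_bytes = chunk_rows * row_size
utilization = 100.0 * logical_bytes / allocated_bytes

print(logical_bytes, allocated_bytes, round(utilization, 2))
```

The results, 470 logical bytes, 96256 allocated bytes and about 0.49% utilization, match the Storage line reported by h5ls.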

3.2.3 Getting object metadata

Each object in PyTables has metadata information about the actual data on the file. Normally this metainformation is accessible through the node instance variables. Let's have a look at some examples:

>>> print "Object:", table
Object: /detector/readout (Table(10,)) 'Readout example'
>>> print "Table name:", table.name
Table name: readout
>>> print "Table title:", table.title
Table title: Readout example
>>> print "Number of rows in table:", table.nrows
Number of rows in table: 10
>>> print "Table variable names with their type and shape:"
Table variable names with their type and shape:
>>> for name in table.colnames:
...   print name, ':= %s, %s' % (table.coltypes[name], table.colshapes[name])
...
ADCcount := UInt16, 1
TDCcount := UInt8, 1
energy := Float64, 1
grid_i := Int32, 1
grid_j := Int32, 1
idnumber := Int64, 1
name := CharType, 1
pressure := Float32, 1
>>>
	  

Here, the name, title, nrows, colnames, coltypes and colshapes attributes (see 4.4.1 for a complete attribute list) of the Table object give us quite a lot of information about the actual table data.

In general, you can get up-to-the-minute information about the public objects in PyTables interactively by printing their internal doc strings:

>>> print table.__doc__
Represent a table in the object tree.

    It provides methods to create new tables or open existing ones, as
    well as to write/read data to/from table objects over the
    file. A method is also provided to iterate over the rows without
    loading the entire table or column in memory.

    Data can be written or read both as Row() instances or as numarray
    (NumArray or RecArray) objects.
    
    Methods:
    
      Common to all leaves:
        close()
        flush()
        getAttr(attrname)
        rename(newname)
        remove()
        setAttr(attrname, attrvalue)
        
      Specific of Table:
        iterrows()
        read([start] [, stop] [, step] [, field [, flavor]])
        removeRows(start, stop)

    Instance variables:
    
      Common to all leaves:
        name -- the leaf node name
        hdf5name -- the HDF5 leaf node name
        title -- the leaf title
        shape -- the leaf shape
        byteorder -- the byteorder of the leaf
        
      Specific of Table:
        description -- the metaobject describing this table
        row -- a reference to the Row object associated with this table
        nrows -- the number of rows in this table
        rowsize -- the size, in bytes, of each row
        colnames -- the field names for the table (list)
        coltypes -- the type class for the table fields (dictionary)
        colshapes -- the shapes for the table fields (dictionary)

>>>
	  

This is very handy if you don't have this manual at hand. Try it yourself with the docs of other objects, for example:

>>> help(table.__class__)
>>> help(table.removeRows)
	  

Now, print some metadata in /columns/pressure Array object:

>>> pressureObject = h5file.getNode("/columns", "pressure")
>>> print "Info on the object:", repr(pressureObject)
Info on the object: /columns/pressure (Array(3,)) 'Pressure column selection'
  type = Float64
  itemsize = 8
  flavor = 'NumArray'
  byteorder = 'little'
>>> print "  shape: ==>", pressureObject.shape
  shape: ==> (3,)
>>> print "  title: ==>", pressureObject.title
  title: ==> Pressure column selection
>>> print "  type: ==>", pressureObject.type
  type: ==> Float64
>>>
	  

Observe how we have used the getNode() method of the File class to access a node in the tree, instead of the natural naming method. Both are useful, and depending on the context you will prefer one or the other. getNode() has the advantage that it can get a node from a pathname string (as in this example) and, besides, you can apply a filter so that the node at that location has to be an instance of a given class. However, I consider natural naming to be more elegant and quicker to type, especially if you are using the name completion capability of the interactive console. I suggest you give this powerful combination of natural naming and completion, available in most Python consoles, a try. You will see how pleasant browsing the object tree can be (well, as far as this activity can be described that way).

If you look at the type attribute of pressureObject, you can verify that this is a "Float64" array, and by looking at its shape attribute, you can deduce that the array on disk is unidimensional and has 3 elements. See 4.9.1 or the internal doc strings for the complete Array attribute list.
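The two access styles can be compared on a toy tree in Python 3. MiniGroup and get_node are hypothetical names written for this illustration and are not part of PyTables; the sketch only shows that attribute-style access and path-string access can resolve to the same object:

```python
class MiniGroup:
    """Hypothetical group whose children are reachable as attributes."""

    def __init__(self, **children):
        self._children = children

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: natural naming.
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name)


def get_node(root, path):
    """Resolve an absolute path such as "/columns/pressure" from root."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = getattr(node, part)
    return node


root = MiniGroup(columns=MiniGroup(pressure=[25.0, 36.0, 49.0]))

by_name = root.columns.pressure
by_path = get_node(root, "/columns/pressure")
print(by_name is by_path)
```

The path form is convenient when the location is only known at run time (for example, read from a configuration file), while the attribute form is shorter to type in an interactive session.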

3.2.4 Reading actual data from Array objects

Once you have found the desired Array and decided that you want to retrieve the actual data array from it, you should use the read() method of the Array object:

>>> pressureArray = pressureObject.read()
>>> pressureArray
array([ 25.,  36.,  49.])
>>> print "pressureArray is an object of type:", type(pressureArray)
pressureArray is an object of type: <class 'numarray.numarraycore.NumArray'>
>>> nameArray = h5file.root.columns.name.read()
>>> nameArray
['Particle:      5', 'Particle:      6', 'Particle:      7']
>>> print "nameArray is an object of type:", type(nameArray)
nameArray is an object of type: <type 'list'>
>>>
>>> print "Data on arrays nameArray and pressureArray:"
Data on arrays nameArray and pressureArray:
>>> for i in range(pressureObject.shape[0]):
...   print nameArray[i], "-->", pressureArray[i]
...
Particle:      5 --> 25.0
Particle:      6 --> 36.0
Particle:      7 --> 49.0
>>> pressureObject.name
'pressure'
>>> 
	  

You can verify that the read() method (see section ??) returns an authentic numarray object for the pressureObject instance by looking at the output of the type() call, while for the name array (h5file.root.columns.name) read() returns a native Python list (of strings). This is because the type of the object saved is kept as an HDF5 attribute (named FLAVOR) for these objects on disk. This attribute is then read as part of the Array metainformation and is accessible through the Array.attrs.FLAVOR variable, enabling the data read to be converted back into the original kind of object. This provides a means to save a large variety of objects as arrays with the guarantee that you will be able to recover them in their original form afterwards. See section ?? for a complete list of supported objects for Array.
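
The idea of keeping a "flavor" tag next to the data can be sketched in a few lines of plain Python. This is not how PyTables is implemented: ToyArray, its flavor attribute and the converters table are hypothetical names used only to show the principle of storing the original kind of object and using it again on read.

```python
# Hypothetical sketch of the principle; PyTables keeps FLAVOR as an HDF5 attribute.
class ToyArray(object):
    def __init__(self, obj):
        # Remember what kind of object was saved ...
        self.flavor = type(obj).__name__       # here: "list" or "tuple"
        # ... but keep the data itself in one uniform stored form.
        self.stored = tuple(obj)

    def read(self):
        # Use the saved tag to rebuild an object of the original kind.
        converters = {"list": list, "tuple": tuple}
        return converters[self.flavor](self.stored)


names = ToyArray(["Particle:      5", "Particle:      6"])
pair = ToyArray((25.0, 36.0))
```

Here names.read() gives back a list and pair.read() gives back a tuple, even though both were stored in the same internal form.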

3.2.5 Appending data to an existing table

Now, let's have a look at how we can add records to an existing on-disk table. Let's use our well-known readout Table instance and let's append some new values to it:

>>> table = h5file.root.detector.readout
>>> particle = table.row
>>> for i in xrange(10, 15):
...     particle['name']  = 'Particle: %6d' % (i)
...     particle['TDCcount'] = i % 256
...     particle['ADCcount'] = (i * 256) % (1 << 16)
...     particle['grid_i'] = i
...     particle['grid_j'] = 10 - i
...     particle['pressure'] = float(i*i)
...     particle['energy'] = float(particle['pressure'] ** 4)
...     particle['idnumber'] = i * (2 ** 34)
...     particle.append()
...
>>> table.flush()
>>>
	  

This works in exactly the same way as filling a new table. PyTables knows that this table is on disk, and when you add new records, they are appended to the end of the table (see footnote 3).

If you look carefully at the code you will see that we have used the table.row attribute to access a table row and fill it with the new values. Each time its append() method is called, the current row is committed to the output buffer and the row pointer is incremented to point to the next table record. When the buffer is full, the data is saved on disk, and the buffer is reused for the next cycle.

Caveat emptor: do not forget to always call the .flush() method after a writing operation; otherwise your tables will not be fully updated!
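
The buffering behaviour described above, and the reason why flush() matters, can be imitated with a toy model. The sketch below is plain Python and does not use PyTables: ToyTable, ToyRow and the bufsize parameter are hypothetical names invented for the illustration, and the real buffer size and internals of PyTables are not represented here.

```python
class ToyRow(dict):
    """A reusable record; append() hands a copy of it to the owning table."""
    def __init__(self, table):
        dict.__init__(self)
        self._table = table

    def append(self):
        self._table._buffer.append(dict(self))    # commit current values
        if len(self._table._buffer) >= self._table.bufsize:
            self._table.flush()                   # buffer full: save it


class ToyTable(object):
    def __init__(self, bufsize=4):
        self.bufsize = bufsize
        self._buffer = []      # rows waiting to be written
        self.on_disk = []      # rows already "saved"
        self.row = ToyRow(self)

    def flush(self):
        self.on_disk.extend(self._buffer)
        del self._buffer[:]    # reuse the buffer for the next cycle


table = ToyTable(bufsize=4)
particle = table.row
for i in range(10, 15):
    particle['name'] = 'Particle: %6d' % i
    particle['pressure'] = float(i * i)
    particle.append()

saved_before_flush = len(table.on_disk)   # 4: only one full buffer was written
table.flush()
saved_after_flush = len(table.on_disk)    # 5: the leftover row is now saved
```

With five appended rows and a buffer of four, one row is still waiting in the buffer until flush() is called, which is exactly the situation the caveat warns about.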

Let's have a look at some columns of the resulting table:

>>> for r in table.iterrows():
...     print "%-16s | %11.1f | %11.4g | %6d | %6d | %8d |" % \
...        (r['name'], r['pressure'], r['energy'], r['grid_i'], r['grid_j'],
...         r['TDCcount'])
...
Particle:      0 |         0.0 |           0 |      0 |     10 |        0 |
Particle:      1 |         1.0 |           1 |      1 |      9 |        1 |
Particle:      2 |         4.0 |         256 |      2 |      8 |        2 |
Particle:      3 |         9.0 |        6561 |      3 |      7 |        3 |
Particle:      4 |        16.0 |   6.554e+04 |      4 |      6 |        4 |
Particle:      5 |        25.0 |   3.906e+05 |      5 |      5 |        5 |
Particle:      6 |        36.0 |    1.68e+06 |      6 |      4 |        6 |
Particle:      7 |        49.0 |   5.765e+06 |      7 |      3 |        7 |
Particle:      8 |        64.0 |   1.678e+07 |      8 |      2 |        8 |
Particle:      9 |        81.0 |   4.305e+07 |      9 |      1 |        9 |
Particle:     10 |       100.0 |       1e+08 |     10 |      0 |       10 |
Particle:     11 |       121.0 |   2.144e+08 |     11 |     -1 |       11 |
Particle:     12 |       144.0 |     4.3e+08 |     12 |     -2 |       12 |
Particle:     13 |       169.0 |   8.157e+08 |     13 |     -3 |       13 |
Particle:     14 |       196.0 |   1.476e+09 |     14 |     -4 |       14 |
	  

3.2.6 And finally... how to remove rows from a table

Let's finish this tutorial by deleting some rows from the table we have. Suppose that we want to delete the 5th to the 9th rows (inclusive). That's very easy to do:

>>> table.removeRows(5,10)
5
>>>
	  

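removeRows(start, stop) (see ??) deletes the rows from start up to, but not including, stop; this is why the call above uses 10 as the stop value in order to delete rows 5 to 9. It returns the number of rows effectively removed.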
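
The range convention is the same one used by Python slices, so it can be tried out on an ordinary list. This is a minimal sketch in plain Python; remove_rows() is a hypothetical helper that mimics the behaviour described above and is not the PyTables method itself.

```python
def remove_rows(rows, start, stop):
    """Delete rows[start:stop] in place and return how many were removed."""
    before = len(rows)
    del rows[start:stop]          # the row at index stop is NOT deleted
    return before - len(rows)


rows = ['Particle: %6d' % i for i in range(15)]
removed = remove_rows(rows, 5, 10)
```

After the call, removed is 5 and the row that follows particle 4 in the list is particle 10.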

We have reached the end of this first tutorial. But hey, do not forget to close the file after you finish all the work:

>>> h5file.close()
>>> ^D
$ 
	  

In figure 3.1 you can see a graphical view of the PyTables file, with the datasets we have just created. And in figure 3.2 you can see the general properties of the table /detector/readout.

Figure 3.1: The final version of data file for tutorial 1, with a view of the data objects.
Figure 3.2: General properties of the /detector/readout table.

3.3 Multidimensional table cells and automatic sanity checks

Now, time for a more real-life example (i.e. with errors in the code). Here, we will create a couple of groups hanging directly from root, called Particles and Events. Then, we will put 3 tables in each group: in Particles we will put tables based on the Particle descriptor, and in Events, tables based on the Event descriptor.

After that, we will feed the tables with a number of records. Finally, we will read the recently created table /Events/TEvent3 and select some values from it using a list comprehension.

Look at the next script (you can find it in examples/tutorial2.py). It seems to do all of that, but a couple of small bugs will show up. Note that this Particle class is not directly related to the one defined in the last example; this one is simpler (but notice the multidimensional columns called pressure and temperature!). We will also introduce a new way to describe a Table, as a dictionary, as you can see in the Event description. See section ?? about the different kinds of descriptor objects that can be passed to the createTable() method.

from numarray import *
from tables import *

# Describe a particle record
class Particle(IsDescription):
    name        = StringCol(length=16) # 16-character String
    lati        = IntCol()             # integer
    longi       = IntCol()             # integer
    pressure    = Float32Col(shape=(2,3)) # array of floats (single-precision)
    temperature = FloatCol(shape=(2,3))   # array of doubles (double-precision)

# Another way to describe the columns of a table
Event = {
    "name"    : Col('CharType', 16),    # 16-character String
    "TDCcount": Col("UInt8", 1),        # unsigned byte
    "ADCcount": Col("UInt16", 1),       # unsigned short integer
    "xcoord"  : Col("Float32", 1),      # single-precision float
    "ycoord"  : Col("Float32", 1),      # single-precision float
    }

# Open a file in "w"rite mode
fileh = openFile("tutorial2.h5", mode = "w")
# Get the HDF5 root group
root = fileh.root
# Create the groups:
for groupname in ("Particles", "Events"):
    group = fileh.createGroup(root, groupname)
# Now, create and fill the tables in Particles group
gparticles = root.Particles
# Create 3 new tables
for tablename in ("TParticle1", "TParticle2", "TParticle3"):
    # Create a table
    table = fileh.createTable("/Particles", tablename, Particle,
                           "Particles: "+tablename)
    # Get the record object associated with the table:
    particle = table.row
    # Fill the table with 257 particles
    for i in xrange(257):
        # First, assign the values to the Particle record
        particle['name'] = 'Particle: %6d' % (i)
        particle['lati'] = i 
        particle['longi'] = 10 - i
        ########### Detectable errors start here. Play with them!
        particle['pressure'] = array(i*arange(2*3), shape=(2,4))  # Incorrect
        #particle['pressure'] = array(i*arange(2*3), shape=(2,3))  # Correct
        ########### End of errors
        particle['temperature'] = (i**2)     # Broadcasting
        # This injects the Record values
        particle.append()      
    # Flush the table buffers
    table.flush()

# Now, go for Events:
for tablename in ("TEvent1", "TEvent2", "TEvent3"):
    # Create a table in Events group
    table = fileh.createTable(root.Events, tablename, Event,
                           "Events: "+tablename)
    # Get the record object associated with the table:
    event = table.row
    # Fill the table with 257 events
    for i in xrange(257):
        # First, assign the values to the Event record
        event['name']  = 'Event: %6d' % (i)
        event['TDCcount'] = i % (1<<8)   # Correct range
        ########### Detectable errors start here. Play with them!
        #event['xcoord'] = float(i**2)   # Correct spelling
        event['xcoor'] = float(i**2)     # Wrong spelling
        event['ADCcount'] = i * 2        # Correct type
        #event['ADCcount'] = "s"          # Wrong type
        ########### End of errors
        event['ycoord'] = float(i)**4
        # This injects the Record values
        event.append()

    # Flush the buffers
    table.flush()

# Read the records from table "/Events/TEvent3" and select some
table = root.Events.TEvent3
e = [ p['TDCcount'] for p in table
      if p['ADCcount'] < 20 and 4 <= p['TDCcount'] < 15 ]
print "Last record ==>", p
print "Selected values ==>", e
print "Total selected records ==> ", len(e)
# Finally, close the file (this also will flush all the remaining buffers!)
fileh.close()
	

3.3.1 Shape checking

If you have read the code carefully, it looks pretty good, but it won't work. When you run this example, you will get the following error:

$ python tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 53, in ?
    particle['pressure'] = array(i*arange(2*3), shape=(2,4))  # Incorrect
  File "/usr/local/lib/python2.2/site-packages/numarray/numarraycore.py", line 281, in array
    a.setshape(shape)
  File "/usr/local/lib/python2.2/site-packages/numarray/generic.py", line 530, in setshape
    raise ValueError("New shape is not consistent with the old shape")
ValueError: New shape is not consistent with the old shape
	

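which is saying that you are trying to assign an array of incompatible shape to a table cell. If you look at the source, we were trying to assign an array of shape (2,4) to a pressure element, which was defined to have a shape of (2,3). Note that in this particular case, as the traceback shows, the exception is raised by numarray itself while building the array, because the 6 elements of arange(2*3) cannot be arranged in a (2,4) shape.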

In general, this kind of operation is forbidden, with one honorable exception: when you try to assign a scalar value to a column cell that is multidimensional, all the cell elements are populated with the value of this scalar. This happens in the next line:

        particle['temperature'] = (i**2)    # Broadcasting
	  

So, the value i**2 is assigned to all the elements of the temperature table cell. This capability is provided by the numarray package and is known as broadcasting.
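
The two rules of this section (reject a value whose shape differs from the cell shape, but spread a scalar over the whole cell) can be written down as a toy model on nested lists. This is a sketch in plain Python that does not use numarray or PyTables: shape_of(), full() and set_cell() are hypothetical helpers, and shape_of() assumes regularly nested lists, looking only at the first element of each level.

```python
def shape_of(value):
    """Shape of a nested list, e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3)."""
    if not isinstance(value, list):
        return ()                  # a scalar has an empty shape
    if not value:
        return (0,)
    return (len(value),) + shape_of(value[0])


def full(shape, scalar):
    """Nested list of the given shape with every element set to scalar."""
    if not shape:
        return scalar
    return [full(shape[1:], scalar) for _ in range(shape[0])]


def set_cell(cell_shape, value):
    """Return the new cell contents, or raise ValueError on a bad shape."""
    if shape_of(value) == ():
        return full(cell_shape, value)           # scalar: broadcast it
    if shape_of(value) != tuple(cell_shape):
        raise ValueError("shape %r does not match cell shape %r"
                         % (shape_of(value), tuple(cell_shape)))
    return value


temperature = set_cell((2, 3), 7 ** 2)
```

Here temperature is a 2x3 nested list in which every element is 49, whereas passing a 2x4 nested list to set_cell((2, 3), ...) raises ValueError.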

3.3.2 Field name checking

After fixing the previous error and re-running the program, we will get another one:

$ python tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 74, in ?
    event['xcoor'] = float(i**2)     # Wrong spelling
  File "/home/falted/PyTables/pytables-0.7/src/hdf5Extension.pyx", line 1812, in hdf5Extension.Row.__setitem__
    raise AttributeError, "Error setting \"%s\" attr.\n %s" % \
AttributeError: Error setting "xcoor" attr.
 Error was: "exceptions.KeyError: xcoor"
	  

This error is telling us that we tried to assign a value to a non-existent field in the event table object. By looking carefully at the keys of the Event description, we see that we misspelled the xcoord field (we wrote xcoor instead). This behaviour is unusual for Python, because if you assign a value to a non-existent instance variable, a new one is normally created with that name. Such a feature is not satisfactory when we are dealing with an object that has a fixed list of field names. So, a check is made inside PyTables so that if you try to assign a value to a non-existent field, an error is raised (here, an AttributeError that reports the underlying KeyError).
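
The difference between a record with a fixed list of fields and an ordinary Python object can be seen in a small sketch. It is plain Python and does not use PyTables: FixedRow and Plain are hypothetical classes written for this illustration, and the toy raises a bare KeyError rather than the wrapped error shown in the traceback above.

```python
class FixedRow(object):
    """Record with a fixed set of field names (toy stand-in for a table row)."""
    def __init__(self, fields):
        self._values = dict.fromkeys(fields)

    def __setitem__(self, name, value):
        if name not in self._values:
            raise KeyError(name)       # unknown field: refuse to create it
        self._values[name] = value

    def __getitem__(self, name):
        return self._values[name]


event = FixedRow(["name", "TDCcount", "ADCcount", "xcoord", "ycoord"])
event['xcoord'] = 4.0                  # correct spelling: accepted

try:
    event['xcoor'] = 4.0               # wrong spelling: rejected
    misspelling_caught = False
except KeyError:
    misspelling_caught = True


class Plain(object):
    pass


plain = Plain()
plain.xcoor = 4.0   # ordinary attribute assignment silently creates "xcoor"
```

The misspelled name is caught immediately for the fixed record, while the plain object accepts it without complaint, which is the kind of silent mistake the check is meant to prevent.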

3.3.3 Data type checking

Finally, in order to test the type checking, we will change the following line:

	    event['ADCcount'] = i * 2        # Correct type
	  

to read:

	    event['ADCcount'] = "s"          # Wrong type
	  

After this modification, the next exception will be raised when the script is executed:

$ python tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 76, in ?
    event['ADCcount'] = "s"          # Wrong type
  File "/home/falted/PyTables/pytables-0.7/src/hdf5Extension.pyx", line 1812, in hdf5Extension.Row.__setitem__
    raise AttributeError, "Error setting \"%s\" attr.\n %s" % \
AttributeError: Error setting "ADCcount" attr.
 Error was: "exceptions.TypeError: NA_setFromPythonScalar: bad value type."
	  

which states the underlying kind of error (TypeError).
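
A check of this kind can be sketched in plain Python as follows. This is only an assumption about the general idea, not the code PyTables or numarray actually runs: check_integer_column() is a hypothetical helper that accepts integers and rejects any other type of value for an integer column.

```python
def check_integer_column(value):
    """Return value if it is an integer, else raise TypeError."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("bad value type: %r" % (value,))
    return value


accepted = check_integer_column(20 * 2)

try:
    check_integer_column("s")
    wrong_type_caught = False
except TypeError:
    wrong_type_caught = True
```

The integer value passes through unchanged, while the string "s" is refused with a TypeError before it can reach the column.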

You can admire the structure we have created with this (corrected) script in figure 3.3. In particular, pay attention to the multidimensional column cells in table /Particles/TParticle2.

Figure 3.3: Table hierarchy for tutorial 2.

Feel free to look at the rest of the examples in the examples directory, and try to understand them. I've tried to cover several use cases to give you an idea of the capabilities of PyTables and its way of dealing with HDF5 objects.


3) Note that you can append not only scalar values to tables, but also fully multidimensional array objects.
