3.2. Data handling

Handling time series data (and information in general) has become an amazingly complex problem over time. How can you store your data in a safe and protected location while still being able to share it with colleagues in the same room, in the same building, or on the same planet? Not to mention the compatibility problems between different computer platforms.

Fortunately, a group of physicists in Geneva were asking the same question, and went on to imagine and create the web, on top of the Internet. That was a decade ago. Since then we have learned a lot from this simple cross-platform experiment, which led to the World Wide Web Consortium [49] (W3C), a body that defines recommendations (in other words, standards) to enhance cross-platform exchanges of information.

[Note]YaTiSeWoBe and XML

The recommendation of the W3C for information exchange (and other things) is XML [56]: the eXtensible Markup Language. For YaTiSeWoBe, we chose to apply XML wherever it was suitable, even for uses that are not apparent to the end user.

There we are: we have techniques, technologies and standards to exchange data over a network of computers. The distance between the two end points of the transmission is not really relevant anymore (same office, same building, …).

The idea behind YaTiSeWoBe is that you keep your data in repositories, in whatever format, and import it into the application, where it can be manipulated regardless of its origin and format. The only information required for this are the location on the network (given by a URI/URL [53]), called the source, and the format of the file (given by a MIME-type [38]).

Figure 3.1. Data handling overview

Data handling overview

Visual description of the data handling paradigm for YaTiSeWoBe.

repository

a place where you can store and access data. Local hard-drives (file://…), web servers (http://…, https://…), FTP servers (ftp://…), databases (odbc:, …), … all are valid repositories.

import

the action of getting the data from the source, parsing the original format, and leaving the result for the user to manipulate in YaTiSeWoBe.

URI/URL

a string uniquely identifying (URI) or uniquely identifying and locating (URL) a resource. The main difference is that a URI does not tell you where to find something; it only lets you tell two resources apart. Read Section 3.2.1, “Resolving URIs/URLs with XML catalogs” to learn how XML catalogs can be used to determine the actual location of a resource given by its URI.

MIME-type

a string description of the format of a file. Examples of MIME-types for known formats are: text/plain for ASCII files, text/html for HTML documents, image/jpeg for JPEG raster images, …
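
As a rough illustration of the two pieces of information that describe a source, the following Python sketch pairs a URL with a MIME-type. The Source class and the describe function are hypothetical names chosen for this example; they are not part of YaTiSeWoBe, and only the Python standard library is used.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Source:
    """Hypothetical pairing of a location and a format (not a YaTiSeWoBe API)."""
    url: str   # where the data lives
    mime: str  # how the data is encoded


def describe(source: Source) -> str:
    """Summarize which kind of repository and which format a source uses."""
    scheme = urlparse(source.url).scheme  # "file", "http", "ftp", ...
    return f"{scheme} repository, format {source.mime}"


local = Source("file:///home/nhrg/data/2003/08/01.abe",
               "application/x-abeles-format")
print(describe(local))
```

The scheme of the URL identifies the kind of repository, while the MIME-type tells the importer which parser to use; neither depends on the other.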

A few examples of the advantages of this paradigm:

but there are a few problems, too:

The challenge is: how do we make this transparent for the end user?

To allow structured handling of data, several levels of data organization have been defined: data sets, data bundles and data collections.

3.2.1. Resolving URIs/URLs with XML catalogs

OASIS (Organization for the Advancement of Structured Information Standards) [41] entity resolution Technical Committee [26] developed ... As part of the Apache Foundation XML project [14], the Open Source community implemented this ... Apache XML-commons resolver [15].

Using XML catalogs, locations can be expressed as URIs/URLs resolved by YaTiSeWoBe at import time.

Imagine that your data is available on the web server of your institution. You configured YaTiSeWoBe to fetch the data file you are working on from the proper repository: http://data.example.org/2004/02/04/230923.data. You are going away for a weekend in the mountains, where you do not expect to have Internet access, but you still want to work on that file. You can then use the main catalog of YaTiSeWoBe to say: "from now on, all URIs starting with http://data.example.org/2004/02/04 are rewritten to file:///data/localcopy". You then only need to copy your file to the local disk; all other configuration stays unchanged!

An example of an XML catalog can be read in Example 5.4, “Native document: catalog section”.
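
The prefix rewriting described in this section can be sketched in a few lines of Python. This is a minimal illustration, assuming a catalog written with the rewriteURI entry of the OASIS XML Catalogs format; the function names load_rewrites and resolve are made up for the example. The resolver actually used by YaTiSeWoBe handles many more cases (other entry types, delegation, URI normalization) that are left out here.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Illustrative catalog; the rule mirrors the scenario described in this section.
CATALOG = """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteURI uriStartString="http://data.example.org/2004/02/04"
              rewritePrefix="file:///data/localcopy"/>
</catalog>
"""


def load_rewrites(catalog_xml):
    """Return (start, prefix) pairs for every rewriteURI entry."""
    root = ET.fromstring(catalog_xml)
    return [(e.get("uriStartString"), e.get("rewritePrefix"))
            for e in root.iter(f"{{{CATALOG_NS}}}rewriteURI")]


def resolve(uri, rewrites):
    """Apply the longest matching prefix rewrite, or return uri unchanged."""
    matches = [(s, p) for s, p in rewrites if uri.startswith(s)]
    if not matches:
        return uri
    start, prefix = max(matches, key=lambda pair: len(pair[0]))
    return prefix + uri[len(start):]


rewrites = load_rewrites(CATALOG)
print(resolve("http://data.example.org/2004/02/04/230923.data", rewrites))
```

URIs that match no entry are returned untouched, so removing the entry from the catalog is enough to go back to the web server.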

3.2.2. Data filter

[Important]Future feature

As of version 0.1.2, functionalities described in this section have not been fully implemented yet, but are already planned for future releases of YaTiSeWoBe. See Section 6.8.2, “Forseable releases” for more details.

[Warning]FIXME

this doesn't stand here

include exclude

Example 3.1. Data filter

...

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<data:filter 
    xmlns:data="http://www.nhrg.org/schema/data"
    xmlns:filter="http://www.nhrg.org/schema/filter">

      <filter:include>
         <data:selector>
            <data:type code="0x1" qualifier="0x0" mask="0x0" />
         </data:selector>
      </filter:include>

      <filter:exclude>
         <data:selector>
            <data:type code="0x1" qualifier="0x1" mask="0x1" />
         </data:selector>

         <data:selector>
            <data:source 
               url="http://www.nhrg.org/selector/default.xml"
               mime="application/x-nhrg-xml-selector" />
         </data:selector>
      </filter:exclude>

</data:filter> 
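
Since this feature is not implemented yet, the exact meaning of the code, qualifier and mask attributes is not specified here. The Python sketch below shows one possible reading, under three loudly stated assumptions: a mask restricts the comparison to the qualifier bits it sets, a missing mask means an exact match, and exclusion takes precedence over inclusion. TypeSelector and keep are hypothetical names, not YaTiSeWoBe APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeSelector:
    """Hypothetical in-memory form of a <data:type> element."""
    code: int
    qualifier: int
    mask: Optional[int] = None  # ASSUMPTION: None means "compare every bit"

    def matches(self, code: int, qualifier: int) -> bool:
        if code != self.code:
            return False
        if self.mask is None:
            return qualifier == self.qualifier
        # ASSUMPTION: only the bits set in the mask are compared.
        return (qualifier & self.mask) == (self.qualifier & self.mask)


def keep(code, qualifier, include, exclude):
    """Keep an event if some include selector matches and no exclude does.

    ASSUMPTION: exclusion wins over inclusion.
    """
    included = any(s.matches(code, qualifier) for s in include)
    excluded = any(s.matches(code, qualifier) for s in exclude)
    return included and not excluded


# The two <data:type> selectors of the example, under the assumptions above.
include = [TypeSelector(0x1, 0x0, 0x0)]  # code 0x1, any qualifier
exclude = [TypeSelector(0x1, 0x1, 0x1)]  # code 0x1, lowest qualifier bit set

print(keep(0x1, 0x2, include, exclude))
print(keep(0x1, 0x3, include, exclude))
```

The selector that refers to an external document through data:source is deliberately left out of this sketch.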

                    

3.2.3. Data set

[Important]Future feature

As of version 0.1.2, functionalities described in this section have not been fully implemented yet, but are already planned for future releases of YaTiSeWoBe. See Section 6.8.2, “Forseable releases” for more details.

A data set lets the user define, once and for all, properties shared by a group of files (such as labels or styles) and apply them to a new file simply by adding that file to the set.

Example 3.2. Data set

...

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<data:set xmlns:data="http://www.nhrg.org/schema/data">

   <data:description short="Vowels Experiment">
description for data recorded during the vowels experiment.
   </data:description>


   <data:descriptor>
      <data:selector>
         <data:type code="0x0" qualifier="0x0" mask="0x0" />
      </data:selector>
      <data:description short="start/stop episodes" />
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x1" qualifier="0x0" mask="0x0" />
      </data:selector>
      <data:description short="recorded episodes">
episodes recorded from the brain of the animal during the experiment
      </data:description>
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x1" qualifier="0x1" />
         <data:type code="0x1" qualifier="0x2" />
         <data:type code="0x1" qualifier="0x3" />
         <data:type code="0x1" qualifier="0x4" />
         <data:type code="0x1" qualifier="0x5" />
         <data:type code="0x1" qualifier="0x6" />
      </data:selector>
      <data:description short="first electrode">
electrode position: 312312
electrode details: ...
      </data:description>
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x1" qualifier="0x7" />
         <data:type code="0x1" qualifier="0x8" />
         <data:type code="0x1" qualifier="0x9" />
         <data:type code="0x1" qualifier="0xA" />
         <data:type code="0x1" qualifier="0xB" />
         <data:type code="0x1" qualifier="0xC" />
      </data:selector>
      <data:description short="second electrode">
electrode position: 983247
electrode details: ...
      </data:description>
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x1" qualifier="0xD" />
         <data:type code="0x1" qualifier="0xE" />
         <data:type code="0x1" qualifier="0xF" />
         <data:type code="0x1" qualifier="0x10" />
         <data:type code="0x1" qualifier="0x11" />
         <data:type code="0x1" qualifier="0x12" />
      </data:selector>
      <data:description short="third electrode">
electrode position: 759829
electrode details: ...
      </data:description>
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x52" qualifier="0x0" mask="0x0" />
      </data:selector>
      <data:description short="stimulus emission">
type: sound
      </data:description>
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x52" qualifier="0x1" />
      </data:selector>
      <data:description short="correct GO">
type: sound
vowel: eh
behavior: go
      </data:description>
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x52" qualifier="0x2" />
      </data:selector>
      <data:description short="incorrect GO">
type: sound
vowel: ah
behavior: nogo
      </data:description>
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x52" qualifier="0x3" />
      </data:selector>
      <data:description short="correct NOGO">
type: sound
vowel: eh
behavior: nogo
      </data:description>
   </data:descriptor>

   <data:descriptor>
      <data:selector>
         <data:type code="0x52" qualifier="0x4" />
      </data:selector>
      <data:description short="incorrect NOGO">
type: sound
vowel: ah
behavior: go
      </data:description>
   </data:descriptor>

</data:set>

                    

3.2.4. Data bundle

[Important]Future feature

As of version 0.1.2, functionalities described in this section have not been fully implemented yet, but are already planned for future releases of YaTiSeWoBe. See Section 6.8.2, “Forseable releases” for more details.

A user may want to virtually join data that has been split and stored at several locations. Files can be bundled along time (consecutive files) or treated as running in parallel in time (simultaneous files).
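
The difference between the two kinds of bundle can be sketched in Python as follows. This is only an illustration under assumptions that the manual does not confirm: events are (time, label) pairs sorted by time, each consecutive recording has a known duration, and consecutive recordings are placed end to end on a common time axis. The function names are hypothetical.

```python
import heapq


def join_consecutive(recordings):
    """Chain recordings one after another on a common time axis.

    Each recording is (events, duration); events are (time, label) pairs
    with time measured from the start of that recording.
    """
    joined, offset = [], 0.0
    for events, duration in recordings:
        joined.extend((time + offset, label) for time, label in events)
        offset += duration  # the next recording starts where this one ends
    return joined


def join_simultaneous(channels):
    """Merge time-sorted channels that share the same clock."""
    return list(heapq.merge(*channels, key=lambda event: event[0]))


day1 = ([(0.0, "a"), (1.5, "b")], 2.0)
day2 = ([(0.5, "c")], 1.0)
print(join_consecutive([day1, day2]))
print(join_simultaneous([[(0.0, "a"), (1.5, "b")], [(0.5, "c")]]))
```

In the consecutive case the timestamps of later files are shifted, whereas in the simultaneous case they are kept as they are and only interleaved.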

Example 3.3. Consecutive data bundle

...

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<data:bundle xmlns:data="http://www.nhrg.org/schema/data">
   <data:consecutive>
      <data:source 
         url="file:///home/nhrg/data/2003/08/01.abe" 
         mime="application/x-abeles-format" />
      <data:source 
         url="file:///home/nhrg/data/2003/08/02.abe" 
         mime="application/x-abeles-format" />
      <data:source 
         url="file:///home/nhrg/data/2003/08/03.abe" 
         mime="application/x-abeles-format" />
   </data:consecutive>
</data:bundle> 

                    

Example 3.4. Simultaneous data bundle

...

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<data:bundle xmlns:data="http://www.nhrg.org/schema/data">
   <data:simultaneous>
      <data:source 
         url="file:///home/nhrg/data/2003/08/01/canal-01.abe" 
         mime="application/x-abeles-format" />
      <data:source 
         url="file:///home/nhrg/data/2003/08/01/canal-02.abe" 
         mime="application/x-abeles-format" />
      <data:source 
         url="file:///home/nhrg/data/2003/08/01/canal-03.abe" 
         mime="application/x-abeles-format" />
   </data:simultaneous>
</data:bundle> 

                    

Example 3.5. Complex data bundle

...

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<data:bundle xmlns:data="http://www.nhrg.org/schema/data">
   <data:consecutive>

      <data:simultaneous>

         <data:source 
            url="file:///home/nhrg/data/2003/08/01/canal-01.abe" 
            mime="application/x-abeles-format">
            <data:filter>
               <data:exclude>
                  <data:selector>
                     <data:type code="0x1" qualifier="0x1" />
                  </data:selector>
               </data:exclude>
            </data:filter>
         </data:source>

         <data:source 
            url="file:///home/nhrg/data/2003/08/01/canal-02.abe" 
            mime="application/x-abeles-format" />
      </data:simultaneous>

      <data:simultaneous>
         <data:source 
            url="file:///home/nhrg/data/2003/08/02/canal-01.abe" 
            mime="application/x-abeles-format" />
         <data:source 
            url="file:///home/nhrg/data/2003/08/02/canal-02.abe" 
            mime="application/x-abeles-format" />

         <data:filter>
            <data:exclude>
               <data:selector>
                  <data:type code="0x1" qualifier="0x1" mask="0x1" />
               </data:selector>
            </data:exclude>
         </data:filter>
      </data:simultaneous>

   </data:consecutive>

   <data:filter>
      <data:source url="file:///home/nhrg/filter/default.xml" 
          mime="application/x-nhrg-xml-filter" />
   </data:filter>
</data:bundle> 

                    

3.2.5. Data collection

[Important]Future feature

As of version 0.1.2, functionalities described in this section have not been fully implemented yet, but are already planned for future releases of YaTiSeWoBe. See Section 6.8.2, “Forseable releases” for more details.

A data collection groups files according to their meaning, so that they can be browsed independently of their location, format, …

The way data collections help you organize and access your data is summarized in Figure 3.2, “Data collections usage overview” and examples of the XML format are provided as Example 3.6, “Simple data collections”, Example 3.7, “Organized data collections”, and Example 3.8, “Dynamic data collections”.

Figure 3.2. Data collections usage overview

Data collections usage overview

Accessing the data may require dedicated software for authentication/authorization, or the data may be freely available from an HTTP or FTP server.

Data and meta-information can be stored on a filesystem and/or in a database; this is completely transparent to the data collection XML description.

[Tip]Tip

The following examples can be mixed in the same XML collection description file to fully benefit from the functionality.

Example 3.6. Simple data collections

This example shows how a simple list of data resources can be easily compiled into an XML file. Note that the list can gather resources from several origins (given by the url attribute), and potentially from different data formats (determined by the mime attribute).

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<data:collection 
   xmlns:data="http://www.nhrg.org/schema/data"
   name="NHRG 2004-02">

   <data:entry name="2004-02-01" 
      url="file:///home/nhrg/data/2004/02/2004-02-01.abe" 
      mime="application/x-abeles-format" />

   <data:entry name="2004-02-02" 
      url="http://www.nhrg.org/data/file/2004/04/2004-02-02.abe" 
      mime="application/x-abeles-format" />

   <data:entry name="2004-02-03" 
      url="ftp://www.nhrg.org/outgoing/data/2004/04/2004-02-03.abe" 
      mime="application/x-abeles-format" />

</data:collection>

                    

Example 3.7. Organized data collections

This example shows how a data collection can be composed by aggregating other collections, either explicitly or as included XML documents whose location is provided by the url attribute.

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<data:collection 
   xmlns:data="http://www.nhrg.org/schema/data"
   name="My data files">

   <data:collection name="favorites">

      <data:entry name="sleepy" 
         url="file:///home/nhrg/data/2003/08/2003-08-01.abe" 
         mime="application/x-abeles-format" />

      <data:entry name="active" 
         url="file:///home/nhrg/data/2004/02/2004-02-01.abe" 
         mime="application/x-abeles-format" />

   </data:collection>

   <data:collection name="NHRG">
      <data:source 
         url="http://www.nhrg.org/data/collections/all.xml"
         mime="application/x-nhrg-xml-collection" />
   </data:collection>

</data:collection>
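
To show how such a nested description could be read by a program, here is a minimal Python sketch that walks an abridged copy of the collection above with the standard library. The walk function is a hypothetical helper, not part of YaTiSeWoBe, and it only reports external collections given by data:source without fetching them.

```python
import xml.etree.ElementTree as ET

DATA_NS = "http://www.nhrg.org/schema/data"

# Abridged copy of the nested collection shown above.
COLLECTION = """
<data:collection xmlns:data="http://www.nhrg.org/schema/data"
                 name="My data files">
   <data:collection name="favorites">
      <data:entry name="sleepy"
         url="file:///home/nhrg/data/2003/08/2003-08-01.abe"
         mime="application/x-abeles-format" />
   </data:collection>
   <data:collection name="NHRG">
      <data:source
         url="http://www.nhrg.org/data/collections/all.xml"
         mime="application/x-nhrg-xml-collection" />
   </data:collection>
</data:collection>
"""


def walk(node, path=()):
    """Yield (path, kind, url, mime) for each entry or external source."""
    path = path + (node.get("name"),)
    for child in node:
        if child.tag == f"{{{DATA_NS}}}collection":
            yield from walk(child, path)  # nested collection
        elif child.tag == f"{{{DATA_NS}}}entry":
            yield (path + (child.get("name"),), "entry",
                   child.get("url"), child.get("mime"))
        elif child.tag == f"{{{DATA_NS}}}source":
            # External collection document; fetching it is left out here.
            yield (path, "source", child.get("url"), child.get("mime"))


for item in walk(ET.fromstring(COLLECTION)):
    print(item)
```

Each result carries the path of collection names leading to it, which is the information a browser of collections would need to display a tree.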

                    

Example 3.8. Dynamic data collections

This fictitious example suggests that sub-collection URLs can call a web program through the HTTP GET method to format the contents of a database into the XML format described here: the contents of the sub-collections are then built dynamically when requested. Note how all syntaxes can be mixed in the same description file.

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<data:collection 
   xmlns:data="http://www.nhrg.org/schema/data" 
   name="Shared resources">

   <data:collection name="by animal">
      <data:source 
         url="http://www.nhrg.org/data/collection?sort=animal" 
         mime="application/x-nhrg-xml-collection" />
   </data:collection>

   <data:collection name="by date">
      <data:source 
         url="http://www.nhrg.org/data/collection?sort=date"
         mime="application/x-nhrg-xml-collection" />
   </data:collection>

   <data:collection name="by experiment">
      <data:source 
         url="http://www.nhrg.org/data/collection?sort=experiment"
         mime="application/x-nhrg-xml-collection" />
   </data:collection>

</data:collection>