Appendix D. Data format

Table of Contents

D.1. Data Structure of Spike Data Files in a Standard Format

D.1.1. Data Structure
D.1.2. Separators
D.1.3. Comments
D.1.4. Keywords
D.1.5. Codes
D.1.6. Checksum
D.1.7. Version and Titles
D.1.8. Analog Data
D.1.9. Synopsis

Internally, YaTiSeWoBe handles data through the API of the class org.nhrg.data.Data, as described in Section 6.7.1, “Data store”. The application's native file format, however, is the one proposed by Moshe Abeles in 1991. A partial and modified description of this format is included in the next section.

Data import is handled by modules that can be written for specific data formats, as described in Step 2.

D.1. Data Structure of Spike Data Files in a Standard Format

Moshe Abeles

originally provided this information.

Department of Physiology
Faculty of Medicine
The Hebrew University
P.O. Box 12272
Jerusalem, 91120
Israel

9 aug 1991

D.1.1. Data Structure

The data is coded by triplets of numbers. Two numbers describe an "EVENT". The third gives the time of occurrence, expressed as an interval from the previous event.

Example:

1 1 43 1 3 17 1 5 0 1 2 11 …

It would read: event no. 1,1 occurred 43 msec after the recording started, then event no. 1,3 occurred 17 msec later, then event no. 1,5 occurred less than a msec after event no. 1,3, then event no. 1,2 occurred 11 msec later, etc.

Milliseconds are only the default unit of time; any other time unit can be used. If nothing is said about time units, they are assumed to be msec.
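The interval coding can be illustrated with a short Python sketch. It is not part of the original format description; the function name `absolute_times` is hypothetical, and the numbers are assumed to have already been read into a flat list. Times are kept in the file's own units (msec by default).

```python
def absolute_times(numbers):
    """Convert a flat list (type, qualifier, interval, ...) into
    (type, qualifier, time) tuples, with time counted from the start
    of the recording in the file's time units."""
    events = []
    elapsed = 0  # running sum of all intervals seen so far
    for i in range(0, len(numbers) - 2, 3):
        event_type, qualifier, interval = numbers[i:i + 3]
        elapsed += interval
        events.append((event_type, qualifier, elapsed))
    return events


example = [1, 1, 43, 1, 3, 17, 1, 5, 0, 1, 2, 11]
for event_type, qualifier, time in absolute_times(example):
    print(f"event {event_type},{qualifier} at {time} msec")
```

Events 1,3 and 1,5 end up with the same absolute time, because the interval before 1,5 is 0.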

D.1.2. Separators

Carriage returns, line feeds, blanks and tabs are allowed between numbers in any desired combination.

Example: the data in the following LINE:

1 1 43 1 3 17 1 5 0 1 2 11

can also be written as:

Alternatively, a comma (,) may be used as a separator between numbers, alone or in combination with the other separators. Thus, the same data may be encoded by:

1,1,43 1,3,17 1,5,0 1,2,11
…
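A reader for these separators might be sketched in Python as follows. This is an illustration only: the name `split_numbers` is hypothetical, and treating any run of commas and white space as a single separator is an assumption about how a reader would interpret the rule. Quoted comments and keywords are not handled here.

```python
import re


def split_numbers(text):
    # Assumption: any run of commas and/or white space is one separator.
    return [token for token in re.split(r"[,\s]+", text) if token]


one_line = "1 1 43 1 3 17 1 5 0 1 2 11"
multi_line = "1 1 43\n1\t3 17\r\n1 5 0   1 2 11"
with_commas = "1,1,43 1,3,17 1,5,0 1,2,11"

# All three layouts yield the same twelve numbers.
print(split_numbers(one_line) == split_numbers(multi_line) == split_numbers(with_commas))
```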

D.1.3. Comments

In addition, comments can be interspersed anywhere by enclosing them in single quotes ('). For instance, the same data as in the previous examples may look like:

1,1,43 1,3,17 1,5,0
'This is a true coincidence between units 1,3 and 1,5'
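As a rough Python sketch, a reader can drop single-quoted comments before splitting the numbers. The names `strip_comments` and `split_numbers` are hypothetical, and the sketch assumes the text contains no double-quoted keywords. Note that the comment in the example itself contains "1,3" and "1,5", which would be misread as data if it were not removed first.

```python
import re


def strip_comments(text):
    # Replace each '...' comment with a blank so the neighbouring
    # numbers stay separated. Comments may span several lines.
    return re.sub(r"'[^']*'", " ", text)


def split_numbers(text):
    return re.findall(r"[^\s,]+", text)


text = (
    "1,1,43 1,3,17 1,5,0\n"
    "'This is a true coincidence between units 1,3 and 1,5'\n"
    "1,2,11"
)
print(split_numbers(strip_comments(text)))
```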

D.1.4. Keywords

Some keywords have special meaning and are enclosed in double quotes ("). For instance, the following file contains the same data as the previous one, but coded in units of 100 microseconds:

"TIME_UNITS = 0.0001"
1,1,430 1,3,170 1,5,0
'This is a …'
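The effect of the TIME_UNITS keyword can be sketched in Python. The function name `time_units` and the regular expression are illustrative assumptions, not part of the format; the default of 0.001 reflects the rule that times are in msec when nothing else is stated.

```python
import math
import re


def time_units(text, default=0.001):
    """Return the TIME_UNITS scale factor (seconds per unit), or the
    millisecond default when the keyword is absent."""
    match = re.search(r'"\s*TIME_UNITS\s*=\s*([^"\s]+)\s*"', text)
    return float(match.group(1)) if match else default


fine = '"TIME_UNITS = 0.0001"\n1,1,430 1,3,170 1,5,0'
coarse = "1,1,43 1,3,17 1,5,0"

# 430 units of 100 microseconds and 43 msec are the same delay in seconds.
print(math.isclose(430 * time_units(fine), 43 * time_units(coarse)))
```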

D.1.5. Codes

The first pair of numbers in each data item (triplet) is always a code for some event that happened. This might be a spike that fired, a stimulus that was presented, an analog voltage that was measured (as will be described later), or any other thing whose time of occurrence the user wishes to mark in the file.

It is advisable to attach some significance (taken from the experimental situation) to the two numbers that describe the events. For instance the first number may be used to describe the electrode number (or track number) and the second number may be used to describe the single unit recorded from that electrode (in that track). Or, the first number may be used to describe one parameter of the stimulus (such as its frequency), while the second number may be used to describe another parameter (such as its intensity).

In the discussion that follows we shall refer to the first number of the event pair as the event type and to the second number as the event qualifier.

Each of the two numbers in the event code is described by 1 to 4 hexadecimal digits (0,1,2,…9,A,B,C,D,E,F). The user can assign any code number he wishes to any event type (the first number), except for the number 0. Events having the form 0,nnn are reserved for special purposes as described below.

Event number 0,0 (which may be written also 00,0 , or 0,00 etc…) means the null event. It could be used to describe very long intervals or to mark the time at which a comment is recorded in the file. Two examples will illustrate these usages.

Suppose you wish to use at most two digits to specify the interval between successive events. This might be the case if you are using FORTRAN to generate your spike_data file and wish to use the fixed format specification I2 for the times. What if you occasionally have more than 99 msec without any event? It would be a waste of space to code all the times in 3 digits if long periods of silence occurred rarely. You could then use the empty event (0,0) to describe this silence. For instance, a file that starts:

1,1,47 1,5,32 0,0,99 1,2,17 …

means that event no. 1,1 occurred 47 msec after we started to record, then event no. 1,5 occurred 32 msec later, and then event no. 1,2 occurred 116 (99+17) msec after that.

Another use of the empty event is to mark in the file the point at which something which is not an event occurred. Suppose that you wish to record the time at which you inserted an important comment in the file. You may do it in the following way:

7,1,47 1,5,32 0,0,65

'at this point electrode no. 7 got dislocated from the cell' 1,2,11 …

This record will be interpreted as saying that 65 msec after event no. 1,5 occurred the comment was inserted, while event no. 1,2 occurred 76 msec (65+11) after event no. 1,5.
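Both uses of the null event come down to the same rule: its interval is added to the running time, but no event is reported. A minimal Python sketch, with the hypothetical name `times_without_null` and triplets assumed to be already parsed into integers:

```python
def times_without_null(triplets):
    """Accumulate intervals over all triplets, but report only the
    events that are not the null event 0,0."""
    elapsed = 0
    events = []
    for event_type, qualifier, interval in triplets:
        elapsed += interval
        if (event_type, qualifier) != (0, 0):
            events.append((event_type, qualifier, elapsed))
    return events


long_silence = [(1, 1, 47), (1, 5, 32), (0, 0, 99), (1, 2, 17)]
events = times_without_null(long_silence)
# Delay between event 1,5 and event 1,2, bridged by the null event.
print(events[2][2] - events[1][2])
```

In the first example above, the difference between the times of events 1,5 and 1,2 comes out as 99+17 msec.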

Event number 0,2 states that data collection stopped at this point in time. Event number 0,1 means that data collection was resumed. These two codes are useful if the data in the file were not collected continuously, but are made up of many short runs of data (for instance, if only activity around a stimulus is collected). If the first data triplet in the file is not 0,1,0, it is assumed that this code exists there. Thus two files that start:

3,1,167 …

, and

0,1,0 3,1,167 …

are equivalent.

Events 0,11,xxx 0,12,xxx 0,13,xxx are special events which are used when combining files. This may happen if the user wishes to take activity around a given stimulus from several files and make a new file which contains only the selected sections from the original files. These events are not generally useful, but are included here for the sake of completeness.

Event number 0,11: marks the start of an original file. It is used when a new file is generated from a number of old files, and gives the position at which the data from the next original file starts. Typically it is followed by the name of that original file, e.g.

0,11,0
"TITLE(0) = 'v20s.022'"

Event number 0,12: marks the end of an original file. It is used when a new file is generated from a number of old files, and gives the time from the last event until the end of the original file, e.g.

0,12,323

Event number 0,13: defines a long period with no events. Typically, when generating a new file from pieces of old files, it is used to fill the space in which there was data in the old files that we wish to ignore in the new file, e.g.

0,13,5000

The code 0,FFFF is reserved to state the end of file. Note that while all other event qualifiers can have any number of (hexadecimal) digits, the end of file code is specified as having exactly 4 digits. Anything that appears in the file after the time delay associated with 0,FFFF is ignored. If the end of file code (0,FFFF) is not preceded by the end of data collection code (0,2), it is assumed that data was collected up to the end of the file. That is, two files that end with:

… 3,1,67 0,FFFF,29

, and

… 3,1,67 0,2,29 0,FFFF,0

are equivalent.

A complete file of data may look like this:

"TIME_UNITS=0.001"
'1,n are spikes recorded through electrode no.1'
'3,n are spikes recorded through electrode no.3'
'A,01 is the onset of a 200 msec noise burst'
0,1,0
1,1,17 3,2,3 1,2,11 1,3,3 1,3,1 1,3,2 1,2,17
1,4,22 A,01,3 3,2,2 1,2,4 1,2,1 1,2,3 1,2,5
1,4,13
0,2,7 0,FFFF,0

This file reads as follows: In the first line we see that the units of time in this file are msec. The second, third and fourth lines are comments that remind the experimenter what he was doing. Note that the programs that analyze these data will ignore these comments. In the fifth line we see that data collection started. Actual data starts in the sixth line, which says that spike 1,1 fired 17 msec after we started the recording, spike 3,2 fired 20 (17+3) msec after we started the recording, spike 1,2 fired at time 31 (17+3+11) msec, etc. The next line tells us that spike 1,4 fired 22 msec after spike 1,2 (the last event on the previous line) and that a noise burst was sounded 3 msec after that. Then we get some more spike data, and the last line states that 7 msec after the last spike (1,4) we stopped the recording, and then the file ends. Note that just as we need to know how much time elapsed from the beginning of the recording until the first event occurred, we also need to know how much time elapsed from the last recorded event until we stopped our measurements.
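The reading above can be reproduced with a small Python sketch. It is a simplified reader written for illustration, not the parser used by YaTiSeWoBe: the names `TOKEN` and `parse_spike_file` are hypothetical, only the TIME_UNITS keyword is interpreted, control events (type 0) are skipped except for 0,FFFF, and pauses between 0,2 and 0,1 receive no special treatment.

```python
import re

# A token is a double-quoted keyword, a single-quoted comment, or a number.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|[^\s,\'"]+')


def parse_spike_file(text):
    time_units = 0.001  # default: msec
    numbers = []
    for token in TOKEN.findall(text):
        if token.startswith('"'):
            key, _, value = token[1:-1].partition("=")
            if key.strip() == "TIME_UNITS":
                time_units = float(value)
        elif not token.startswith("'"):  # single-quoted comments are ignored
            numbers.append(token)

    events = []
    elapsed = 0  # time since start of file, in file units
    for i in range(0, len(numbers) - 2, 3):
        event_type = int(numbers[i], 16)     # codes are hexadecimal
        qualifier = int(numbers[i + 1], 16)
        elapsed += int(numbers[i + 2])       # times are decimal
        if event_type == 0:
            if qualifier == 0xFFFF:          # end of file: ignore the rest
                break
            continue
        events.append((event_type, qualifier, elapsed))
    return time_units, events, elapsed


SAMPLE = "\n".join([
    '"TIME_UNITS=0.001"',
    "'1,n are spikes recorded through electrode no.1'",
    "'3,n are spikes recorded through electrode no.3'",
    "'A,01 is the onset of a 200 msec noise burst'",
    "0,1,0",
    "1,1,17 3,2,3 1,2,11 1,3,3 1,3,1 1,3,2 1,2,17",
    "1,4,22 A,01,3 3,2,2 1,2,4 1,2,1 1,2,3 1,2,5",
    "1,4,13",
    "0,2,7 0,FFFF,0",
])

units, events, total = parse_spike_file(SAMPLE)
for event_type, qualifier, time in events[:3]:
    print(f"{event_type:X},{qualifier:X} at {time} units of {units} s")
```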

As is obvious from the foregoing description, the times of events are coded in decimal form (while the events are defined in hexadecimal form). There are two reasons for this seemingly awkward dichotomy. Although it would be advantageous to code all numbers in hexadecimal (because this would save some space), the user of these files might find hexadecimal times hard to decipher, especially if he wanted to check the programs that analyze the data by computing some sample data by hand and comparing the results with the computer results. A more important reason for not coding times in hexadecimal is the danger of interpreting an FFFF time delay as an end_of_file if an extra comma (,) is inserted into the file or if one of the numbers is dropped during communication. By reserving the hexadecimal codes for events we reduce this danger somewhat. On the other hand, events could be coded in many ways, so there is no special advantage in using decimal notation for them.

To this basic scheme we have to add a few features that deal with error detection, general management, and the inclusion of analog data in our file.

D.1.6. Checksum

First we shall deal with error detection. One of the main reasons for coding spike data in ASCII is the portability of such files between computers. However, if one tries to transmit such files (which are bound to be very long) through serial lines, errors are likely to occur. Some communication programs have means to detect (or even correct) such errors. In most cases, however, particularly when exchanging files between different kinds of computers, this is not available. We therefore introduce some means to assure the integrity of our data. One of the simplest ways to test for integrity is to include a 'checksum' every now and then in the data. In our case we do the following: starting from the beginning of the file, we add up the values of all the characters in a 16-bit sum. We do not include in the checksum the end_of_line codes, the blanks, the tabs and the comments. The reason for not including end_of_lines, blanks and tabs is that different I/O systems treat these characters differently, and we do not wish to generate errors because of these different treatments. The reason for not including comments will become apparent shortly. If the sum overflows (i.e. it becomes bigger than FFFF hexadecimal), we keep only the low 16 bits and continue adding characters. Every now and then we include this computed checksum in our file by inserting:

"CHKSM = value"

where value is the hexadecimal representation of the computed checksum, coded in ASCII. To fix our ideas, let us look at a very small example. Assume that our file starts:

1,1,4 1,2,17
…

and we want to include the checksum there. We proceed in the following way: The file starts with several blanks. These are ignored. Note that some I/O systems will drop the first character in a line (assuming that it is a carriage control for a printer); it is therefore wise to start every line with at least one blank. The first non-blank character is 1, whose ASCII code is 31 (hexadecimal), so we assign the value 31 to CHKSM. Then we have a comma, whose ASCII value is 2C (hexadecimal); we add it to the checksum to get 5D. Then we have another 1, whose ASCII value is 31 (hexadecimal), so we add 31 to CHKSM; then we add 2C (for the next comma), then 34 (for 4). The two blanks that follow the 4 are ignored, then we add 31 (for the next 1), and so on until we reach the end of the line. We ignore the end-of-line character because some I/O systems use <carriage-return> to designate end-of-line, some use <line-feed> and some use both.

Assume that we wish to put the CHKSM here. We could just insert an additional line giving the checksum. We would get:

1,1,4 1,2,17
"CHKSM = 211"
…

Note that the characters in the line which states the checksum are within a comment and are therefore not included in the computation. If we also wanted to include the "CHKSM=hhh" text in the checksum, we would run into some complex computational problems. This is one of the reasons for not including comments in checksums. The other reason is to enable the owner of the file to add any number of comments later without having to worry about updating the checksums.

Of course, after writing the checksum into the file, we restart the calculation from the first character of the next line.
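The computation can be sketched in Python as follows. The function name `chksm` is hypothetical; the sketch treats both single-quoted and double-quoted text as comments (consistent with the remark that the CHKSM line itself is excluded) and computes the sum for one segment of text, i.e. the part between two CHKSM statements.

```python
def chksm(segment):
    """16-bit sum of the ASCII codes in one segment of the file,
    skipping blanks, tabs, end-of-line codes and quoted text."""
    total = 0
    closing_quote = None  # set while inside '...' or "..."
    for ch in segment:
        if closing_quote is not None:
            if ch == closing_quote:
                closing_quote = None
            continue
        if ch in "'\"":
            closing_quote = ch
            continue
        if ch in " \t\r\n":
            continue
        total = (total + ord(ch)) & 0xFFFF  # keep the low 16 bits
    return total


line = "  1,1,4  1,2,17\n"
print(f'"CHKSM = {chksm(line):X}"')
```

For the example line this gives the value 211 (hexadecimal) quoted above, and adding a comment to the line does not change it.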

The places at which the checksums are placed are arbitrary and they can be put as frequently as the quality of the communication line dictates.

In the future error correcting codes may be added to this scheme.

D.1.7. Version and Titles

Management codes include (in addition to CHKSM) the version number of the code, and the titles. Version number should appear as the first item in the file. It informs the interpreting procedures what kind of statements they can expect to find in the file. The version described here is version 0. Therefore any data file that adheres to the standards proposed here should start with:

"VERSION = 0"

If the "VERSION" clause is not included version 0 is assumed. Titles are a special class of privileged comments. They are inserted into the file by:

"TITLE = 'any text you wish to type'"

or by

"TITLE(n) = 'some other important comment'"

where n can be any decimal number from 0 upward. When the (n) suffix is not added it is understood that this is title number 0. The text of the title is enclosed in single quotes, while the entire title is enclosed in double quotes. The text within the single quotes can span several lines.

The purpose of inserting comments into the data file by the TITLE keyword is to enable you, later on, to specify which titles will be included in the histograms (or other graphs) that will be computed from your data. For instance you may wish to include information about date, track number, and stimulus conditions in such titles. Your data file may look like:

"VERSION = 0"
"TITLE(0) = '12/12/85'"
"TITLE(1) = 'Track III'"
"TITLE(2) = 'moving grating

at 5 deg/sec'"

The analyzing programs should enable you to state later which titles you wish to include with the results of the analysis of the data.
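A program that collects titles for later display might extract them as in the following Python sketch. The regular expression `TITLE_RE` and the function `titles` are illustrative assumptions; a title without the (n) suffix is stored as title number 0, and the text may span several lines.

```python
import re

# "TITLE = '...'" or "TITLE(n) = '...'"; [^'] also matches line breaks.
TITLE_RE = re.compile(r'"\s*TITLE\s*(?:\(\s*(\d+)\s*\))?\s*=\s*\'([^\']*)\'\s*"')


def titles(text):
    """Map title number -> title text."""
    return {int(number or 0): body for number, body in TITLE_RE.findall(text)}


header = "\n".join([
    '"VERSION = 0"',
    "\"TITLE(0) = '12/12/85'\"",
    "\"TITLE(1) = 'Track III'\"",
    "\"TITLE(2) = 'moving grating",
    "at 5 deg/sec'\"",
])
for number, body in sorted(titles(header).items()):
    print(number, repr(body))
```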

D.1.8. Analog Data

Analog data should be avoided as far as possible. If one wishes to record an EMG in order to decide when a certain muscle starts to move, it is advisable to decide when the muscle starts to move before preparing the data file, assign an event code to that initiation of movement, and record in the file only the times of movement initiation (and not the EMG itself). When this is not possible, it is advisable to use analog data only occasionally. For instance, if one is interested in saccades (rapid shifts of gaze), it would be wise to give a code to the saccade event, to include in the file only the fact that a saccade occurred (with its time), and to follow this code by two analog codes specifying the coordinates of the position to which the eyes moved.

When this is not possible, and the experiment calls for "continuous" sampling of analog data, one should attempt to include only the pieces of data which are of interest (e.g. scalp potentials around the time of a stimulus) and not the entire, uninterrupted stream of samples. One should always bear in mind that long sequences of sampled analog data are bound to increase the size of the file enormously.

Analog data are also recorded as triplets of numbers. The first number (the event type) identifies the channel from which the data was recorded; the second number (the event qualifier) gives the value of the sampled data. If the value is negative, it is written as the number that complements it to 10000 hexadecimal (e.g. -1 is written FFFF). Because of this difference, the file must declare all the codes which will be used for analog signals.
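The complement rule corresponds to 16-bit two's complement. A Python sketch of the conversion in both directions follows; the function names are hypothetical, and the convention that qualifiers from 8000 to FFFF denote negative values is an assumption, since the text does not state where the boundary lies.

```python
def decode_analog(qualifier_text):
    """Hexadecimal qualifier -> signed sample value."""
    raw = int(qualifier_text, 16)
    # Assumption: 8000..FFFF are the negative half of the 16-bit range.
    return raw - 0x10000 if raw >= 0x8000 else raw


def encode_analog(value):
    """Signed sample value -> hexadecimal qualifier."""
    return format(value & 0xFFFF, "X")


for text in ("24", "2", "FFE0", "FFFF"):
    print(text, "->", decode_analog(text))
```

Note that the decoded values are printed in decimal, so the qualifier 24 (hexadecimal) appears as 36.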

For example the following lines:

"TIME_UNITS=0.001"
"ANALOG = A1"
"ANALOG = A2"

state that event types A1 and A2 are used for analog recordings. The third number is the time interval, as usual.

Let us look at the following file which includes both analog and spike data:

"VERSION=0"
"TIME_UNITS=0.001"
"ANALOG=A1"
"ANALOG_UNITS(A1)=0.000001"
'event 1,1 is a code for a spike'
'event A1 is a code for EEG recording'
0,1,0 1,1,72 1,1,49
A1,24,17 A1,2,5 A1,FFE0,5 1,1,3 A1,FFC4,2 …

After the initialization and comments we read the following data: spike 1,1 fired 72 msec after we started the recording, then 49 msec later it fired again, then after 17 msec we started to sample our A1 analog input (at a rate of one sample every 5 msec). The value of the first sample (taken 17 msec after spike 1,1 fired) was 24 u_volts; 5 msec later we sampled again and got 2 u_volts; 5 msec later we sampled again and got -20 u_volts; 3 msec after this last sample spike 1,1 fired again; and then 5 msec after the previous sample (i.e. 2 msec after spike 1,1) we sampled again and got -3C u_volts.

Note that since we made all the event qualifiers hexadecimal, the voltage of the analog channels is also in hexadecimal.

D.1.9. Synopsis

The Data_file is made of a list of constants. A constant is a string of hexadecimal (or decimal) digits or a string of text enclosed by single (') or double (") quotes. Constants are separated from each other by separators. Separators are blanks, tabs, end-of-line codes, or any combination of the above, or a comma (,), or a combination of a comma and any of the other separators.

The information in the Data_File is taken to be of two major types: Events and Comments.

Events are made of triplets of numeric constants, the first two of which are coded in hexadecimal and the third in decimal.

Comments are strings of text which are used for remarks and for defining parameters and terms to the analyzing programs.

D.1.9.1. Events

Each Event is composed of three numbers: the first two are the event code and the third is the event time.

The event code is made up of two hexadecimal numbers. The first number specifies the event type, while the second number is the event qualifier.

An event type has one of three meanings: Event type 0 is a control type. All other values are either point events or analog channel numbers. An analog channel number is any hexadecimal constant hhhh that has been declared to be an analog channel by: "ANALOG = hhhh". Event types which are not 0 and are not declared as analog channels are point events.

The control event may have 4 event qualifiers (0,1,2 and FFFF), which have the following significance:

0,0: means an empty event,
0,FFFF: means the end of file,
0,1: means the recording has started,
0,2: means the recording was stopped.

The event qualifier that follows an analog event type is its analog value, coded in hexadecimal form. The number that follows a point event type is a qualifier that may be used to describe the event in more detail.

D.1.9.2. Comments

Comments are recognized by being embedded in quotes (single or double). Comments that appear between single quotes (') are not interpreted by any data processing programs. Comments that appear between double quotes (") must be of the form "KEYWORD = VALUE". They are used to control the operation of the interpreting programs.

Recognized KEYWORDS are:

VERSION: To define the version no. of the file.
TIME_UNITS: A scaling factor that converts all the time information given later into seconds.
ANALOG: To define analog events.
ANALOG_UNITS(xx): A scaling factor that converts all the analog values given later for channel xx into volts.
CHKSM: To specify a 16 bit checksum.
TITLE(n): Any string that may be used later as a title for the display of results computed from the data in this file.

The "VERSION" comment must appear at the start of a file. The "ANALOG = xx" comment must precede the "ANALOG_UNITS(xx) = yy" comment.

The CHKSM is computed by adding together, in 16 bits, the ASCII codes for all the characters in the file except blanks, tabs, end-of-lines and comments. The result is written in hexadecimal.

The definitions given here are for "VERSION = 0".
