Seabird is a common brand of CTD, an instrument designed to measure conductivity, temperature and depth in the ocean. From conductivity it's possible to calculate the water's salinity, and from that, together with temperature and pressure, you obtain a profile of the density of the water. Small changes in density are responsible for part of the ocean currents that flow around the world.
The Seabird processing software generates files with the .cnv extension, containing a metadata header followed by the data in either ASCII or binary format. The file header can look like this:
* Some info
* Some kind of variable: with data
** Two stars: must be really important
# but there's also hashes
# and keys here like = this
# file_type: binary
*END*
The data follows, either as packed 4-byte floats or as whitespace-separated columns of ASCII values. Pretty simple stuff.
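To make the two layouts concrete, here is a minimal sketch using only the standard library. The sample values, the helper names `decode_binary` and `decode_ascii`, and the little-endian byte order are all assumptions made for illustration; they are not taken from a real Seabird file.

```python
import struct

# Three hypothetical samples of (conductivity, temperature, pressure).
rows = [(3.5, 12.25, 1.0), (3.5, 12.0, 2.0), (3.25, 11.5, 3.0)]
nquan = 3  # number of variables per row

# Binary layout: every value stored as a 4-byte float, rows back to back.
# Little-endian byte order ("<") is an assumption for this illustration.
flat = [value for row in rows for value in row]
binary_payload = struct.pack("<%df" % len(flat), *flat)

# ASCII layout: one row per line, values separated by whitespace.
ascii_payload = "\r\n".join(" ".join("%g" % v for v in row) for row in rows)


def decode_binary(payload, nquan):
    """Unpack 4-byte floats and group them into rows of nquan values."""
    count = len(payload) // 4
    values = struct.unpack("<%df" % count, payload)
    return [values[i:i + nquan] for i in range(0, count, nquan)]


def decode_ascii(payload, nquan):
    """Parse whitespace-separated numbers and group them into rows."""
    values = [float(token) for token in payload.split()]
    return [tuple(values[i:i + nquan]) for i in range(0, len(values), nquan)]
```

Both decoders return the same rows here because the sample values happen to be exactly representable as 4-byte floats; real measurements would generally lose a little precision in the binary form.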
This week I wrote a pydap plugin to serve Seabird files. The parsing code, which could be useful to other oceanographers using Python, looks like this:
import re
from numpy import fromstring

# Assumes `content` holds the raw file contents and that lazy_eval
# has been imported from pydap.
header, data = content.split('*END*\r\n', 1)

# Process headers.
p = re.compile(r"""
    (?:\*{1,2}|\#)\s    # starts with "* " or "** " or "# "
    (.*?)               # everything until next token
    \s*(?::|=|$)\s*     # ": " or " = " or EOL
    (.*)                # all the rest, if any
    """, re.VERBOSE)
attributes = (p.match(line).groups() for line in header.split('\r\n') if line.strip())
attributes = dict((k, lazy_eval(v.strip())) for (k, v) in attributes)

# Decode the data according to the declared file type.
file_type = attributes.get('file_type', 'binary').lower()
nquan = attributes.pop('nquan')
if file_type == 'binary':
    data = fromstring(data, 'f')
elif file_type == 'ascii':
    data = fromstring(data, sep=' ')
data.shape = (-1, nquan)
The code works by splitting the metadata and the data at the *END* marker. The headers are processed using a somewhat generous regular expression that looks for (key, value) pairs, leaving the value as an empty string when a line has no value. The function lazy_eval comes from pydap, and consists of a safe eval that returns only strings, numbers, lists and tuples.
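As a quick check of what the regular expression extracts, the sketch below runs it over a few lines modelled on the sample header. The helper `simple_eval` is a hypothetical stand-in for pydap's lazy_eval, built on `ast.literal_eval`; it is only a rough approximation, since `literal_eval` also accepts dicts, sets and a few other literals. The `nquan` line is added to the sample for illustration.

```python
import ast
import re

HEADER_LINE = re.compile(r"""
    (?:\*{1,2}|\#)\s     # starts with "* " or "** " or "# "
    (.*?)                # key: everything until the next token
    \s*(?::|=|$)\s*      # ": " or " = " or end of line
    (.*)                 # value: the rest of the line, if any
    """, re.VERBOSE)


def simple_eval(text):
    """Hypothetical stand-in for lazy_eval: a literal, else the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def parse_header(header):
    """Return a dict of the key/value pairs found in the header lines."""
    attributes = {}
    for line in header.splitlines():
        match = HEADER_LINE.match(line)
        if match is None:  # skip blank or unrecognized lines
            continue
        key, value = match.groups()
        attributes[key] = simple_eval(value.strip())
    return attributes


sample = "\r\n".join([
    "* Some info",
    "* Some kind of variable: with data",
    "# and keys here like = this",
    "# nquan = 3",
    "# file_type: binary",
])
attrs = parse_header(sample)
```

Unlike the generator expression in the original code, this version skips lines the pattern does not match instead of raising an error, which may or may not be what you want for malformed headers.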
The data handling is even simpler. We decode the data using numpy's fromstring, in either binary or text mode depending on whether the file is binary or ASCII. The data is then reshaped according to the number of variables described in the metadata. Pretty straightforward.
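Recent NumPy releases deprecate the binary mode of fromstring in favor of frombuffer, so a present-day version of the decoding step might look like the sketch below. The function name `decode_data`, the little-endian float32 dtype and the synthetic payloads are assumptions for illustration, not part of the plugin.

```python
import numpy as np


def decode_data(data, file_type, nquan):
    """Decode the payload after *END* into an (nrows, nquan) array."""
    if file_type == "binary":
        # frombuffer reads raw bytes; "<f4" (little-endian float32)
        # is an assumption about the byte order.
        values = np.frombuffer(data, dtype="<f4")
    elif file_type == "ascii":
        # split() copes with any mix of spaces and line breaks.
        values = np.array([float(token) for token in data.split()])
    else:
        raise ValueError("unknown file_type: %r" % file_type)
    return values.reshape(-1, nquan)


# Synthetic payloads: 2 rows of 3 variables each.
expected = np.array([[3.5, 12.25, 1.0], [3.5, 12.0, 2.0]], dtype="<f4")
binary = decode_data(expected.tobytes(), "binary", 3)
text = decode_data(b"3.5 12.25 1\r\n3.5 12 2\r\n", "ascii", 3)
```

Note that frombuffer returns a read-only view over the input bytes; call `.copy()` on the result if the array needs to be modified in place.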