Wander
Completed
THIS IS A DESCRIPTION OF THE FINAL 'WANDER' PORTABLE DATA COLLECTION SYSTEM
DEVELOPED FOR THIS NSF PROJECT BY STEVE ROBERTS. THE WRITE-UP BELOW,
FIRST PRINTED IN THE EMBEDDED LINUX JOURNAL, MAY-JUNE 2002 ISSUE,
IS AN EXCELLENT SUMMARY OF HOW IT WAS ASSEMBLED AND CONFIGURED. --Dave
Hughes, PI
ELJonline: WANDER: a Portable Linux Data-Collection System
Steven K. Roberts and Ned Konz
One of the most entertaining aspects of spending an otherwise exhausting
decade conjuring a geeked-out, canoe-scale, Linux-based, amphibian
pedal/solar/sail trimaran is that every new twist in the project
involves steep learning curves and, in many cases, spin-offs. Usually
these manifest themselves as publications and other obvious ways of
piping ideas back into the Open Source community that has done so
much to make the Microship adventure possible, but occasionally something
utterly unexpected falls out of the boat lab.
The WANDER Project certainly fits this category. A couple of years
ago, I was contacted by Dave Hughes of the NSF Wireless Field Test
Project and enjoined to "clone" the Microship core Linux system for
use as a ruggedized field data-collection tool. This seemed like an
easy and productive technology-transfer project, so I quickly agreed.
Naturally, it was not to be so simple; there was an almost immediate
divergence between the boat system design and that of the WANDER box.
The former was becoming more and more wrapped around a rich user interface
that could migrate transparently among wireless handhelds running
VNC clients, with applications ranging far beyond data collection
to include active control, security and communications. The latter,
meanwhile, was becoming ever more focused on the problem of deploying
a flexible database-centric tool into harsh environments, scriptable
by moderately technical end users, able to inhale readings from multiple
sensor channels, associate them with time and GPS coordinates, and
then eventually transmit accumulated data via Globalstar satellite
phone. It also would have to be power-efficient enough to allow unattended
solar operation, so WANDER took on a life of its own.
[Figure: The WANDER system is built into a rugged Pelican case. External connectors
allow probe and solar connection, but the user interface requires
exposing the front panel.]
The Essentials
We wanted to allow the user (typically a scientist doing environmental
field research) to install a variety of sensors and configure the
system accordingly -- a somewhat nontrivial problem, as we can't very
well anticipate every arcane serial protocol or sensor characteristic
that might be encountered. A data-collection process launches a collection
task for each channel, which in turn stores a time- and location-stamped
reading at specified intervals into a database (using Berkeley DB).
This process can be started and stopped manually, via a cron job or
under control of a separate microcontroller-based, power-control processor
that can wake the system at arbitrary intervals. An LCD display on
the front panel summarizes activity. All this can take place without
the connection of standard peripheral devices, although connectors
are included for keyboard, mouse and VGA display to simplify development
and maintenance. It is also possible to connect to the unit with an
Ethernet cable and gain full access via the LAN.
At any time, the database can be queried in a variety of ways:
transmitting accumulated results via FTP over the satellite link,
sending them via e-mail or browsing through the unit's internal web
server (with tabular or graphic display). The
tools are standard, allowing researchers to create new utilities for
examining and manipulating the results; the whole front end is implemented
with a handful of CGI scripts, and all internals are written in Perl.
Having said all that, we also should note that this is first and foremost
a development system; it is relatively large and heavy, and operates
mainly through a browser interface. We envisioned the primary uses
as being field application development, feasibility tests for data-collection
systems, data concentration from other devices and a test platform
for software that is subsequently ported into miniature sealed systems
with wireless links to a host. Because it's all built on a standard
embedded Linux platform, code developed on WANDER should be portable
into tiny, cheap, field-deployable sensor nodes.
WANDER Hardware
We wrapped the system around an industrial-grade 133MHz Octagon PC-500
single-board computer with loads of I/O capability, then packaged
it inside a sealed Pelican case along with a battery management system,
hard disk, support for external Globalstar satellite phone, internal
Garmin-25 GPS with an antenna in the case lid, a simple menu-driven
local user interface and an Ethernet port that supports laptops or
LAN connection for detailed configuration or software development.
Survival in an outdoor environment defined the overall shape and
feel of this box; this called for a gasketed Pelican case and sealed
connectors. When the lid is closed, it can handle rain, dirt and high
ambient moisture -- although we wouldn't recommend total immersion
or extended operation in a saltwater environment.

[Figure: With the front panel hinged open, the innards are revealed. The Octagon
PC-500 running Debian GNU/Linux is on the upper left; GPS and power
control are on the lower left.]
Opening the box reveals a hinged silk-screened
panel, carrying a small Matrix Orbital LCD and a 20-button Grayhill
keypad, along with mini-DIN connectors for a PC keyboard and mouse,
auxiliary serial port, video display, external power input and Ethernet.
This panel in turn opens to reveal the internal hardware: the PC-500
card, 4.5GB IBM hard disk drive, a seven amp-hour sealed lead-acid
battery, a Calex DC/DC converter that generates five volts and the
custom power-management board. The latter is always alive and, in
addition to handling battery charging from the external Solarex photovoltaic
panel, it can send a brownout signal to the Linux board to allow graceful
shutdown and reawaken the board when power returns (with suitable
hysteresis to prevent flailing on and off, of course). This "power
control handshaking" also allows the data system to shut itself down
and schedule a return to life at any point in the future -- useful
for low-bandwidth data collection when power is scarce.
The Octagon PC-500 was chosen for this application because of its
substantial suite of I/O hooks with human-scale connectors (compared,
say, to a laptop board, which may be tempting for power-efficiency
reasons but is a major pain to hack). It is based on a 133MHz 5x86
CPU, with 48MB of EDO RAM, a Flash filesystem, support for M-Systems
Disk-On-Chip, APM-flavored power-saving options, floppy and hard disk
ports, SCSI-2, Ethernet interface, flat panel and SVGA support, and
efficient single-supply operation. The I/O includes five serial ports,
a normal PC parallel port plus 24 lines of configurable digital I/O,
and the endless variety of third-party options available via the PC/104
interface (this is not currently in use, but will become valuable
if WANDER users wish to add analog inputs, signal conditioning, speech
synthesis, relay outputs or whatever).
Now, let's take a look under the hood and see what it takes to make
WANDER dance.
WANDER Software
WANDER was built on a Debian "unstable" system with a 2.4.16 kernel.
LILO manages the boot process; there is also the choice of booting
to a DOS partition to manage some of the Octagon board settings.
Because there is 48MB of RAM available, we didn't have to be as concerned
about memory footprint as we would have been for a smaller system.
We were more concerned with making a system that is easy to customize
and extend. Although the Octagon board has a socket for a Disk-On-Chip
solid-state disk device, we decided not to use it because we needed
the hard disk anyway for data storage. Also, the Linux MTD drivers
didn't want to work with the DOC device on this board.
Before we discuss our database design, let's consider the basic data-collection
requirements.

[Figure: WANDER Software Architecture]
We need to be able to collect data simultaneously from a number of
different channels. Some of these may be periodic sources with a fixed
sampling rate (such as analog values). Other channels may provide
nonperiodic data, like text notes, images, audio samples and switch-closure
events. Both flavors of data are identified by a timestamp and channel
ID. The actual data can range from one byte to several megabytes,
and the timestamps require one-second accuracy and resolution.
Our design depended on a single process storing the data and several
other processes querying the data. This required a storage scheme
that would allow a single writer and multiple readers to access the
database. We also wanted a way to discard old data if necessary, perhaps
after verifying its reception at a "home base" server via e-mail.
Thus, one of the first design decisions was how to store the sampled
data on disk so that we could get to it from multiple processes safely.
We considered a number of possibilities, from simple flat text files
through relational databases. The latter were rejected early on because
there are effectively no relations involved and because queries are
relatively simple (usually requests for values of certain channels
over a particular time range or for the latest value of a particular
channel). The relational approach would be overkill.
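The two query shapes named above (a channel's values over a time range, and a channel's latest value) can be sketched in a few lines. This is an illustration in Python rather than WANDER's own Perl code; the sample tuples and the function names `values_in_range` and `latest_value` are assumptions made for the example, and an in-memory sorted list stands in for the real store.

```python
import bisect

# Hypothetical stand-in for the sample store: (timestamp, channel, value)
# tuples kept sorted by timestamp. Values are text, as in the article.
samples = [
    (1000, 1, "12.6"),
    (1000, 2, "21.5"),
    (1060, 1, "12.5"),
    (1060, 2, "21.7"),
    (1120, 1, "12.4"),
]

def values_in_range(samples, channel, start, end):
    """Return (timestamp, value) pairs for one channel with start <= t <= end."""
    # (start,) sorts before every tuple whose timestamp equals start.
    lo = bisect.bisect_left(samples, (start,))
    # (end, inf) sorts after every tuple whose timestamp equals end.
    hi = bisect.bisect_right(samples, (end, float("inf")))
    return [(t, v) for t, ch, v in samples[lo:hi] if ch == channel]

def latest_value(samples, channel):
    """Return the newest (timestamp, value) for a channel, or None."""
    for t, ch, v in reversed(samples):
        if ch == channel:
            return t, v
    return None
```

Neither query needs a join or a schema, which is the sense in which a relational engine would be more machinery than the problem requires.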
Flat text files on the other hand, while easy to implement, would
have been a pain to update. If a single such file were used for all
the channels, it would be hard to get the last values for each one,
and if multiple files (one-per-channel) were used, it would be time
consuming to query for a range of timestamps.
We finally settled on the Berkeley DB package. Berkeley DB databases
are dictionaries -- sorted collections of key/value pairs. The keys
and the values can each be up to 2GB in length, which lets us store
everything from single numbers to images or text files in the database.
Because our view of the data is based on sample times, the keys in
the database are four-byte timestamps (with one-second resolution).
The values themselves begin with a two-byte channel number, followed
by the actual data, with numeric data stored as text. Using the Berkeley
DB Btree table type, we can then do efficient searches for ranges
of timestamps, as well as find the first or last ones quickly. Because
the package supports duplicate keys, we can store different channels'
data under the same timestamp.
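The record layout described above can be sketched as follows. This is a minimal Python illustration, not the WANDER Perl code. The article gives the field widths (four-byte timestamp key, two-byte channel number, data as text) but not the byte order, so big-endian packing is an assumption here, chosen because it makes byte-wise key order agree with numeric time order. The helper names are hypothetical.

```python
import struct

def make_key(timestamp):
    """Pack a one-second-resolution timestamp into a four-byte key.

    Big-endian is an assumption: with it, comparing keys byte by byte
    gives the same order as comparing the timestamps numerically.
    """
    return struct.pack(">I", timestamp)

def make_value(channel, reading):
    """Two-byte channel number followed by the reading stored as text."""
    return struct.pack(">H", channel) + str(reading).encode("ascii")

def parse_record(key, value):
    """Recover (timestamp, channel, text) from a stored key/value pair."""
    (timestamp,) = struct.unpack(">I", key)
    (channel,) = struct.unpack(">H", value[:2])
    return timestamp, channel, value[2:].decode("ascii")
```

Two channels sampled in the same second produce the same key with different values, which is where the duplicate-key support comes in.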
For an embedded system, another advantage of Berkeley DB is that
it doesn't require a separate server process, keeping the memory requirements
low. It also handles the locking required by our single-writer, multiple-reader
scenario, using shared memory segments.
Because we didn't know where the future development of WANDER would
go, we wanted to make sure that the system was written so that it
could be extended easily and have new sensor types installed -- and
because the system would likely be used in university research, we
also wanted a language that was widely familiar to college students.
We thus chose Perl for our data collection and configuration programs.
Part of this choice was pragmatic: a number of the harder parts of
the job were already done for us by CPAN modules or extensible Perl
programs, including a Berkeley DB interface (BerkeleyDB), an event kernel
with timers and I/O triggering (Event), web server and system configuration
(Webmin), serial port control (Device::SerialPort), SMTP mail transmission
(Net::SMTP) and graph generation (Chart::Plot and GD).
Another reason for using Perl was its ability to evaluate program
snippets at runtime. We use this to provide each channel with a small
custom driver, which lets us add new channel types very easily from
within the Webmin environment. These drivers can be as small as one
line of Perl code.
At startup, the data collector reads a small Berkeley DB database
(separate from the collected data) that contains configuration information
for each channel. This configuration includes the name of a Perl script
that is then evaluated to provide the channel object used for collection.
The configuration data is available to these scripts as a dictionary
of name/value pairs and is user-extensible using the configuration
web interface.
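The idea of a one-line driver evaluated at runtime with access to the channel's name/value configuration can be illustrated like this. The sketch uses Python's `exec` as a stand-in for Perl's runtime evaluation; the `load_driver` helper, the `convert` name and the `scale`/`offset` settings are all hypothetical and do not come from the WANDER code.

```python
def load_driver(source, config):
    """Evaluate a driver snippet and return the callable it defines.

    The snippet sees the channel's configuration as a dictionary named
    `config` and is expected to define a function named `convert`.
    Both names are assumptions made for this sketch.
    """
    namespace = {"config": config}
    exec(source, namespace)
    return namespace["convert"]

# Hypothetical channel configuration, stored as text name/value pairs.
config = {"scale": "0.1", "offset": "-40"}

# A driver that is a single line of code.
driver_source = (
    "convert = lambda raw: "
    "float(raw) * float(config['scale']) + float(config['offset'])"
)

convert = load_driver(driver_source, config)
```

Evaluating stored code is only appropriate when whoever edits the configuration is trusted, since the snippet runs with the collector's privileges.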
The scripts that are evaluated for each channel give us a way to
customize the system for new sensors. All of the sensors in the WANDER
prototype were connected via serial ports, but future ones may require
the use of PC/104 hardware.
The periodic sampling itself is provided by the Perl Event module.
A given sensor may be notified upon a timer event, an I/O event or
both.
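Timer-driven sampling of this kind can be sketched with Python's standard `sched` module, used here purely as a stand-in for the Perl Event module. The `FakeClock` class and the `run_periodic` helper are hypothetical; the fake clock exists only so the example runs instantly and predictably.

```python
import sched

class FakeClock:
    """Simulated clock so the example does not really wait."""
    def __init__(self):
        self.now = 0.0
    def time(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

def run_periodic(scheduler, interval, count, sample):
    """Call sample() every `interval` seconds, `count` times in total."""
    def tick(remaining):
        sample()
        if remaining > 1:
            # Re-arm the timer for the next sample.
            scheduler.enter(interval, 1, tick, (remaining - 1,))
    scheduler.enter(interval, 1, tick, (count,))
    scheduler.run()

clock = FakeClock()
scheduler = sched.scheduler(clock.time, clock.sleep)
readings = []
run_periodic(scheduler, 60, 3, lambda: readings.append(clock.time()))
```

Swapping in `time.monotonic` and `time.sleep` for the fake clock's methods would make the same loop run in real time.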
We provide several concrete base classes for common sensor configurations.
These include the WaitingSerialChannel, which waits for data to become
available and uses a regular expression to extract values from serial
devices, and the PollingSerialChannel, which wakes up periodically,
reads any available bytes from the serial port and uses a regular
expression to extract values.
Adding a new serial port-based sensor can be as easy as specifying
which port to use and the data rate, and providing a regular expression
for parsing its data. Parentheses in the regular expression delimit
the data that gets stored in the database, but in some cases a single
serial port provides data for more than one channel. One example of
this is the GPS, which can provide latitude, longitude and altitude
information within the same once-per-second NMEA "sentence". In such
cases, additional sets of parentheses in the regular expression delimit
the data for the other channels.
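As an illustration of one regular expression feeding several channels, the sketch below pulls latitude, longitude and altitude out of an NMEA GGA sentence with three sets of parentheses. It is written in Python rather than Perl, and the pattern, the channel names and the sample sentence are assumptions for the example, not WANDER's actual configuration.

```python
import re

# Each parenthesized group captures the data for one channel.
GGA_PATTERN = re.compile(
    r"^\$GPGGA,[^,]*,"        # sentence type and UTC time
    r"(\d+\.\d+,[NS]),"       # group 1: latitude and hemisphere
    r"(\d+\.\d+,[EW]),"       # group 2: longitude and hemisphere
    r"\d,\d+,[^,]*,"          # fix quality, satellite count, HDOP
    r"(-?\d+(?:\.\d+)?),M"    # group 3: altitude in metres
)

# Hypothetical channel names, one per capture group, in order.
CHANNELS = ("latitude", "longitude", "altitude")

def parse_sentence(line):
    """Return {channel: text} for a GGA sentence, or None if no match."""
    match = GGA_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(CHANNELS, match.groups()))

sentence = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
fix = parse_sentence(sentence)
```

Sentences of other types simply fail to match, so a single serial stream can carry mixed traffic without extra filtering.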
Because the user can add multiple name/value pairs to the channel
configuration information from the web interface, custom setup data
can be added very easily and made available to the channel driver
scripts.
Of course, for all this to be useful, ultimately the collected data
must be transmitted to a central location. This is handled in the
WANDER prototype by sending the most recently collected data via e-mail
when a PPP connection is initiated via the Globalstar satellite phone.
An ifup script (invoked after the PPP connection is initiated) invokes
a Perl script that queries the database for samples collected after
the last e-mail, formats them into a text file and sends them to an
SMTP server.
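The "everything since the last e-mail" step can be sketched as below. This is a Python illustration with hypothetical names (`build_report`, the tab-separated line format, the example.org addresses); the article does not describe the actual message format used by the Perl script.

```python
from email.message import EmailMessage

def build_report(samples, last_sent, sender, recipient):
    """Format samples newer than last_sent into a plain-text message.

    Returns (message, newest_timestamp). When there is nothing new,
    returns (None, last_sent) so the caller can skip the transmission.
    """
    fresh = [s for s in samples if s[0] > last_sent]
    if not fresh:
        return None, last_sent
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"WANDER samples: {len(fresh)} new"
    # One line per sample: timestamp, channel, value (tab-separated).
    msg.set_content("\n".join(f"{t}\t{ch}\t{v}" for t, ch, v in fresh))
    return msg, max(t for t, _, _ in fresh)
```

A caller would hand the message to `smtplib.SMTP(host).send_message(msg)` once the link is up, and record the returned timestamp only after the send succeeds so that a dropped satellite call does not lose data.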
A future improvement would be to delete already-sent data after an
e-mail acknowledgement. However, since most of the 4.5GB hard drive
is unused, all the data for a typical experiment can be stored on
disk if necessary.
For data-collection setup in the field, WANDER allows local viewing
of collected data via its Webmin web server. The user selects a time
range and channels of interest, and then views or downloads the collected
data as graphs of values vs. time, with several channels either overlaid on a
single graph or shown as separate graphs. Naturally, the data also can be
viewed or downloaded in spreadsheet-compatible CSV form.
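A spreadsheet-compatible export of selected channels might look like the sketch below. It is a Python illustration, not the Webmin module's Perl code; the column layout (one row per timestamp, one column per channel) and the `to_csv` name are assumptions.

```python
import csv
import io

def to_csv(samples, channels):
    """Pivot (timestamp, channel, value) samples into CSV text.

    One row per timestamp, one column per requested channel; a channel
    with no reading at that timestamp is left blank.
    """
    rows = {}
    for t, ch, v in samples:
        if ch in channels:
            rows.setdefault(t, {})[ch] = v
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp"] + list(channels))
    for t in sorted(rows):
        writer.writerow([t] + [rows[t].get(ch, "") for ch in channels])
    return out.getvalue()
```

Leaving gaps blank rather than repeating the previous value keeps periodic and nonperiodic channels distinguishable once the file is opened in a spreadsheet.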
The common user system administration tasks and data-collection setup
are managed by a web interface over the LAN connection. This web interface
is supplied by a web server and suite of CGI programs that come as
part of the Webmin package. All the system configuration that WANDER
might require, from network setup to software package management,
is handled by one of the Webmin modules. Webmin's web server also
serves reference and configuration help documents.
We added our own Webmin module for the WANDER-specific tasks of data-collection
configuration and control, and for viewing or exporting the collected
data. Perl was again the natural choice for writing this Webmin module
because Webmin itself is written in Perl and includes a support library
for module use.
Power Management
Because the WANDER system depends on a rechargeable battery, we had
to find a way to shut down the system cleanly before the battery got
discharged too far -- Linux doesn't take kindly to brownouts.
After discarding a couple of inadequate off-the-shelf solutions,
we designed and built a solar battery charger and power monitor board
using a Microchip PIC microcontroller to monitor battery and solar
panel voltages. It also monitors case temperature because the charging
voltages of a lead-acid battery are temperature-dependent.
The charger does the best it can to keep the system powered and the
battery properly managed (which is primarily about avoiding the twin
evils of overcharging and deep-discharging the sealed lead-acid battery).
This board is connected to the Octagon board using both a serial
port and a single digital status bit, an output from the charger board
that warns of impending shutdown. It has a second digital output that
connects to the DC/DC converter's remote ON/OFF input, so it can shut
down the power supply to the Octagon board, LCD and hard drive.
Normally, the serial port is owned and used by the data-collection
task to read the temperature inside the case while monitoring the
voltages of the battery, solar panel and the external analog input.
When the battery voltage gets too low, the power manager toggles the
status bit (connected to one of the auxiliary digital I/O lines of
the Octagon board), and a dæmon detects the change and tells the system
to start a graceful shutdown.
This simple "power handshaking" scheme offered a capability that
was just too tempting to resist: it's possible, during shutdown, for
the Linux board to instruct the charger to wake it back up in a certain
amount of time. This can be used when sampling intervals are far enough
apart to make it worthwhile to turn the computer off between samples,
particularly useful in a scarce-power environment.
If a timed startup is not chosen, the system automatically will be
restarted when the battery voltage gets high enough to stay alive
for a while. The voltage thresholds defining this hysteresis loop
can be changed using the serial port and are stored in EEPROM on the
board.
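The hysteresis loop can be sketched as a two-threshold state machine. The real logic runs on the PIC microcontroller; the Python below only illustrates the behavior, and the 11.0 V and 12.5 V thresholds are invented for the example because the article does not give the actual values.

```python
class PowerHysteresis:
    """Two-threshold on/off decision for the Linux board's supply."""

    def __init__(self, shutdown_below, restart_above):
        if restart_above <= shutdown_below:
            raise ValueError("restart threshold must exceed shutdown threshold")
        self.shutdown_below = shutdown_below
        self.restart_above = restart_above
        self.powered = True

    def update(self, battery_volts):
        """Feed one voltage reading; return whether the board should be on."""
        if self.powered and battery_volts < self.shutdown_below:
            self.powered = False
        elif not self.powered and battery_volts > self.restart_above:
            self.powered = True
        return self.powered

# Hypothetical thresholds, not taken from the article.
monitor = PowerHysteresis(shutdown_below=11.0, restart_above=12.5)
states = [monitor.update(v) for v in (12.8, 11.2, 10.9, 11.5, 12.4, 12.6)]
```

The gap between the two thresholds is what keeps the board from cycling on and off while the battery hovers near a single cutoff voltage.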
One of our major concerns in the WANDER design was power consumption.
Using the APM kernel module, we were able to slow the CPU during times
when the system was not actively processing. We didn't see any reason
to use the apmd dæmon. In addition, the noflushd dæmon shuts down
the hard drive motor after a period of inactivity and waits for a
disk read before it starts the motor again.
The APM shutdown function doesn't work because the system power supply
is a custom job, and the BIOS has no idea how to shut it off. To turn
off the power supply, we must send a message to the power monitor
board via its serial port.
User Interface
In normal operation, of course, there isn't a computer attached to
the LAN. The field user is likely to be more concerned with attaching
the sensors and solar panel to the external connectors and starting
data collection. For such everyday tasks, we added a small serial-interfaced
LCD panel, keypad and ON/OFF switch to the front panel -- doing serious
configuration or data analysis requires an external laptop (WANDER
has a static IP address but easily could run a DHCP server -- we left
this out to facilitate connection into existing LANs).
The Matrix Orbital 4 × 20 character LCD and Grayhill 20-button keypad
are handled by a separate Perl dæmon process. This can turn sampling
on and off, monitor the latest values from the channels being sampled,
display network activity or power subsystem status, or shut the system
down. The ON/OFF switch is only a sense input and is monitored by
the power-control/battery-charger board. When the user turns off the
power switch, the battery-charger board warns the Octagon board of
impending shutdown as if a brownout were imminent, and then waits
a minute for Linux to shut down gracefully. Then it shuts off the
5V power supply to the system and awaits the command to turn back
on.
Applications
We were pleased to observe a typical battery life of 16-18 hours
in normal operation and an overall system power budget that could
be supported indefinitely around the clock in moderately sunny conditions
with a 50-watt solar panel. Still, this is hardly the kind of thing
one would deploy in an unattended remote-sensing application; we see
it more as a tool for human-mediated environmental research as well
as a development system for ultra low-power standalone monitoring
tools.
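As a rough cross-check of the battery-life figure, the implied average power draw can be estimated from the seven amp-hour battery mentioned earlier. The calculation below assumes a 12 V nominal battery voltage and that the full rated capacity is usable; neither is stated in the article, and the second is optimistic for a lead-acid battery, so the result is an upper-end estimate only.

```python
BATTERY_AMP_HOURS = 7.0   # from the hardware description
NOMINAL_VOLTS = 12.0      # assumption: not stated in the article

def average_draw_watts(runtime_hours):
    """Average power draw implied by running the full capacity down."""
    return BATTERY_AMP_HOURS * NOMINAL_VOLTS / runtime_hours

# 16 to 18 hours of observed battery life.
low = average_draw_watts(18)
high = average_draw_watts(16)
```

Under those assumptions the system averages roughly 4.7 to 5.25 W, which gives a feel for why a 50-watt panel can keep up in moderately sunny conditions.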
The WANDER code base should port handily into a StrongARM (or similar)
embedded Linux board running from CompactFlash, allowing the deployment
of cheap, smart, low-power data-collection systems that play nicely
with standard network protocols. This is one of the major shortcomings
of most commercial products that purport to serve the same purpose:
they have the analog front-end and data-collection components well
refined but tend to require dedicated PC client software to disgorge
their contents reluctantly. WANDER, on the other hand, appears as
just another web server or scriptable data source that talks standard
FTP or e-mail protocols -- even from the boonies.
About the authors:
Steven K. Roberts is perhaps best known as the guy who wandered 17,000
miles around the US on a computer-laden recumbent bicycle during the
1980s. Since then, he has been taking entirely too long to build the
bike's successor, a networked amphibian pedal/solar/sail micro-trimaran
known as the Microship.
Ned Konz was writing robotics code in Smalltalk for semiconductor
factory tools but then escaped on his recumbent bicycle. He entertains
himself by designing microcontroller systems and programming in Squeak
Smalltalk, Perl and Ruby, and was the lead WANDER software designer.
He is also available for consulting work.
Copyright © 2002 Specialized Systems Consultants, Inc. All rights
reserved.
Embedded Linux Journal Online is a cooperative project of Embedded
Linux Journal and LinuxDevices.com.