[Lilug] Why GPSes suck, and what to do about it

Justin Dearing zippy1981 at gmail.com
Mon Feb 23 05:24:59 PST 2009


An interesting read from ESR’s blog.

Apologies for the HTML to the mutt crowd; it’s what Google Reader sends
out.

Sent to you by Justin Dearing via Google Reader: Why GPSes suck, and
what to do about it via Armed and Dangerous by esr on 2/23/09
I’m the lead of the GPSD project, a service daemon that monitors GPS
receivers on serial or USB ports and provides TPV
(time-position-velocity) reports in a simple format on a well-known
Internet port. GPSD makes this job look easy. But it’s not — oh, it’s
decidedly not — and thereby hangs an entertaining tale of hacker
ingenuity versus multiple layers of suck.



Away back in the dark and backward abysm of time when GPS technology
was first being made generally available (around 1993), only
military-grade receivers were sensitive enough to use it where there
were things like buildings and trees partly blocking the sky view. The
first civilian customers to actually find a use for it were people
messing about in boats. Thus it came to pass that the manufacturers of
marine navigation systems were the first civilians to grapple with the
question of how a GPS receiver should report TPV information over a
wire to a navigational computer.

Our first layer of suck begins with the National Marine Electronics
Association, or NMEA. They wrote a standard describing a protocol for
GPSes reporting over serial ports, called NMEA 0183, which I’ve never
dared to look at despite being a technical expert in the field. The
reason is that they made it proprietary and expensive, and their
lawyers have been known to threaten legal action against people who
quote it on the net.

To add injury to insult, NMEA 0183 was (and still is) a crappy
standard. How crappy? Well, before I get into that, let’s note that
there is one thing NMEA did right that later attempts to replace it got
wrong. Each NMEA report is a text packet, or sentence, that begins with
a dollar sign and ends with a carriage-return and line feed. The data
elements in NMEA sentences are just text fields separated by commas,
like this:

$GPRMC,225446.33,A,4916.45,N,12311.12,W,000.5,054.7,191194,020.3,E,A*68

This means that log files of collected NMEA sentences are easy to read
and edit. And that number on the right-hand end, after the “*” but
before the CRLF? A data checksum, so you can tell whether you have a
valid sentence or just line noise (and this is important: we’ll come
back to it later). A GPS speaking NMEA emits sentences like this onto
the wire, usually in once-per-second bursts.
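To make the checksum concrete: as it is commonly documented (I haven’t read the standard either), the two hex digits after the “*” are the XOR of every character between the “$” and the “*”. A minimal Python sketch of checking a sentence; the function names are illustrative and not taken from GPSD:

```python
def nmea_checksum(body: str) -> int:
    """XOR of every character between '$' and '*' (both excluded)."""
    checksum = 0
    for ch in body:
        checksum ^= ord(ch)
    return checksum


def is_valid_sentence(line: str) -> bool:
    """True if line looks like '$BODY*HH' and HH matches the body's checksum."""
    line = line.strip()  # drop the trailing CR/LF
    if not line.startswith("$") or "*" not in line:
        return False
    body, _, given = line[1:].rpartition("*")
    if len(given) != 2:
        return False
    try:
        expected = int(given, 16)
    except ValueError:
        return False  # checksum field is not hex: treat as line noise
    return nmea_checksum(body) == expected
```

For example, `is_valid_sentence("$GPGGA,123519,4807.038,N,01131.324,E,1,08,0.9,545.4,M,46.9,M,,*42")` returns True, and changing any single character in the body makes it return False. A passing checksum only proves the sentence arrived intact, not that the receiver filled the fields in sensibly.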

The first layer of suck actually begins with what NMEA 0183 has you put
in those packets. If you are a mathematician, you have a pretty good
notion of what a TPV report is. It’s a 7-tuple describing your position
in four dimensions (X, Y, Z, and time T) and your velocity in three. If
you are an engineer or the more practical sort of physicist, you want to
add expected-error estimates at some fixed confidence level, usually 50%
or 95%, and return 14 numbers: the seven values plus an error estimate
for each.
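One way to picture the report a mathematician or engineer would want is a single record with all 14 slots. This is a hypothetical Python container, not GPSD’s internal structure; the field names, units, and the use of None for “not reported” are assumptions made for illustration:

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TPV:
    """Hypothetical full time-position-velocity report: 7 values plus 7 errors.

    None marks a quantity the receiver did not supply.
    """
    t: Optional[float] = None      # time, seconds since the Unix epoch (UTC)
    x: Optional[float] = None      # longitude, decimal degrees
    y: Optional[float] = None      # latitude, decimal degrees
    z: Optional[float] = None      # altitude, metres
    vx: Optional[float] = None     # eastward velocity, m/s
    vy: Optional[float] = None     # northward velocity, m/s
    vz: Optional[float] = None     # climb rate, m/s
    ed_t: Optional[float] = None   # expected error in t, seconds
    ed_x: Optional[float] = None   # expected error in x, metres
    ed_y: Optional[float] = None   # expected error in y, metres
    ed_z: Optional[float] = None   # expected error in z, metres
    ed_vx: Optional[float] = None  # expected error in vx, m/s
    ed_vy: Optional[float] = None  # expected error in vy, m/s
    ed_vz: Optional[float] = None  # expected error in vz, m/s

    def missing(self) -> List[str]:
        """Names of the slots this report leaves empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

Filled only from a GPRMC sentence, such a record would have t, x, y, vx, and vy and nothing else.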

Internally, this is what a GPS sensor computes from the signal times to
GPS satellites. Actually, to be pedantic, it doesn’t compute the error
bars in exactly this form; rather, you get scale factors for the errors
derived from the geometry of the satellites when the fix was taken, and
have to multiply that by an experimentally-derived bugger factor
dependent on things like how turbulent the radio-reflecting layer in
the ionosphere is.

Now let’s look at what NMEA 0183 tells GPS devices to actually report.
Here is a breakdown of the data in our sample sentence, which is in
fact the most commonly used GPS reporting format for TPV:

Field  Value       Meaning
1      225446.33   Time of fix: 22:54:46 UTC
2      A           Status of fix: A = Autonomous, valid;
                   D = Differential, valid; V = invalid
3,4    4916.45,N   Latitude 49 deg 16.45 min North
5,6    12311.12,W  Longitude 123 deg 11.12 min West
7      000.5       Speed over ground, knots
8      054.7       Course made good, degrees true
9      191194      Date of fix: 19 November 1994
10,11  020.3,E     Magnetic variation 20.3 deg East
12     A           FAA mode indicator (NMEA 2.3 and later):
                   A = Autonomous, D = Differential, E = Estimated,
                   N = Not valid, S = Simulator, M = Manual input mode
13     *68         Mandatory NMEA checksum

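Decoding the sample sentence by hand takes a little care, because latitude and longitude arrive as degrees and decimal minutes run together, and the date has only two year digits. A rough Python sketch, assuming the field layout in the table above and an arbitrary century pivot (two-digit years below 80 are taken as 20xx); it does not cope with empty fields:

```python
from datetime import datetime, timedelta, timezone


def nmea_degrees(value: str, hemisphere: str) -> float:
    """Convert NMEA 'dddmm.mm' plus N/S/E/W into signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)
    minutes = raw - degrees * 100
    signed = degrees + minutes / 60.0
    return -signed if hemisphere in ("S", "W") else signed


def parse_rmc(sentence: str) -> dict:
    """Decode the main fields of a GPRMC sentence (checksum not verified here)."""
    body = sentence.strip().lstrip("$").split("*")[0]
    f = body.split(",")
    if not f[0].endswith("RMC") or len(f) < 10:
        raise ValueError("not a complete RMC sentence")
    hhmmss, status, lat, ns, lon, ew, knots, course, ddmmyy = f[1:10]
    yy = int(ddmmyy[4:6])
    year = 2000 + yy if yy < 80 else 1900 + yy  # assumed century pivot
    midnight = datetime(year, int(ddmmyy[2:4]), int(ddmmyy[0:2]),
                        tzinfo=timezone.utc)
    when = midnight + timedelta(hours=int(hhmmss[0:2]),
                                minutes=int(hhmmss[2:4]),
                                seconds=float(hhmmss[4:]))
    return {
        "time": when,
        "valid": status == "A",
        "lat": nmea_degrees(lat, ns),
        "lon": nmea_degrees(lon, ew),
        "speed_knots": float(knots),
        "course_deg": float(course),
    }
```

Run on the sample sentence, this yields about 49.2742 degrees north and 123.1853 degrees west, at 22:54:46.33 UTC on 19 November 1994. The century pivot is pure guesswork, which is exactly the problem with a two-digit year.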
Alert readers will notice what’s missing here. Altitude, for starters —
we’ve got no Z! People in boats, remember? They think they don’t need
no steenking altitude. And no error estimates at all. And the T
report is incomplete, giving only a two-digit year. Yup, that one got
annoying real fast when the millennium turned. And it’s not like the
designers couldn’t see that coming in 1993.

Eventually, NMEA wised up about the altitude thing. The sane way to
proceed would have been to define a new sentence containing all the
GPRMC information, plus altitude, plus a real four-digit year, even if
error bars had to remain suppressed for some inexplicable reason.
Here’s what we got instead:

$GPGGA,123519,4807.038,N,01131.324,E,1,08,0.9,545.4,M,46.9,M,,*42

Field  Value        Meaning
1      123519       Fix taken at 12:35:19 UTC
2,3    4807.038,N   Latitude 48 deg 07.038′ N
4,5    01131.324,E  Longitude 11 deg 31.324′ E
6      1            Fix quality: 0 = invalid, 1 = GPS, 2 = DGPS,
                    3 = PPS (Precise Positioning Service),
                    4 = RTK (Real Time Kinematic) with fixed integers,
                    5 = Float RTK, 6 = Estimated, 7 = Manual, 8 = Simulator
7      08           Number of satellites being tracked
8      0.9          HDOP = Horizontal dilution of precision
9,10   545.4,M      Altitude, metres above mean sea level
11,12  46.9,M       Height of geoid (mean sea level) above WGS84
                    ellipsoid, in metres

Now we’ve got X, Y, and Z…but T is even more damaged! You get a time of
day, no month, no year, no century. No velocity report at all. We’ve
got one number, HDOP, that tangles EDX and EDY (the expected errors in
X and Y) together to give a circular horizontal error. And despite the
fact that this sentence reports an altitude (Z), there’s no report of
EDZ!

For some inexplicable reason, NMEA also describes a GPGLL sentence that
has all the brain-damage of GPGGA, but without the altitude. And a
GPVTG that gives only a velocity report - no position, and naturally no
error bars. Do I need to add that both have missing or incomplete
timestamps? And oh, yes, there are actually two different incompatible
variants of GPVTG.

Remember I said GPS receivers emit bursts of NMEA packets once a
second? Well, the bursts typically consist of a GPRMC, followed by a
GPGGA, possibly followed by a GPGLL and/or GPVTG. Er, no, I’m lying,
they could be in a different order. The sentences in the burst have
overlapping, incomplete information. The NMEA standard doesn’t even
specify which ones must be sent, let alone the order they’re sent in.

Some NMEA GPSes part-repair the timestamp damage by shipping a sentence
called GPZDA that gives you a full UTC timestamp with century. But the
standard doesn’t require it, and most don’t, so you can’t count on it.

The first layer of suck was about what NMEA 0183 specifies. We are now
passing into the second layer of suck, which is what it doesn’t
specify. Like, the minimum set of sentences that have to be sent per
reporting cycle. Oh, and nothing in the standard stops a GPS from
simply omitting fields it doesn’t feel like reporting. It’s fairly
common, for example, for receivers to not report magnetic variation or
geoid separation (the geoid is the surface mean sea level would follow
under gravity alone; it departs from the WGS84 reference ellipsoid
because the earth’s mass is not uniformly distributed, and geoid
separation is the height of one above the other). GPS designers can
save some absurdly tiny fraction of a penny per unit by not having
these data tables in ROM, and they’re generally more than willing to
shaft their customers to do it.

A mob of crack-smoking rhesus monkeys could have designed a better
standard than NMEA 0183. It means that if you want to assemble a proper
TPV report from NMEA sentences, you actually need to wait until you’ve
seen an entire reporting cycle. Only…you can’t tell which sentences
start and end the cycle without knowing the type and firmware version
of the GPS! And even if you did know, buffering the partial data introduces
latency that may be unacceptable for some applications.

A very practical way this manifests is that if you have a GPS client
faithfully reporting the NMEA sentences coming over the wire, your
altitude will typically flicker from known to unknown and back twice a
second as it gets hit by alternating GPRMC and GPGGA sentences. That
is, unless you buffer, in which case the altitude you see could be up
to one second stale and associated with a previous fix.
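The buffering trade-off fits in a few lines of Python. This hypothetical merger assumes every sentence in a cycle carries the same time of day and treats a new time of day as the end of the previous cycle. That is precisely the assumption you cannot always make, and it also shows where the extra latency comes from: a cycle is only handed back once the next one starts.

```python
from typing import Dict, Optional


class CycleMerger:
    """Merge partial NMEA reports that share a time of day into one fix.

    Hypothetical helper: assumes all sentences of one reporting cycle carry
    an identical hhmmss string and that a different string starts a new cycle.
    """

    def __init__(self) -> None:
        self.current_time: Optional[str] = None
        self.fields: Dict[str, object] = {}

    def feed(self, time_of_day: str,
             partial: Dict[str, object]) -> Optional[Dict[str, object]]:
        """Add one sentence's fields; return the finished previous cycle, if any."""
        finished = None
        if self.current_time is not None and time_of_day != self.current_time:
            finished = dict(self.fields, time_of_day=self.current_time)
            self.fields = {}
        self.current_time = time_of_day
        # Empty fields (None) must not erase values another sentence supplied.
        self.fields.update({k: v for k, v in partial.items() if v is not None})
        return finished
```

Feeding it an RMC-like dict with a latitude and a GGA-like dict with an altitude for the same second returns nothing; the merged fix comes out only when the first sentence of the following second arrives.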

The incomplete timestamps mean various sorts of lossage can bite you if
you have a GPS client active at midnight. Unless your client software
is actually watching for the moment when the GPGGA timestamp goes to
00:00:00 and can compensate, it’s going to look like you’ve dropped
back in time 24 hours until the next GPRMC comes in. Human eyes can
just reject this, but what if you’re logging telemetry and try to
graph against time? Similar anomalies lurk at the edges of years and
centuries.
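A hedged sketch of the compensation described above: remember the date from the last sentence that carried one (GPRMC or GPZDA), and if a bare time of day suddenly jumps backwards by more than twelve hours, assume midnight has passed. The class name and the twelve-hour threshold are illustrative choices, not GPSD’s:

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional


def _seconds(tod: time) -> float:
    """Seconds since midnight for a time-of-day value."""
    return tod.hour * 3600 + tod.minute * 60 + tod.second + tod.microsecond / 1e6


class DayTracker:
    """Attach a date to time-of-day-only stamps such as the one in GPGGA."""

    def __init__(self) -> None:
        self.day: Optional[date] = None
        self.last_tod: Optional[time] = None

    def set_date(self, day: date, tod: time) -> None:
        """Record the date and time from a sentence that carries both."""
        self.day, self.last_tod = day, tod

    def stamp(self, tod: time) -> Optional[datetime]:
        """Turn a bare time of day into a full UTC datetime, if possible."""
        if self.day is None:
            return None  # no full date seen yet
        if self.last_tod is not None and _seconds(tod) < _seconds(self.last_tod) - 12 * 3600:
            self.day += timedelta(days=1)  # time went backwards: midnight passed
        self.last_tod = tod
        return datetime.combine(self.day, tod, tzinfo=timezone.utc)
```

Given a date of 31 December 1999 at 23:59:58 and then the bare times 23:59:59 and 00:00:00, it returns 1999-12-31 23:59:59 UTC and 2000-01-01 00:00:00 UTC. Leap seconds are not handled.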

Yes, and if you want to report true altitude above mean sea level
correctly and consistently across devices, you had better have your own
geoidal separation table in software somewhere.
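In practice, having your own geoidal separation table means a grid of geoid heights above the WGS84 ellipsoid plus some interpolation. A minimal Python sketch using bilinear interpolation; the three-by-three grid and its values are invented placeholders, where real software would load a global model such as EGM96:

```python
from typing import List

# Invented placeholder values (metres of geoid above the WGS84 ellipsoid),
# sampled every 10 degrees; NOT real geoid data.
GRID_STEP_DEG = 10.0
GRID_LAT0, GRID_LON0 = 40.0, 0.0  # south-west corner of the grid
GRID: List[List[float]] = [
    [47.0, 46.0, 44.0],  # latitude 40: longitude 0, 10, 20
    [49.0, 48.0, 45.0],  # latitude 50
    [50.0, 49.0, 47.0],  # latitude 60
]


def geoid_separation(lat: float, lon: float) -> float:
    """Bilinearly interpolate the geoid separation at (lat, lon)."""
    r = (lat - GRID_LAT0) / GRID_STEP_DEG
    c = (lon - GRID_LON0) / GRID_STEP_DEG
    rows, cols = len(GRID), len(GRID[0])
    if not (0 <= r <= rows - 1 and 0 <= c <= cols - 1):
        raise ValueError("position outside the sample grid")
    i, j = min(int(r), rows - 2), min(int(c), cols - 2)
    fr, fc = r - i, c - j
    return ((1 - fr) * (1 - fc) * GRID[i][j] + (1 - fr) * fc * GRID[i][j + 1]
            + fr * (1 - fc) * GRID[i + 1][j] + fr * fc * GRID[i + 1][j + 1])


def msl_altitude(ellipsoid_height_m: float, lat: float, lon: float) -> float:
    """Altitude above mean sea level from height above the WGS84 ellipsoid."""
    return ellipsoid_height_m - geoid_separation(lat, lon)
```

For a receiver that reports height above the ellipsoid, subtracting the interpolated separation gives altitude above mean sea level; GPGGA goes the other way and reports the mean-sea-level altitude and the separation as separate fields.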

And I have nowhere near plumbed the stygian depths of the NMEA
standard’s two layers of suck. To spare the reader’s sanity, we shall
lightly draw a veil over the spiky, vague, ill-documented horror that
is NMEA error and status reporting and pass directly to the third layer
of NMEA suck, the complete absence of any standardization of GPS
control codes.

Here are some of the more important things there is no NMEA-standard
way to tell a GPS to do:

1. Report its vendor, model, and firmware version.
2. Change the set of sentences it ships per cycle.
3. Change the baud rate at which it reports.
4. Change the number of samples it reports per second.

Of these, (1) is the most harmless-looking, but actually the deadliest.
Many GPSes have vendor-defined commands to do (2) and (3), but it is
far from trivial to figure out which set of vendor-defined commands
might apply. If you are a GPS-using application, and you are handed the
name of a port with a GPS on it, you have to either settle for the
minimum common subset of GPS behaviors, or throw all the
vendor-specific ID probes you know of at the device hoping it will
respond to one of them. Hint: too often, it won’t.
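The “throw every probe you know at it” strategy looks roughly like this. The probe list is a hypothetical placeholder (no real vendor command strings are given here), and `send_and_read` stands in for whatever writes to the port and collects the reply:

```python
from typing import Callable, Iterable, Optional, Tuple

# (vendor label, bytes to send, test applied to the reply).
Probe = Tuple[str, bytes, Callable[[bytes], bool]]


def identify(send_and_read: Callable[[bytes], bytes],
             probes: Iterable[Probe]) -> Optional[str]:
    """Try each vendor's ID probe in turn; return the first vendor that answers."""
    for vendor, command, recognizes in probes:
        reply = send_and_read(command)
        if reply and recognizes(reply):
            return vendor
    return None  # nobody answered: stick to the minimum common subset


# Hypothetical probes; these byte strings are placeholders, not real commands.
EXAMPLE_PROBES = [
    ("VendorA", b"$PVNDA,ID?\r\n", lambda r: r.startswith(b"$PVNDA,ID,")),
    ("VendorB", b"\x10\xfe\x00\x10\x03", lambda r: r[:2] == b"\x10\xff"),
]
```

A None result means falling back to the minimum common subset of behaviors. Every probe also costs time spent waiting for a reply that may never come.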

But wait. Things get worse!

There are, broadly speaking, three different ways that GPS
vendors could have responded to the admitted fact that NMEA 0183, as
given, is a festering pile of rancid camel vomit.

- They could have pressured NMEA into cleaning up the damn standard.
- They could have de-facto standardized on a decent set of extensions
using the NMEA sentence packet format - a sentence that reports all 14
location parameters, a probe-for-ID query, a standard baud-rate change,
etc.
- Or…they could invent a dozen mutually incompatible and poorly
documented proprietary binary protocols, all of which throw away the
transparency advantages of the NMEA textual packet format and each one
of which introduces unique and special brain-damage of its own!

Guess which alternative most of them chose. Just guess…

You are now in a twisty maze of GPS reporting protocols, all different.
Many devices have two different operating modes, one in which they emit
NMEA packets and one in which they emit a vendor binary protocol that
looks like nothing so much as line noise. At least one major vendor has
dropped NMEA support entirely. If your location-sensitive application
is naively expecting NMEA, you lose.

To be fair, one thing the vendor binary protocols generally get right
that NMEA doesn’t is shipping something close to a full TPV in one
sentence per cycle. This at least avoids the nasty problems associated
with integrating partial NMEA reports and worries about where the start
of cycle is. However, I had to say “something close”; not one protocol
ships the full and correct TPV 14-tuple. Usually one or more velocity
components and error estimates are missing and have to be computed.
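Computing the missing velocity components is mostly trigonometry. A hedged Python sketch, assuming the protocol supplies speed over ground in knots and a true course in degrees (as GPRMC does), and that the climb rate has to be estimated from successive altitudes:

```python
import math
from typing import Tuple

KNOT_MPS = 1852.0 / 3600.0  # one nautical mile per hour, in metres per second


def velocity_components(speed_knots: float, course_deg: float) -> Tuple[float, float]:
    """Split speed over ground and true course into (east, north) in m/s.

    Course is measured clockwise from true north, as in GPRMC.
    """
    speed = speed_knots * KNOT_MPS
    course = math.radians(course_deg)
    return speed * math.sin(course), speed * math.cos(course)


def climb_rate(alt_before_m: float, alt_after_m: float, dt_s: float) -> float:
    """Estimate vertical velocity from two altitude fixes dt_s seconds apart."""
    if dt_s <= 0:
        raise ValueError("fixes must be in time order")
    return (alt_after_m - alt_before_m) / dt_s
```

Deriving matching error estimates for these computed values is a separate problem.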

Let’s back off at this point and consider how people who use GPS
sensors would, ideally, like their GPS sensors to behave. You plug it
in, your software figures out what protocol it’s using, autoconfigures
to match it, and starts collecting TPV reports and using them.

If “your software” is a GPSD-enabled application on a system with gpsd
installed, it actually works this way. Those of you who have been
following our descent into this fourth major layer of suck can be
excused for wondering how in the flipping hell GPSD ever managed this
trick in the twisty maze of vendor protocols, all different.

Certainly the vendors aren’t being much help here. Many of them (I’m
looking at you, Garmin!) are cheerfully willing to assume that you will
never use anything but their one idiosyncratic piece of GPS hardware,
and that it will only talk to a limited, vendor-controlled selection of
closed-source binary blobs provided by them or their business partners.
Hello, vendor lock-in; goodbye, customer choice.

There is an Ariadne’s thread through this maze. It’s this: All the
vendor protocols, like NMEA 0183, use packets with checksums and fixed
header/trailer bytes. The intention is that they’re an integrity check
so you don’t get fooled by line-noise-induced glitches. The side effect
is that, if you’re sufficiently clever, you can do GPS protocol
autodetection on the fly. It takes a fairly complex state machine that
tangles together structural knowledge about every packet protocol in
your supported set, but it can be done. In GPSD-land we call this piece
of code the packet sniffer.
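To show why checksums make autodetection possible, here is a much-simplified Python scanner, nothing like the real state machine in GPSD. It recognizes NMEA sentences and one binary framing modeled on SiRF’s (start bytes 0xA0 0xA2, a 15-bit payload length, the payload, a 15-bit sum of the payload bytes, end bytes 0xB0 0xB3); treat that framing as an assumption to check against vendor documentation. Bytes that do not start a valid packet are skipped as line noise:

```python
from typing import Iterator, Optional, Tuple

SIRF_HEAD, SIRF_TAIL = b"\xa0\xa2", b"\xb0\xb3"
NMEA_MAX = 80  # chars from '$' up to the CR, per the commonly quoted 82-byte limit


def _nmea_ok(packet: bytes) -> bool:
    """Check '$BODY*HH' against the XOR checksum of BODY."""
    body, star, tail = packet[1:].partition(b"*")
    if not star or len(tail) != 2:
        return False
    try:
        expected = int(tail, 16)
    except ValueError:
        return False
    checksum = 0
    for byte in body:
        checksum ^= byte
    return checksum == expected


def _sirf_frame(buf: bytes, i: int) -> Optional[Tuple[bytes, int]]:
    """Return (payload, end_index) if a valid SiRF-style frame starts at i."""
    if buf[i:i + 2] != SIRF_HEAD or i + 4 > len(buf):
        return None
    length = ((buf[i + 2] << 8) | buf[i + 3]) & 0x7FFF
    end = i + 4 + length + 4  # header+length, payload, checksum, trailer
    if end > len(buf):
        return None
    payload = buf[i + 4:i + 4 + length]
    checksum = (buf[i + 4 + length] << 8) | buf[i + 5 + length]
    if (sum(payload) & 0x7FFF) != checksum or buf[end - 2:end] != SIRF_TAIL:
        return None
    return payload, end


def sniff(buf: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (protocol, packet) for every checksum-valid packet in buf."""
    i = 0
    while i < len(buf):
        if buf[i:i + 1] == b"$":
            cr = buf.find(b"\r\n", i)
            if cr != -1 and cr - i <= NMEA_MAX and _nmea_ok(buf[i:cr]):
                yield "NMEA", buf[i:cr]
                i = cr + 2
                continue
        frame = _sirf_frame(buf, i)
        if frame is not None:
            yield "SiRF", frame[0]
            i = frame[1]
            continue
        i += 1  # nothing valid starts here: treat the byte as line noise
```

A stream that mixes garbage, a valid GPGGA sentence, a valid binary frame, and a sentence with a wrong checksum yields exactly the two valid packets, in order.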

There’s something else the packet-sniffer does: it autobauds. Again,
this is only possible because packet checksumming gives you a way to
know for sure when you’re looking at valid data. When a serial GPS
device is presented to gpsd, the packet sniffer doesn’t have to be told
the baud rate the device is shipping at - it cycles through all
possible combinations of speed, parity and stop bits looking for a
combination under which it sees valid packets of some type. Normally
this takes less than a second.
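Once packet validation exists, the autobaud loop itself is simple. In this sketch `read_at` is a hypothetical callback that reconfigures the port and returns a sample of bytes, and `looks_valid` is whatever packet check you trust; the lists of speeds and framings are assumptions (8 data bits throughout), not the set gpsd actually tries:

```python
import itertools
from typing import Callable, Optional, Tuple

SPEEDS = (4800, 9600, 19200, 38400, 57600, 115200)
FRAMINGS = (("N", 1), ("O", 1), ("E", 1), ("N", 2))  # (parity, stop bits)


def autobaud(read_at: Callable[[int, str, int], bytes],
             looks_valid: Callable[[bytes], bool]) -> Optional[Tuple[int, str, int]]:
    """Return the first (speed, parity, stop bits) under which valid packets appear."""
    for speed, (parity, stop_bits) in itertools.product(SPEEDS, FRAMINGS):
        sample = read_at(speed, parity, stop_bits)  # reconfigure port, read a little
        if looks_valid(sample):
            return speed, parity, stop_bits
    return None  # nothing recognizable at any setting
```

Starting at 4800 reflects NMEA 0183’s conventional speed. With real hardware each attempt has to read long enough to see a whole packet, so the order of the lists affects how quickly sync happens.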

The packet sniffer is the real reason for the existence of gpsd — and
I’d add “other programs like it”, except that there aren’t any others
that I know of. Long ago, all the gpsd daemon did was serve as a
multiplexer that read and buffered TPV reports from a single serial
device so that several GPSD-aware applications could get simultaneous
access to them. That’s all GPSD’s closest competitors today, like
Gypsy, can do; they’re NMEA multiplexers. They typically can’t cope
with non-NMEA devices at all; no packet sniffer.

The gpsd daemon also copes with the data management problems
surrounding NMEA partial TPV reports, doing everything from supplying
missing geoidal separation for altitude to computing and reporting
error estimates from the geometry of the satellite skyview if the GPS
doesn’t supply them.
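Computed error estimates rest on the textbook dilution-of-precision calculation: build a geometry matrix G from unit vectors toward each satellite, invert G^T G, and read the DOP values off its diagonal. This self-contained Python sketch starts from satellite azimuth and elevation in the skyview. It is the standard least-squares formula, not necessarily line for line what gpsd does, and the UERE argument is a placeholder for the experimentally derived scale factor:

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical skyview: one satellite overhead, three spread around at 30 degrees.
EXAMPLE_SKY = [(0.0, 90.0), (0.0, 30.0), (120.0, 30.0), (240.0, 30.0)]


def _invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse of a small square matrix (raises if singular)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("satellite geometry is degenerate")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def dops(sky: Sequence[Tuple[float, float]]) -> dict:
    """DOP factors from (azimuth, elevation) pairs in degrees."""
    if len(sky) < 4:
        raise ValueError("need at least four satellites")
    g = []
    for az_deg, el_deg in sky:
        az, el = math.radians(az_deg), math.radians(el_deg)
        # Unit line-of-sight vector in local east/north/up, plus a clock column.
        g.append([math.cos(el) * math.sin(az), math.cos(el) * math.cos(az),
                  math.sin(el), 1.0])
    gtg = [[sum(row[i] * row[j] for row in g) for j in range(4)] for i in range(4)]
    q = _invert(gtg)
    return {
        "hdop": math.sqrt(q[0][0] + q[1][1]),
        "vdop": math.sqrt(q[2][2]),
        "pdop": math.sqrt(q[0][0] + q[1][1] + q[2][2]),
        "tdop": math.sqrt(q[3][3]),
        "gdop": math.sqrt(sum(q[i][i] for i in range(4))),
    }


def horizontal_error_m(sky: Sequence[Tuple[float, float]], uere_m: float) -> float:
    """Rough horizontal error: HDOP times a user equivalent range error."""
    return dops(sky)["hdop"] * uere_m
```

By construction GDOP² = PDOP² + TDOP² and PDOP² = HDOP² + VDOP², which makes a handy sanity check. Four satellites at the same spot in the sky give a singular matrix and raise an error.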

Most of the suck surrounding GPSes can be summed up by “all this
cleverness is actually necessary if you want to get clean TPV data out
of more than one different kind of device”, or even out of just one
kind of device that fails to supply a complete TPV. And, as we’ve
noted, all of them fail in a dizzying variety of ways.

It’s true that in theory, every single GPS-aware application could
include its own packet sniffer, the matrix algebra needed to compute
missing error estimates, its own geoidal separation table, and all the
other random logic needed to cope even with GPSes that are working
nominally correctly. But have we mentioned yet that some…don’t? We know
of at least three circumstances under which popular GPS chipsets return
un-obviously corrupted NMEA - detectable, but you actually have to know
how. Then there’s one chipset we know of that returns incorrect packet
checksums when it doesn’t have a fix.

And we’re not done yet, because there are at least two other sets of issues
about extracting sense from these devices. One set is an artifact of
the way USB GPSes are put together. Now, USB is generally a good thing
in this context; unlike old-school serial ports, USB devices raise
notification events on connect and disconnect, which clever GPS
software can listen for and use to automatically hook up and sync to
GPS sensors when they’re available.

However…naked GPS chips report serial data at TTL levels. The standard
way to build a USB GPS is to hook up your GPS chip with a serial-to-USB
bridge; there’s one in particular, the Prolific PL2303, that
tends to show up on about 70% of the USB GPSes out there. There are two
problems with this kind of design, one obvious and one subtle.

The subtle one is that both the bridge and the UART on the GPS chip
have their own data buffers. Under most circumstances this doesn’t
matter because the introduced latency from both together is very small,
but some control operations (notably the serial-speed changes you’re
going to be doing while you try to sync up with the device) need an
amount of delay sufficient to flush both, otherwise you get odd race
conditions that can result in garbage data coming back up the wire or
your control operation silently failing.

The right combinations of OS-level buffer flushes and delays will avoid
this problem, but clumsy ways to do it cause fix latency and
application slowdown. The comment explaining these issues in the GPSD
code leads off with “Serious black magic begins here.” and continues
for 48 lines — because it needs to.
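For the USB buffering problem, one plausible shape for a careful speed change on a POSIX system is: drain pending output, switch speeds, wait, then discard whatever is still queued. A hedged Python sketch using the standard `termios` module (POSIX only); the 0.2-second settle time is a guess rather than GPSD’s black magic, and real devices may need more or less:

```python
import termios
import time

# Integer speed -> termios constant, for the speeds this platform defines.
SPEED_CONSTANTS = {
    speed: getattr(termios, "B%d" % speed)
    for speed in (4800, 9600, 19200, 38400, 57600, 115200)
    if hasattr(termios, "B%d" % speed)
}


def change_speed(fd: int, speed: int, settle_s: float = 0.2) -> None:
    """Switch a serial port's speed without mixing old- and new-speed bytes."""
    constant = SPEED_CONSTANTS[speed]       # KeyError for unsupported speeds
    termios.tcdrain(fd)                     # wait until our queued output is sent
    attrs = termios.tcgetattr(fd)
    attrs[4] = attrs[5] = constant          # input and output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    time.sleep(settle_s)                    # guess: let bridge and GPS UART buffers empty
    termios.tcflush(fd, termios.TCIOFLUSH)  # discard anything still queued either way
```

The final flush matters most: without it, bytes the GPS sent at the old speed can be read back as garbage at the new one.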

(This isn’t the worst thing you have to be careful of while hunting,
though. Some Bluetooth GPSes with defective firmware will go so badly
catatonic if you try to change their baud rate that you actually have to
crack the case and unsolder the battery to unbrick them!)

Here’s the more obvious USB problem: there is no USB device class for
GPSes. A USB GPS will present the vendor/product ID of the
serial-to-USB converter. This means that, even if you’re fortunate
enough to have an operating system that can do something reasonable
with hotplug events, you can’t just tell it to watch for GPS devices
going live and connect them to your software; you have to know which
bridge-chipset IDs are likely to have GPSes behind them, sniff the
data, and let go of the device if it’s not shipping GPS packets!
Otherwise you might eat events from non-GPS serial devices that some
other application badly needs to see.
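The “know which bridge IDs might hide a GPS” logic could look like the following hypothetical hotplug handler. The PL2303 entry (067b:2303) is Prolific’s usual ID; the FTDI and Silicon Labs entries are other common bridges added for illustration, and `sniff_for_gps` and `release` stand in for real device handling:

```python
from typing import Callable, Optional

# USB vendor:product IDs of serial bridges that often front a GPS chip.
# A match only means "worth sniffing", never "this is a GPS".
CANDIDATE_BRIDGES = {
    (0x067B, 0x2303): "Prolific PL2303",
    (0x0403, 0x6001): "FTDI FT232",
    (0x10C4, 0xEA60): "Silicon Labs CP210x",
}


def on_hotplug(vid: int, pid: int,
               sniff_for_gps: Callable[[], bool],
               release: Callable[[], None]) -> Optional[str]:
    """Decide whether a newly attached USB device should be kept as a GPS."""
    bridge = CANDIDATE_BRIDGES.get((vid, pid))
    if bridge is None:
        return None   # not a bridge we know: never touch it
    if sniff_for_gps():
        return bridge  # GPS packets seen: keep the device
    release()          # some other serial gadget: hand it back
    return None
```

The release step is the part that keeps a GPS daemon from eating events that belong to some other serial device.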

And so on, and so on. Dealing with all this crap is further complicated
by vendor documentation that is scanty if you can get it at all, and
often written in rather broken English when you can. Part of the
problem is the structure of the GPS sensor market, which largely
consists of dozens of tiny Pacific-Rim companies - each popping up out
of nowhere, shipping the cheapest possible spin on one of about a
half-dozen reference designs, and disappearing six to eighteen months
later.

A friend who works in embedded systems tells me these little outfits
aren’t even intended to last long; they’re actually run by giant
electronics combines through several layers of shell companies as a
way of providing deniability in case of lawsuits by patent trolls. They
spin up, they ship, they funnel money back to daddy…and the second a
process-server shows up they disappear. All the engineers
get rehired by a different sock puppet a week later. Lather, rinse,
repeat. And…er…product support? What’s that?

It’s messy. Really messy. Those who love the law, sausage, or GPS
devices really shouldn’t watch any of them being made.

Expecting GPS-aware applications to keep track of all this stuff would
be just nuts. The best way to cope is to have a dedicated service layer
that specializes in knowing about GPS idiosyncrasies, hides all
that ugliness, and presents a simple TPV-reporting interface to the
application layer above. Ideally, the service layer should have a sharp
crew of developers who are specialist GPS experts so that nobody else
has to be.

And that’s exactly what the GPSD project is. It looks like a simple
job…but it’s not.
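From the application’s side, the payoff looks like this. Later gpsd releases speak a JSON protocol on TCP port 2947: a client sends a ?WATCH command and receives objects whose class is TPV. A minimal Python sketch; the field names follow gpsd’s JSON documentation as I understand it and vary between versions (newer releases, for instance, split altitude into altHAE and altMSL), so check them against your gpsd:

```python
import json
import socket
from typing import Iterator, Optional

GPSD_ADDRESS = ("127.0.0.1", 2947)  # gpsd's well-known port, on the local host
WATCH = b'?WATCH={"enable":true,"json":true};\n'
TPV_KEYS = ("time", "lat", "lon", "alt", "speed", "track", "climb",
            "epx", "epy", "epv")


def parse_report(line: str) -> Optional[dict]:
    """Return selected TPV fields from one JSON line, or None for anything else."""
    try:
        obj = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(obj, dict) or obj.get("class") != "TPV":
        return None
    return {key: obj.get(key) for key in TPV_KEYS}  # absent fields become None


def stream_tpv(address=GPSD_ADDRESS) -> Iterator[dict]:
    """Connect to a running gpsd, ask for JSON reports, yield each TPV."""
    with socket.create_connection(address) as sock:
        sock.sendall(WATCH)
        for line in sock.makefile("r", encoding="utf-8"):
            report = parse_report(line)
            if report is not None:
                yield report
```

Note what is absent: no baud rates, no vendor protocols, no geoid tables. That is the point of having a service layer.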


