Craig Ulmer

Data Warehouse Becomes FAODEL

2018-01-12 faodel

It's official: I'm renaming my main project at work to FAODEL: Flexible, asynchronous, data-object exchange libraries. FAODEL (pronounced fā-ō-del) comes from a simplification of the Gaelic term faodhail, which is a land bridge used to cross between islands. Here are two examples between the Monach Islands in Scotland:


Nessie, Kelpie, and Scottish Names

My main project at work for the last few years has been writing data management services for HPC applications. Sandia's I/O group previously built an RDMA portability layer called Nessie to support the Lightweight File System (LFS). I initially built a key/value store on top of Nessie. I wanted to keep the Scottish monster theme going, so I decided to call it Kelpie (a kelpie is a sea horse in Scottish mythology that drags people to a watery death). As Kelpie evolved we started adding more packages with Scottish/Gaelic terms. We named our memory manager Lunasa (a Gaelic harvest festival) and our boot services became Gutties (a cheap gym shoe in Scotland). It didn't take long for us to realize that there were a lot of issues with using Scottish/Gaelic terms to name things. First, the words are often difficult to spell and pronounce. Second, we've had trouble finding other Scottish mythical beasts we could swipe. And finally, it seems like every Scottish word has a slang meaning that would make us hesitant to use it at a conference. As such, when we talk about our different software packages, we've been referring to them by our project name, which is "Data Warehouse".

ATDM "Data Warehouse" Origins and Problems

Three years ago the labs realized that they needed to do something different if they wanted to be able to have their codes scale up to exascale computing platforms. The ATDM project was formed to develop new software infrastructure that would allow new codes to achieve better performance than MPI-based approaches. The main idea was to use overdecomposition and task-dag programming models (aka, "asynchronous many-task" or AMT) to overcome dynamic load balancing problems while improving developer productivity. Existing frameworks (e.g., Charm++, Legion) didn't fully meet our requirements, so the DARMA team set about building a new AMT API that leveraged modern C++ features and could be retargeted to run on top of different runtimes (eg, DARMA on Charm++). From an I/O perspective applications needed a way to allow dynamically-placed tasks to exchange data with existing mesh databases and storage tools (all of which are built on static distributions). Our project was started to serve as a way to manage AMT's data in this context. Given that other AMT's used the term "Data Warehouse" for their storage, we became ATDM's "Data Warehouse".

The problem with the term Data Warehouse is that it has a specific meaning for I/O people. In the 1970's Bill Inmon started using Data Warehouse to refer to the idea of centrally storing/indexing all of an organizations data, instead of spreading it out among many smaller databases. Inmon has written articles that point out how NoSQL people have hijacked his term (which I agree with), so I've always cringed at having to refer to ourselves as ATDM's DW. It's difficult to change a funded program's name though, once it's on the books.

Faodail and Faodhail

Recently, we've been reorganizing our code so that we can go through the official open-source release process. We've generalized our scope so we can serve more than just DARMA, so I decided it was a good time to revisit project names. I found the name "Faodail", which people say is a "lucky find, usually of a lost item". That seemed like a good fit for a key/blob service, so we started using it (it even made its way into a paper an intern wrote). There were a few problems with faodail though: (1) we found it was difficult to pronounce, (2) the internet already had some references to it, (3) taking the "aod" out of "faodail" makes it fail, and (4) google translates faodail to "maiden" (??). It seemed like a bad idea.

I went back to the naming game and searched some more. After a lot of misses (and general disgust from my team), I noticed in WikiSource that the term right after faodail is "faodhail". The definition for faodhail is:

faodhail, ford, a narrow channel fordable at low water, a hollow in the sand
          retaining tide water: from N. vaðill, a shallow, a place where straits
          can be crossed, Shet vaadle, Eng. wade.

Looking around more, I found maps of different faodhails around Scotland. These are regions where the tides go out and leave a land bridge that lets you cross between the islands. This meaning is perfect for what our software does: we build communication libraries that let you move data between different application islands.


Choosing FAODEL

Faodhail was longer than Faodail and had the same problems. I finally realized that what I needed was to ditch the actual name and just make an acronym that fixed the problems. Changing the name that was shorter and more phonetic helped me a good bit. I finally worked out some words that fit: flexible, asynchronous, object data-exchange libraries (FAODEL). It's not great, but the words do relate to what we're doing. Once I convinced myself it was the right thing, it was easy to tell the team what I wanted and have the confidence to make it stick.


Mobile Antenna Testing

2017-09-24 rf planes

Not to sound like an obsessed nut, but I went out and bought a special antenna for capturing airplane data. Specifically, I bought the FlightAware 1090MHz ADS-B Antenna from Amazon. Heh. I didn't look too closely at the pictures and thought it would be walkie-talke antenna size. When a three foot long box arrived with a 26 inch antenna, I realized the coax connector in the picture was actually a large N coax connector instead of the tiny SMA connector I had in mind. I didn't have the right connectors, so I had to order a special cable to try the antenna. The initial tests of the antenna inside the house were good (saw a few flights beyond 100 Miles), but naturally I wondered how well it would do outside the house.


One of the nice things about working with the Pi is that I can just hook it up to a USB battery pack and take the thing wherever I want. This turned out to be great for testing the antenna. I attached the antenna to some PVC pipe, cabled it into the battery-powered Pi, and then duct taped the whole thing to the top of a ladder. While the whole thing was wobbly, I could pick up the ladder and drag it to different spots in the back yard. I'd then go to the Pi's web page from my tablet and watch the map to see how many planes I was getting.


I wasn't very rigorous about the tests, but it seemed like I got better performance when I moved the antenna from our patio to the back yard. The results seem logical because on the patio the house is still in the way of most of our air traffic (which is west and south). The downside of all this is that there isn't a good place to mount the antenna (or route its cable) on the back of the house. Plus my wireless network evaporates a few feet into the backyard. In any case, I'm going to put the antenna and the Pi in the garage for now, until I get more time to mount it properly up top.


Joing PiAware

2017-08-26 rf planes

It's been almost half a year since I got fed up with the Intel Edison and decided to switch over to the RaspberryPI. Most of that time I've just been tinkering with it, doing things like check out RetroPI, hooking up simple led circuits, and using the built-in tools to get the kids interested in programming. The main thing I've wanted to do is get dump1090 running, but I haven't been too motivated to do that since the setup I have on Edison just works. A few weeks ago I downloaded the PiAware disk image from FlightAware.com and got everything running on a Pi3 board (w/ RTL-SDR dongle). I registered my box with them and you can now go to their web page to see my statistics.

FlightAware Online Visualizations

FA has several interesting visualizations, including the above position histogram to help you figure out where your traffic is. From the above I can see that most of the planes I catch are within 50 miles from me, and that planes that are farther out usually are North-West of me. The plot also shows I'm still getting some oddball points more than 250 miles away. The last time I looked at these points I found that they must be corrupt values (usually one bad lon or lat).


One of the cool things about using FA is that I can compare my stats to other people that are close to me. Right now there are 11 other people that are within 10 miles of me. If I make an antenna change I can look at my neighbors to see if I'm catching the same planes as them. Just eyeballing the data it looks like my antenna (which is directional and pointed at Sutro Tower in SF) is missing all kinds of flights to the south of me. Also, one or two of these people are getting nearly double the flights I am.

MLAT: Using the Crowd to Find the Unfindable

The coolest feature of using FA though is that your PiAware box can work with other PiAware boxes in the area to estimate the positions of planes that aren't transmitting their coordinates. A pet peeve of mine is that many planes have ADS-B hardware, but they don't transmit their locations because they want to have some privacy (never mind that they're buzzing over my house, peering into my backyard). PiAware has a mulilateration (MLAT) capability you can enable to find planes based on time differance of arrival (TDoA) information from four receivers. Basically, if four PiAware receivers hear the same message, they can use the position info for the receivers and the delay each of them reports for receiving the message to triangulate the plane. While it means you have to register your receiver's position with FlightAware and spend some network bandwidth transmitting data, many of those unknown planes get tagged.


Above is a snapshot of some of what the dump1090 webpage looks like now with mlat on. All of the tan (?) entries are planes that were tagged with mlat. It's very satisfying to see the added entries. Previously, it seemed like half of the planes were annonymous.

PiAware Setup

The PiAware setup was pretty straightforward. I just downloaded the image and wrote it out to a microsd card, and then updated settings at boot. They consolidated configuration options (eg, wireless network settings, receiver settings, etc) into a file called /boot/piaware-config.txt. It's a little odd to put config options in /boot, but it works ok since this is an appliance. I checked and the system will automatically try to reconnect to the network if it isn't available. That'll be handy when I want to run this fulltime, but shutdown my network link at night. I need to port my ADS-B logging scripts over to this platform (and update them to pull the mlat data), but that's a job for later.


Changes in Antenna Range over Time

2017-03-11 rf planes

After figuring out how to find a convex hull for my airplane data, I went back to see how much the range has varied over time. Here's a timeline that shows the daily observations on top, and the area of the daily convex hull on the bottom:


Antenna Changes

I've had a few changes to my RTL setup over the last few years that have had an impact on range. I've had basically three different configurations in the last two years: (1) an initial version connecting a NooElec RTL-SDR straight to the antenna, (2) an updated version that added a FlightAware filter between the NooElec and the antenna, and (3) a new version that uses a FlightAware Pro USB stick that has the filter built into it. Here's how big the convex hulls were for the data points collected in each day for the three settings:


In general, the filtering does seem to boost my range a good bit. The FlightAware Pro USB stick also seems to do better than a generic tuner that's plugged into an external filter. To be fair though- there were other things that changed in my setup over the years that may be skewing some of these numbers. At some point I inserted a splitter into my setup so I could also route the antenna to a USB DVB tuner. I think that explains the drop you see in the middle of the second setup (ignore the two gap periods- those were because the recorder wasn't running full days).

Future Work

Now that my setup is a little more setup, I'd like to capture some data and then see how well my range corresponds with things like the weather (sounds like a fun science project for the kids).


Antenna Range for Watching Planes

2017-03-04 rf planes

One of the things I've been doing lately with the airplane data I'm collection with my RTL SDR is come up with better estimates for how much coverage and range I get in my local area. While you can easily estimate range just by watching dump1090's map output, you really have to look at a whole day's worth of data or more to get a good idea of what you can see. It's also useful to come up with some numerical estimates of the coverage so you can compare one hardware setup to another. After dusting off geometric algorithms, I realized the right thing to do was clean the data a bit and then compute the area of a convex hull for the points. The SciPy spatial library for Python has a ConvexHull that makes this easy to do.


Computing the Convex Hull

My airplane data is just a collection of geospatial points collected each day, so a good way to quantify my range would be to find a bounding shape for the points and compute the area. A bounding rectangle would be quick and easy, but inaccurate. I started thinking about algorithms to rotate a rectangle for a better fit, or recursively remove sections of it until it gave better coverage. Realizing that I was reinventing the wheel, I searched around for known solutions. A s/o answer pointed me to the gift wrapping algorithm, which builds a bounding polygon by comparing all remaining points and selecting the one that would maximize the new interior angle. It looked like it'd be fun to code up, but expensive to compute.

I then stumbled onto a youtube video that explained how QuickHull selected the four (left,right,up,down)-most points to build two triangles, removed all the interior points, and then selected points that would add the largest triangle to the polygon (or remove the most points). I was looking forward to coding this up, but once again, I wondered what was already available. Naturally, you can already do this in python: SciPy's Spatial library has ConvexHull. It's as easy as:

from scipy.spatial import ConvexHull
import matplotlib.pyplot as plt

# Do something to get some 2D points

# Let Convex hull do the work
cv = ConvexHull(pts)

#cv.vertices has the bounding polygon

print("%f\n" % cv.area) # Get some stats

Sigh.. Python's great, but it sure does take the fun out of everything.

Generating Plots

There was stil a good bit of other cleanup and map stuff I had to do to start making reasonable pictures. First, I had to throw out some bad data. I still get a lot of bad points, most likely because of transmission errors (or misconfigured transmitters?). I filtered everything out that's outside of a reasonable range (though this was tighter than it needed to be). The number of bad points in any day was 1.2% of the observed points (50% of the days had less than 0.01% bad data). For plotting I had to load in Basemap and do all the normal merc projection stuff. For something so fundamental, this always seems to take a lot longer than it should. In any case, the pictures look good when you finish.