Webcam Fetcher for Screensavers

2016-01-24 Sun
webcam

While cleaning up an old network-attached storage (NAS) box at home the other day, I found the remnants of an old webcam-scraping project I built when we first moved out to California. California was an interesting place for us, but we often felt out of sync with the world because all of our friends and family lived back on the East Coast. One day, while looking through the screensaver options on my desktop, I started thinking about pictures I could plug into it to remind me of home. That led to an interesting idea: why not write a simple script to periodically download pictures from public webcams and route them into the screensaver? These webcams could provide a simple, passive portal by which we could keep tabs on the places we used to know.

It didn't take long to write a simple Perl script that scraped a few webcams and saved the images to my screensaver's picture directory. As I started adding more webcams, I found I needed to do more sophisticated things with the script. After the project stabilized I migrated it over to a low-power NAS, which allowed the scraper to run at more reliable intervals. Eventually I retired the project, because the never-ending updates from around the world were just too distracting.

Building the Webcam Scraper

The first version of my webcam scraper used a simple Perl script to retrieve images from different webcams I'd found on the web. This work wasn't difficult- many of the webcams I looked at simply referenced their most recent picture with a static URL. All I had to do was store a list of URLs in a text file and then use the Perl script to download each URL and datestamp its image. It didn't take long before I found a few webcams that used URLs that changed over time. For those I updated my script so it would parse the HTML page the images lived in and retrieve the image, which in most cases could be done by simply extracting the n-th image URL in the page. The system worked well and I built up a good list of webcams I could use.
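
The original Perl is long gone, but the core loop is easy to sketch. Here is a minimal Python version of the same idea; the webcam list format, output paths, and "n-th image" rule are illustrative guesses rather than the original code.

# Minimal Python sketch of the scraper's core loop (the original was Perl).
# The webcam list format, paths, and "n-th image" rule are illustrative.
import re, time, urllib.parse, urllib.request
from pathlib import Path

OUT = Path("webcam_pics")

def grab(name, url, nth_image=None):
    data = urllib.request.urlopen(url, timeout=30).read()
    if nth_image is not None:
        # URL points at an HTML page: pull the n-th <img> src and fetch that
        srcs = re.findall(rb'<img[^>]+src="([^"]+)"', data)
        img_url = urllib.parse.urljoin(url, srcs[nth_image].decode())
        data = urllib.request.urlopen(img_url, timeout=30).read()
    stamp = time.strftime("%Y%m%d-%H%M")
    (OUT / name).mkdir(parents=True, exist_ok=True)
    (OUT / name / f"{name}-{stamp}.jpg").write_bytes(data)

# webcams.txt: one "name url [nth-image]" entry per line
for line in open("webcams.txt"):
    parts = line.split()
    if not parts or parts[0].startswith("#"):
        continue
    name, url = parts[0], parts[1]
    nth = int(parts[2]) if len(parts) > 2 else None
    grab(name, url, nth)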

The next trick was adjusting how frequently the scraper grabbed data. Webcams update at different intervals and are often offline (or boring) at night in the webcam's timezone. I added some timing interval info to my webcam list to control how frequently to grab (hourly, daily, weekly), as well as some hooks to set the hours of operation for the grabs. I also compared a grab to its previous result in order to do simple deduping. If a webcam returned too many duplicates in a row, or no data at all, the script marked it as inactive so I wouldn't pound their server over and over.
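
I don't have the original dedup logic anymore, but the idea was roughly this. A sketch in Python, with the checksum comparison and the "too many strikes" threshold as assumptions:

# Sketch of the duplicate / dead-cam bookkeeping. The md5 comparison and the
# five-strike threshold are assumptions, not the original Perl logic.
import hashlib

def record_grab(cam, new_bytes):
    digest = hashlib.md5(new_bytes).hexdigest() if new_bytes else None
    if digest is None or digest == cam.get("last_digest"):
        cam["strikes"] = cam.get("strikes", 0) + 1   # empty or duplicate grab
    else:
        cam["strikes"] = 0
        cam["last_digest"] = digest
    if cam["strikes"] >= 5:
        cam["active"] = False   # stop pounding a dead or frozen webcam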

By this point the list of webcams was getting long and becoming difficult to manage from the command line. I rewrote the script to externalize all of the URL data and statistics into a SQLite database. This database enabled me to keep better statistics on each webcam, which in turn let me make more rational estimates about whether a camera was out of commission or not. The database also gave me an easy way to throw a simple GUI on top of it. All I had to do was write a Perl CGI script to take user input and feed it into the database. Editing the settings in a web page was a huge improvement over text files.
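
I no longer have the original schema, but it boiled down to a table of webcams plus per-grab statistics. A hypothetical sketch using Python's sqlite3 module (table and column names are mine, not the original):

# Hypothetical sketch of the webcam database; names are illustrative.
import sqlite3

db = sqlite3.connect("webcams.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS webcams (
    id            INTEGER PRIMARY KEY,
    name          TEXT,
    url           TEXT,
    grab_interval TEXT,     -- hourly / daily / weekly
    start_hour    INTEGER,  -- hours of operation in the webcam's timezone
    stop_hour     INTEGER,
    active        INTEGER DEFAULT 1
);
CREATE TABLE IF NOT EXISTS grabs (
    webcam_id     INTEGER REFERENCES webcams(id),
    grabbed_at    TEXT,
    bytes         INTEGER,
    duplicate     INTEGER
);
""")
db.commit()

With something like this in place, the CGI editor would only need simple INSERT and UPDATE statements against the webcams table.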

Running on a NAS

I originally ran the scraper on my desktop, but the results weren't very consistent because I only powered up the desktop when I needed it. I happened to have a Buffalo Linkstation NAS, which I'd read ran Linux on an ARM processor. After reading a lot of webpages, I learned that others had found a security vulnerability in the Linkstation's software and had written a tool that let you exploit it to get a root account (!!). I ran the tool, got root, and installed some missing binaries to make the box more usable. I had to cross-compile some tools from source since the Linkstation uses an ARM processor, but the open NAS community did a good job of spelling out everything you needed to do.

I didn't have to do much to get the project running on the NAS. I had to install some Perl packages and SQLite libraries, but the scraper worked without any major changes. I created a cron job to run the scraper once an hour and had it dump data to a directory that was exported. The Linkstation already had a web server on it, so all I had to do to install my web-based editor was copy the CGI script into the existing webserver directory. The nice thing about running on the NAS was that the hardware only ran at about 7W, so I didn't feel bad about leaving it on for long periods of time. Before there was the cloud for launch-and-forget apps, I had the NAS box.

Routing to a Screen Saver

Getting pictures into a Linux screensaver was more annoying than I thought it would be. GNOME's default screensaver was very particular about where you were supposed to put pictures. It wasn't a big deal when I ran the scraper on my desktop, but when I started using the NAS, I didn't have a way to point GNOME at the NAS's mount point. The solution was to ditch GNOME's screensaver and start using XScreenSaver. GNOME didn't make it easy to switch, but XScreenSaver gave me all the options I needed for controlling where it should get pictures. The scraper stored each webcam's pictures in its own directory, and then merged the latest results into a separate directory that XScreenSaver could use.
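
The merge step was simple; here's roughly what it looked like, sketched in Python with made-up directory names:

# Sketch of the "merge latest pictures" step that fed XScreenSaver.
# Directory names are illustrative.
import shutil
from pathlib import Path

SRC = Path("webcam_pics")        # one subdirectory per webcam
DST = Path("screensaver_pics")   # the directory XScreenSaver points at
DST.mkdir(exist_ok=True)

for cam_dir in sorted(SRC.iterdir()):
    if not cam_dir.is_dir():
        continue
    pics = sorted(cam_dir.glob("*.jpg"))   # datestamped names sort by time
    if pics:
        # Keep only each webcam's newest picture in the screensaver directory
        shutil.copy(pics[-1], DST / f"{cam_dir.name}.jpg")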

Statistics

The plot below shows how many pictures I was grabbing a day. This data is interesting to me because it shows two things. First, the dots get sparser as time goes on because I changed how often I powered up the NAS. Initially I had it running all the time, but then I started running it only when I needed it. Eventually I stopped using it altogether, except when I wanted to do backups. The second point from these plots is that you can see how webcams die off over time. I didn't update the webcam list very often (maybe once a year at best). Webcams are often short-lived, unfortunately, so the number of images dropped off over time as different cams died.

In terms of data, I grabbed from 60 different webcams and accumulated 107K pictures. The pictures amount to about 4.5GB of data, which means the average picture was about 42KB. The majority of the data came in the first year of use.

Ending

The webcam project was fun but addictive. Any time the screensaver kicked on I found myself watching it like TV, wondering what the next webcam would look like. I worked on a few other features before I moved on. One was the ability to grab short video streams from MJPEG webcams. It wasn't hard to grab these video streams, but I didn't have a way to display them in the screensaver.

I also learned a good bit about security and privacy during this project. While searching for new sources, I found that a lot of people were installing webcams without realizing they were accessible from anywhere in the world. You can find tons of open IP webcams on Google through "inurl" searches. Most of these were traffic cams or convenience stores, but I also saw some home cameras that were in the open (e.g., pet cams for people to check up on their pets during the day). This hits on a big problem with consumer security these days- how do you know your devices aren't doing something you'd be upset to hear about? Better home gateways would be a start. ISP analysis would be even better, assuming people didn't freak out about a third party monitoring their connections.

Verifying C++ Compile-Time Hashing

2016-01-16 Sat
cpp code

The other day, while writing some RPC code for a network library, I started wondering if I could use C++11 features to manage my function handlers. Specifically, I was wondering if I could do some compile-time hashing to uniquely identify each RPC by an integer value. Stack Overflow had a great post about how to do it, but my initial tests found the hashes didn't always happen at compile time the way I had naively hoped they would. Using nm and objdump, I figured out how to make my specific use case work.

The Problem with RPC Names

One of the nitpicky things that bugs me when I write remote procedure calls (RPCs) for a C++ network library of ours is deciding on a way to uniquely label each RPC with an integer value. Some network libraries I've used just punt and tell you to pound define each identifier, which sooner or later results in a collision when you merge code from different people. Other libraries use string labels and figure out how to translate strings to integers at run time (possibly as part of connection handshaking). These are a pain as well, as there's extra work that has to happen at init time.

What I'd really like is to use string labels for RPCs, but have the labels all hash to integer values at compile time. I'd like the hash value to be a constant associated with an RPC class that doesn't need to be set every time you create a new instance of the RPC. The hash can be weak, because I can detect collisions at init time when each RPC is registered and tell users to come up with a different name (or hash).

Prior to C++11 (or maybe C++0x), I don't think it was possible to have the compiler hash a string at compile time. You could plug in macros to hash the name, but as far as I know, you'd get code that at best executed the hash at run time every time you wanted the hash.

Compile-time Hash on Stack Overflow

I'm not the only one who wanted to do compile-time hashes. This Stack Overflow question asked exactly what I'd been wondering- can C++11 be used to hash a string at compile time? The answer was yes- under certain circumstances, you can craft a constexpr function that does the work. Specifically, one of the examples implemented a simple hash by recursively stepping through the string:

//From stack overflow 2111667
unsigned constexpr const_hash(char const *input) {
  return *input ?
           static_cast<unsigned>(*input) + 33 * const_hash(input + 1) :
           5381;
}

People seemed to agree that this should work, but I wanted proof that the hashing code was being done at compile time (especially since the hash now uses a recursive function). I wrote some simple tests and used objdump to look at the assembly g++ 4.8.2 was generating.

First Failed Test

My first test looked at what happened if you just plugged a string into the hash and printed it. For example:

#include <iostream>
using namespace std;

int main(){
  unsigned x = const_hash("bozo");
  cout << hex << x << endl;
}

The code prints out the hash 7c9c033f, so the hash is being computed. However, nm shows that the hash function is still in there:

$ g++ -std=c++11 bozo1.cpp -o bozo1
$ ./bozo1
7c9c033f

$ nm -g -C bozo1 | grep hash
000000000040098e W const_hash(char const*)

$ nm -g  bozo1 | grep hash
000000000040098e W _Z10const_hashPKc

Looking at the assembly for main, you can see the hash function is getting executed at runtime:

00000000004007e0 <main>:
  4007e0:	55                   	push   %rbp
  4007e1:	48 89 e5             	mov    %rsp,%rbp
  4007e4:	48 83 ec 10          	sub    $0x10,%rsp
  4007e8:	bf 9b 0a 40 00       	mov    $0x400a9b,%edi
  4007ed:	e8 9c 01 00 00       	callq  40098e <_Z10const_hashPKc>

Declaring the string as a const or constexpr didn't help.

Progress with a Switch Example

Since it didn't look like I could just naively use the hash function to get what I wanted, I started looking at the original use case the Stack Overflow poster was after. The poster asked about compile-time hashes because they wanted to use them for cases in a switch statement. It's a useful thing to do when you write state machines- often you'd like to give each state a printable label, but at the same time, you'd like to boil the labels down to hash values at compile time to speed things up. I updated my example to use the hashes in a switch:

#include <iostream>
#include <string>
using namespace std;

int testit(string s){
  switch(const_hash(s.c_str())){
  case const_hash("bozo1"): return 1;
  case const_hash("bozo2"): return 2;
  default:
    cout << "Unknown" << endl;
    return 0;
  }
}
int main(){
  cout << hex << const_hash("bozo1") << endl;
  cout << hex << const_hash("bozo2") << endl;
  return testit("bozo1");
}

When running the program you see the bozo1 and bozo2 hashes are b886f27 and b889adf. Looking at the assembly for the testit function you see it calls the hash function on the input parameter (because the input could be anything), but each case operation's comparison has been hardwired to use the actual hash values that were generated at compile time:

0000000000400ae0 <_Z6testitSs>:
  400ae0:	55                   	push   %rbp
  400ae1:	48 89 e5             	mov    %rsp,%rbp
  400ae4:	48 83 ec 10          	sub    $0x10,%rsp
  400ae8:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  400aec:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  400af0:	48 89 c7             	mov    %rax,%rdi
  400af3:	e8 d8 fd ff ff       	callq  4008d0 <_ZNKSs5c_strEv@plt>
  400af8:	48 89 c7             	mov    %rax,%rdi
  400afb:	e8 8e 02 00 00       	callq  400d8e <_Z10const_hashPKc>
  400b00:	3d 27 6f 88 0b       	cmp    $0xb886f27,%eax
  400b05:	74 09                	je     400b10 <_Z6testitSs+0x30>
  400b07:	3d df 9a 88 0b       	cmp    $0xb889adf,%eax
  400b0c:	74 09                	je     400b17 <_Z6testitSs+0x37>
  ...

Use as an ID in an RPC Class

The final step for me was just to plug this trick into something that resembled the RPC code I wanted. In this code, I have a base class to define a generic RPC, and derived classes for each RPC. All I want is for each derived class to be able to hand back a static integer identifier when asked, without anything hashing at runtime.

#include <iostream>
using namespace std;

class A {
public:
  virtual ~A() {}
  virtual unsigned GetID() const = 0;
};

class B : public A {
public:
  unsigned GetID() const { return my_hash_id; }
  const static unsigned my_hash_id;
};
const unsigned B::my_hash_id = const_hash("MyThing");

int main(){
  A *b = new B();
  cout << hex << b->GetID()    << endl;
  cout << hex << B::my_hash_id << endl;
  delete b;
}

Grepping through nm's output, I saw that const_hash wasn't included in the executable. Looking at the assembly, I see GetID is a function that returns the hash value and B::my_hash_id is hardwired to the hash value. This behavior is sufficient for what I need. I'll use the static my_hash_id when I want to register a new RPC during initialization (which will also check for hash collisions), and then use GetID() if I need to figure out which RPC I'm actually holding when all I have is a base class pointer. It really is overkill for what I need, but it's nice to be able to reference the RPCs without having to generate unique integer IDs myself.

A Long-Running Flight Scraper on AWS

2015-04-12 Sun
planes gis code

A year ago I set up a long-running data scraper that would fetch flight info from the website FlightRadar24.com. I created a free Amazon VM that ran the scraper every six minutes for about eight months last year. In terms of raw data, it pulled about 200MB a day and collected a total of 56GB over the eight months. Now that I've retired the scraper, I thought I'd comment on how it worked.

In Search of Data

As I've written before, I started getting interested in geospatial tracks last year when I read about how different websites let you see where planes and ships are located in real time. Thinking that it'd be fun to get into geospatial data analysis, I started looking around for open datasets I could use for my own experiments. I quickly realized that there aren't that many datasets available, and that data owners like the FAA only provide their feeds to companies that have a legitimate business need for the data (and are willing to pay a subscription fee). Luckily, I stumbled upon a question posted on Stack Overflow where someone else wanted to know where they could get data. Buried in the replies was a comment from someone who noted that FlightRadar24.com aggregates crowd-sourced airline data and that you could get a current listing of plane locations in a JSON format just by querying a URL. I tried it out and was surprised to find a single wget operation returned a JSON file with the locations and stats of thousands of airplanes. The simplicity of it all was the kind of thing scrapers dream about.

Cleaning the Data

I set up a simple cron job on my desktop at home to retrieve the JSON data every six minutes, and let it run from 7am-11pm over a long weekend. The fields weren't labeled in the data, so I had to do a lot of comparisons to figure out what everything meant. Fortunately, FR24's website has a nice graphical mode that lets you click on a plane and see its instantaneous stats. I grabbed some JSON data, picked out a specific flight to work with, and then compared the scraped data to the labeled fields the website GUI was reporting. It was a little tricky since the stats on the website are continuously changing as the plane moves, but it was enough to identify each field in the JSON data array.

The next challenge was converting the instantaneous data into actual tracks (i.e., instead of sorting by time, I wanted to sort by plane ID). It was a perfect use case for Go: I wrote something that read in a day's worth of data, parsed the JSON using an existing library, did the regrouping, and then dumped the output into a new format that was easier for me to use (e.g., one plane per line, with track points listed in a WKT linestring). The conversion reduced the 200MB of a day's data down to about 68MB of (ASCII) track data. These track files are convenient because I can use command line tools (grep, awk, or python) to filter, group, and plot the things I want without having to do much thinking.
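
The Go converter isn't shown here, but the regrouping step is straightforward. Here's the same idea sketched in Python; the lat/lon positions within FR24's arrays are an assumption from memory, not a documented format:

# The original converter was written in Go; this is the same regrouping idea
# sketched in Python. The lat/lon positions in the FR24 arrays (fields[1] and
# fields[2]) are an assumption, not a documented format.
import bz2, json, sys
from collections import defaultdict
from pathlib import Path

tracks = defaultdict(list)   # plane ID -> list of (lon, lat) points in time order

for snap in sorted(Path(sys.argv[1]).glob("*.json.bz2")):   # one day of grabs
    data = json.loads(bz2.open(snap).read())
    for plane_id, fields in data.items():
        if not isinstance(fields, list):
            continue   # skip non-plane entries (counts, version info, etc.)
        lat, lon = fields[1], fields[2]
        tracks[plane_id].append((lon, lat))

# One plane per line, with its track as a WKT linestring
for plane_id, pts in tracks.items():
    wkt = ", ".join(f"{lon} {lat}" for lon, lat in pts)
    print(f"{plane_id}\tLINESTRING({wkt})")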

Running in Amazon for Free

Amazon sounded like the right place to run the scraper for longer runs. Since I hadn't signed up for AWS before, I found I qualified for their free usage tier, which lets you run a micro instance continuously for a year for free. The main limitation of the micro instance for me was storage- it only had about 5GB, so I had to be careful about compressing the data and remembering to retrieve it off Amazon every few weeks. The latter wound up becoming a problem in November: I got tied up with Thanksgiving week and forgot to move data off the instance, so it ran out of space and dropped data for a few weeks (during the busiest and most interesting time of year for US air travel, unfortunately). On the bright side, I was able to get the previous data out of the system and restart it all without much trouble.

Below is the script I used in the instance to go fetch data. After I noticed the initial grabber was slipping by a few seconds every run, I added something to correct the sleep interval by the fetch delay.

#!/bin/bash 
MIN_SECONDS_DELAY=360
URL="http://www.flightradar24.com/zones/full_all.json"

while true; do
  oldsec=`date +%s`
  mytime=`date +%F/%F-%H%M`
  mydir=`date +%F`
  wget -q $URL

  if [ ! -d "$mydir" ]; then
       mkdir $mydir
  fi
  mv full_all.json $mytime.json
  bzip2 -9 $mytime.json

  # Find how long this took to grab, then subtract from sleep interval
  newsec=`date +%s`
  gap=$((newsec - oldsec))
  left=$((MIN_SECONDS_DELAY - gap))
  if [ "$left" -gt 0 ]; then
       sleep $left
  fi
done

Migrating Data Off Amazon

The next thing I wrote was a script to repack the data on the instance. It looked at a single day and converted the data from a series of bzip'd files to a bzip'd tar of uncompressed files. Repacking gave better compression for the download and was more convenient for later use. I ran this process by hand every few weeks. Below is the script I wrote. It was a big help using date's built-in day math- I wish I'd realized earlier that you can use it to walk through date ranges in bash.

#!/bin/bash

CURRENT=$(date +%Y-%m-%d --date="1 week ago") 
END=$(date +%Y-%m-%d) 

# Ask the user for the beginning date
read -p "Start Date [$CURRENT] " 
if [ "$REPLY" != "" ]; then
    CURRENT=$REPLY 
fi

# Loop over all days since then
while [ "$END" != "$CURRENT" ]; do
    echo $CURRENT
    if [[ ! -e "out/$CURRENT.tar" ]]; then
        echo "packaging"
        tar -cvf out/$CURRENT.tar $CURRENT
    else
        echo "skipping"
    fi
    CURRENT=$(date +%Y-%m-%d -d "$CURRENT +1 day")
done

Finally, I had to write something for my desktop to go out and pull the archives off the instance. Amazon gives you an ssh key file for logging into your instance, so all I had to do was just scp the files I needed.

#!/bin/bash

HOST=ec2-user@my-long-instance-name.compute.amazonaws.com
KEY=../my-aws-key.pem

# Set range to be from a week ago, stopping before today
CURRENT=$(date +%Y-%m-%d --date="1 week ago")
END=$(date +%Y-%m-%d)

read -p "Start Date [$CURRENT] "
if [ "$REPLY" != "" ]; then
    CURRENT=$REPLY
fi

# Grab each day
while [ "$END" != "$CURRENT" ]; do
    echo $CURRENT
    if [[ ! -d "$CURRENT" ]]; then
        echo "Downloading $CURRENT"
        scp -i $KEY \
            $HOST:data/out/$CURRENT.tar downloads/
        tar xf downloads/$CURRENT.tar
        bunzip2 $CURRENT/*.bz2
        tar cjf bups/$CURRENT.tar.bz2 $CURRENT
    fi

    CURRENT=$(date +%Y-%m-%d -d "$CURRENT +1 day")
done

While my approach meant that I had to manually check on things periodically, it worked pretty well for what I needed. By compressing the data, I found I only had to check the instance about once a month.

End of the Scraper

I turned the scraper off back in February for two reasons: FR24 started shutting down its API and my free AWS instance was expiring soon. The FR24 shutdown was kind of interesting. On January 20th the interface still worked, but all the flight IDs became "F2424F" and one of the fields said "UPDATE-YOUR-FR24-APP". Around February 3 the API stopped working altogether. Given that FR24 still has a webapp, I'll bet you can still retrieve data from them somehow. However, I'm going to respect their interface and not dig into it.

Examining Bad Flight Data from the Logger

2015-03-08 Sun
tracks gis code planes

One of the problems of capturing your own ADS-B airplane data is that there are always bad values mixed in with the good. As I started looking through my data, I realized that every so often there'd be lon/lats that were nowhere near my location. Initially I thought I might be getting lucky with ionospheric reflections. However, a closer look at where these points are shows that something else is probably going on.

I wrote some Pylab to take all of my points and plot them on a world map (well, without the actual map). I marked off a rectangle to bound where Livermore is and then plotted each point I received. The points were colored blue if they were within the Livermore area and red if they were outside of it. I then extended the lon/lat boundaries for the Livermore area to help see where the far-away points were relative to Livermore's lon/lat box.
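
The plotting code boiled down to a bounding-box test and two scatter calls. A sketch with matplotlib's pyplot (which is what Pylab wraps); the Livermore box coordinates here are rough values for illustration:

# Sketch of the point-coloring plot. The bounding box values are approximate.
import matplotlib.pyplot as plt

LON_MIN, LON_MAX = -122.2, -121.4   # rough Livermore-area box (illustrative)
LAT_MIN, LAT_MAX = 37.4, 37.9

def in_box(lon, lat):
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX

def plot_points(points, outfile="points.png"):   # points: list of (lon, lat)
    near = [(lon, lat) for lon, lat in points if in_box(lon, lat)]
    far  = [(lon, lat) for lon, lat in points if not in_box(lon, lat)]
    if near:
        plt.scatter(*zip(*near), s=2, c="blue", label="inside Livermore box")
    if far:
        plt.scatter(*zip(*far), s=2, c="red", label="outside")
    plt.xlim(-180, 180); plt.ylim(-90, 90)
    plt.xlabel("Longitude"); plt.ylabel("Latitude")
    plt.legend()
    plt.savefig(outfile, dpi=150)
    plt.close()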

The first thing I noticed was that a whole slew of dots were spread out horizontally but fell within Livermore's lat range. It's possible that this could be due to bit errors in the lon field. The next thing I noticed was that there were two columns of bad points, one at about -160 degrees, the other around 35. Since both of these columns had data spread across all lats, I realized it probably wasn't from an ionospheric reflection. The right column happens to be at about the same value as you'd get if lon and lat were swapped (drawn as a pink bar). However, I don't think that's what happened, as the dots are distributed all the way across the vertical column.

Individual Offenders

Since I didn't have a good explanation for the bad values, I did some more work on the dataset to pull out the individual offenders. Of the 2092 unique planes, only 16 were giving me problems. I plotted each plane's points individually below using the same plotter as before.
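
Pulling out the offenders just meant grouping points by hex ID and flagging any plane with points outside the box. A small sketch, reusing the in_box() helper from the plot above:

# Sketch of finding the offenders; reuses in_box() from the plotting sketch.
def find_offenders(tracks):      # tracks: dict of hex ID -> list of (lon, lat)
    bad = {}
    for hex_id, pts in tracks.items():
        if any(not in_box(lon, lat) for lon, lat in pts):
            bad[hex_id] = pts
    return bad

# Plot each offender separately with the same plotter, e.g.:
# for hex_id, pts in find_offenders(tracks).items():
#     plot_points(pts, outfile=f"{hex_id}.png")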

To me, these breakdowns indicate that the problem planes exhibit a few types of bad data. Three of them have purely horizontal data, while about 12 of the rest have some vertical problem. The 8678C0 case doesn't have enough points to tell what it's thinking. Interestingly, the vertical cases all seem to have at least a few points near Livermore. This makes me wonder if their GPS lost sync at some point in the flight and started reporting partially incorrect data. In any case there seem to be some common failure patterns.

Plane Info

Out of curiosity I went and looked up all 16 of these flights by hand to see what they were. It's interesting that all three of the planes with horizontal errors weren't small planes (one old 747 and two new 777s). All the vertical errors seem to be from smaller planes (though one was a US Airways Express). Here's the rundown, including the number of days that I saw a particular plane in February:

#ID    Days Flight  Info
4248D9 1    VQ-BMS  Private 747 (1979) Las Vegas Sands Corp
8678C0 1    JA715A  Nippon Airways 777 
868EC0 1    JA779A  Nippon Airways 777 
A23C2E 4    N243LR  USAirways Express 
A3286B 1    N302TB  Private Beechcraft 400xp 
A346D0 1    N310    Private Gulfstream 
A40C4B 2    N360    Private (San Francisco) 
A5E921 7    N480FL  Private Beechcraft 
A7D68B 1    N604EM  Private Bombardier
A7E28D 1    N607PH  Private Post Foods Inc
A8053D 1    N616CC  Private Gulfstream
A8DAB9 1    N67PW   Private Falcon50
AA7238 7    N772UA  United Airlines 777
AC6316 1    N898AK  Private Red Line Air
AC70DC 2    N900TG  Private (Foster City) 
AD853E 2    N970SJ  Private Gulfstream 

Code and Data

I've put my data and code up on GitHub for anyone that wants to look at it.

github:livermore-arplane-tracks

Flight Data From the Data Logger

2015-03-02 Mon
tracks gis

Now that I've been running the Edison airplane data logger for more than a month, it's time to start looking at the data it's been capturing. I pulled the logs off the SD card, reorganized them into tracks, and then generated daily plots using Mapnik. The image below shows all of the flights the logger captured for each day in February.

The first thing to notice is that the SDR has a pretty good range, even with the stock antenna. I live just southeast of the dot for Livermore and was only expecting to see planes near town. Instead I'm seeing traffic all over the Tri-Valley and some a little bit beyond. I was initially surprised to see anything in either the Bay Area or the Central Valley because of the Pleasanton ridge and the Altamont hills. However, I realized it makes sense- planes fly much higher than the hills, except when they're landing.

Logger Statistics

I wanted to know more about the data I was getting, so I wrote a few scripts to extract some statistics. The first thing I wanted to know was what percentage of the time the logger was running each day. When I started, I decided not to run it around the clock because there just aren't that many flights at night. To help me remember to start and stop the logger each day, I plugged the Edison into the same power strip my home router uses, which I usually turn on when I get up (7am) and turn off when I go to bed (11:30pm). I wrote a Perl script to look through each day's log and find the largest gap of time where there was no data. Since the logger uses UTC, my nightly shutdowns usually appear as a 7-hour gap starting around 7am UTC. The top plot below shows what percentage of the day the logger was up and running. It looks like I was only late turning it on a few times in February.
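
The gap finder was a throwaway Perl script; the same calculation in Python looks something like this (assuming each day's message timestamps have already been parsed into epoch seconds):

# Sketch of the uptime estimate: treat the largest quiet gap in a day's log as
# the nightly shutdown. Assumes timestamps are epoch seconds for one UTC day.
def uptime_fraction(timestamps):
    ts = sorted(timestamps)
    if len(ts) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    # Also count the wrap-around gap from the last message of the day back
    # around to the first one
    gaps.append(86400 - (ts[-1] - ts[0]))
    return 1.0 - max(gaps) / 86400.0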

The next thing I wanted to know was how many flights I was seeing a day. The raw numbers are in green above, but I've also scaled them up using the top chart's data to help normalize it (no, not a fair comparison, as night flights are fewer). The red lines indicate where Sundays began. It looks like there's definitely lighter activity on Sundays. Things are a little skewed, though, since everything is in UTC instead of Pacific (I was lazy and didn't bother to redistribute the days).

Missing IDs

The logger looks for two types of ADS-B messages from dump1090. The first is an occasional ID message that associates the hex ID for a plane with its call sign (often a tail fin). The second is the current location for a particular plane (which only contains the hex ID). Grepping through the data, I see 2195 unique hex IDs in the position messages, but only 2092 unique hex IDs in the ID messages. I checked, and each message stream has some hex IDs that do not appear in the other.
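
The cross-check is just a pair of set differences once the hex IDs are pulled out of each stream; a quick sketch:

# Sketch of the cross-check between the two message streams. Assumes the hex
# IDs have already been grepped out of the logs into two sets.
def compare_streams(position_ids, ident_ids):
    only_pos   = position_ids - ident_ids   # positions with no matching ID message
    only_ident = ident_ids - position_ids   # ID messages with no position fixes
    print(f"{len(position_ids)} position IDs, {len(ident_ids)} ident IDs")
    print(f"{len(only_pos)} only in positions, {len(only_ident)} only in idents")
    return only_pos, only_ident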

What Airlines am I Seeing?

Another stat I was interested in is what airlines show up the most in my data. It isn't too hard to get a crude estimate of the breakdown because (most?) commercial airlines embed their ICAO code in their flight number. Through the power of awk, grep, sed, and uniq, I was able to pull out the number of different flights each provider had over my area (this is unique flight numbers, not total flights). Here are the top 20:

404 UAL  United Airlines
114 VRD  Virgin America
 84 FDX  Federal Express
 72 AAL  American Airlines
 51 DAL  Delta Airlines
 46 JBU  Jet Blue
 45 SKW  Sky West
 38 AWE  US Airways
 29 EJA  Airborne Netjets Aviation ExecJet
 26 UPS  United Parcel Service
 22 RCH  Airborne Air Mobility Command "Reach"
 18 CPA  Cathay Pacific Airways
 17 OPT  Options
 16 EJM  Executive Jet Management "Jet Speed"
 11 TWY  Sunset Aviation, Twilight
 11 HAL  Hawaiian Airlines
 11 CSN  China Southern Airlines
 10 KAL  Korean Air
 10 EVA  EVA (Chinese)
  7 AAR  Asiana Airlines
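
For reference, the same tally is easy to reproduce in Python instead of an awk/sed pipeline. This sketch assumes the call signs have already been collected into a set and that commercial call signs start with a three-letter ICAO code (e.g., "UAL1234"):

# Sketch of the airline tally. Assumes callsigns is the set of unique call
# signs from the ID messages; N-number "fins" and odd entries are skipped.
import re
from collections import Counter

def airline_counts(callsigns, top=20):
    counts = Counter()
    for cs in callsigns:
        m = re.match(r"([A-Z]{3})\d", cs)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(top)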

There are a few things of interest in that breakdown. First, freight airlines like FedEx and UPS show up pretty high in the list. I think people often overlook them, but they occupy a sizable chunk of what's in the air. Second, I didn't see anything from Southwest in the data. They definitely fly over us, so I was surprised that I didn't see any SW or WN fins. Finally, there were a ton of planes that didn't have any info associated with them that would help me ID the owner (e.g., there were 456 N fins). There are websites you can go to look them up (most of the time they just give a private owner), but it's something that sinks a lot of time. Maybe later I'll revisit this and write something to automate the retrieval.