Webcam Fetcher for Screensavers

2016-01-24 Sun
webcam

While cleaning up an old network-attached storage (NAS) box at home the other day, I found the remnants of an old webcam-scraping project I built when we first moved out to California. California was an interesting place for us, but we often felt out of sync with the world because all of our friends and family lived back on the East Coast. One day, while looking through the screensaver options on my desktop, I started thinking about pictures I could plug into it to remind me of home. That led to an interesting idea: why not write a simple script to periodically download pictures from public webcams and route them into the screensaver? These webcams could provide a simple, passive portal by which we could keep tabs on the places we used to know.

It didn't take long to write a simple Perl script that scraped a few webcams and saved the images to my screensaver's picture directory. As I started adding more webcams, I found I needed to do more sophisticated things with the script. After the project stabilized I migrated it over to a low-power NAS, which allowed the scraper to run at more reliable intervals. Eventually I retired the project, because the never-ending updates from around the world were just too distracting.

Building the Webcam Scraper

The first version of my webcam scraper used a simple Perl script to retrieve images from different webcams I'd found on the web. This work wasn't difficult- many of the webcams I looked at simply referenced their most recent picture with a static URL. All I had to do was store a list of URLs in a text file and then use the Perl script to download each URL and datestamp its image. It didn't take long before I found a few webcams that used URLs that changed over time. For those I updated my script so it would parse the HTML page the images lived in and retrieve the image, which in most cases could be done by simply extracting the n-th image URL in the page. The system worked well and I built up a good list of webcams I could use.
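
The original Perl is long gone, but the core loop is easy to sketch. Here is a minimal Python version of the same idea; the webcam list format, output paths, and "n-th image" rule are illustrative guesses rather than the original code.

# Minimal Python sketch of the scraper's core loop (the original was Perl).
# The webcam list format, paths, and "n-th image" rule are illustrative.
import re, time, urllib.parse, urllib.request
from pathlib import Path

OUT = Path("webcam_pics")

def grab(name, url, nth_image=None):
    data = urllib.request.urlopen(url, timeout=30).read()
    if nth_image is not None:
        # URL points at an HTML page: pull the n-th <img> src and fetch that
        srcs = re.findall(rb'<img[^>]+src="([^"]+)"', data)
        img_url = urllib.parse.urljoin(url, srcs[nth_image].decode())
        data = urllib.request.urlopen(img_url, timeout=30).read()
    stamp = time.strftime("%Y%m%d-%H%M")
    (OUT / name).mkdir(parents=True, exist_ok=True)
    (OUT / name / f"{name}-{stamp}.jpg").write_bytes(data)

# webcams.txt: one "name url [nth-image]" entry per line
for line in open("webcams.txt"):
    parts = line.split()
    if not parts or parts[0].startswith("#"):
        continue
    name, url = parts[0], parts[1]
    nth = int(parts[2]) if len(parts) > 2 else None
    grab(name, url, nth)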

The next trick was adjusting how frequently the scraper grabbed data. Webcams update at different intervals and are often offline (or boring) at night in the webcam's timezone. I added some timing interval info to my webcam list to control how frequently to grab (hourly, daily, weekly), as well as some hooks to set the hours of operation for the grabs. I also compared a grab to its previous result in order to do simple deduping. If a webcam returned too many duplicates in a row, or no data at all, the script marked it as inactive so I wouldn't pound their server over and over.
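
I don't have the original dedup logic anymore, but the idea was roughly this. A sketch in Python, with the checksum comparison and the "too many strikes" threshold as assumptions:

# Sketch of the duplicate / dead-cam bookkeeping. The md5 comparison and the
# five-strike threshold are assumptions, not the original Perl logic.
import hashlib

def record_grab(cam, new_bytes):
    digest = hashlib.md5(new_bytes).hexdigest() if new_bytes else None
    if digest is None or digest == cam.get("last_digest"):
        cam["strikes"] = cam.get("strikes", 0) + 1   # empty or duplicate grab
    else:
        cam["strikes"] = 0
        cam["last_digest"] = digest
    if cam["strikes"] >= 5:
        cam["active"] = False   # stop pounding a dead or frozen webcam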

By this point the list of webcams was getting long and becoming difficult to manage from the command line. I rewrote the script to externalize all of the URL data and statistics into a SQLite database. This database enabled me to keep better statistics on each webcam, which in turn let me make more rational estimates about whether a camera was out of commission or not. The database also gave me an easy way to throw a simple GUI on top of it. All I had to do was write a Perl CGI script to take user input and feed it into the database. Editing the settings in a web page was a huge improvement over text files.
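
I no longer have the original schema, but it boiled down to a table of webcams plus per-grab statistics. A hypothetical sketch using Python's sqlite3 module (table and column names are mine, not the original):

# Hypothetical sketch of the webcam database; names are illustrative.
import sqlite3

db = sqlite3.connect("webcams.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS webcams (
    id            INTEGER PRIMARY KEY,
    name          TEXT,
    url           TEXT,
    grab_interval TEXT,     -- hourly / daily / weekly
    start_hour    INTEGER,  -- hours of operation in the webcam's timezone
    stop_hour     INTEGER,
    active        INTEGER DEFAULT 1
);
CREATE TABLE IF NOT EXISTS grabs (
    webcam_id     INTEGER REFERENCES webcams(id),
    grabbed_at    TEXT,
    bytes         INTEGER,
    duplicate     INTEGER
);
""")
db.commit()

With something like this in place, the CGI editor would only need simple INSERT and UPDATE statements against the webcams table.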

Running on a NAS

I originally ran the scraper on my desktop, but the results weren't very consistent because I only powered up the desktop when I needed it. I happened to have a Buffalo Linkstation NAS, which I'd read ran Linux on an ARM processor. After reading a lot of webpages, I learned that others had found a security vulnerability in the Linkstation's software and had written a tool that let you exploit it to get a root account (!!). I ran the tool, got root, and installed some missing binaries to make the box more usable. I had to cross-compile some tools from source since the Linkstation uses an ARM processor, but the open NAS community did a good job of spelling out everything you needed to do.

I didn't have to do much to get the project running on the NAS. I had to install some Perl packages and SQLite libraries, but the scraper worked without any major changes. I created a cron job to run the scraper once an hour and had it dump data to a directory that was exported. The Linkstation already had a web server on it, so all I had to do to install my web-based editor was copy the CGI script into the existing webserver directory. The nice thing about running on the NAS was that the hardware only ran at about 7W, so I didn't feel bad about leaving it on for long periods of time. Before there was the cloud for launch-and-forget apps, I had the NAS box.

Routing to a Screen Saver

Getting pictures into a Linux screensaver was more annoying than I thought it would be. GNOME's default screensaver was very particular about where you were supposed to put pictures. It wasn't a big deal when I ran the scraper on my desktop, but when I started using the NAS, I didn't have a way to point GNOME at the NAS's mount point. The solution was to ditch GNOME's screensaver and start using XScreenSaver. GNOME didn't make it easy to switch, but XScreenSaver gave me all the options I needed for controlling where it should get pictures. The scraper stored each webcam's pictures in its own directory, and then merged the latest results into a separate directory that XScreenSaver could use.
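
The merge step was simple; here's roughly what it looked like, sketched in Python with made-up directory names:

# Sketch of the "merge latest pictures" step that fed XScreenSaver.
# Directory names are illustrative.
import shutil
from pathlib import Path

SRC = Path("webcam_pics")        # one subdirectory per webcam
DST = Path("screensaver_pics")   # the directory XScreenSaver points at
DST.mkdir(exist_ok=True)

for cam_dir in sorted(SRC.iterdir()):
    if not cam_dir.is_dir():
        continue
    pics = sorted(cam_dir.glob("*.jpg"))   # datestamped names sort by time
    if pics:
        # Keep only each webcam's newest picture in the screensaver directory
        shutil.copy(pics[-1], DST / f"{cam_dir.name}.jpg")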

Statistics

The plot below shows how many pictures I was grabbing a day. This data is interesting to me because it shows two things. First, the dots get sparser as time goes on because I changed how often I powered up the NAS. Initially I had it running all the time, but then I started running it only when I needed it. Eventually I stopped using it altogether, except when I wanted to do backups. The second point from these plots is that you can see how webcams die off over time. I didn't update the webcam list very often (maybe once a year at best). Webcams are often short-lived, unfortunately, so the number of images dropped off over time as different cams died.

In terms of data, I grabbed from 60 different webcams and accumulated 107K pictures. The pictures amount to about 4.5GB of data, which means the average picture was about 42KB. The majority of the data came in the first year of use.

Ending

The webcam project was fun but addictive. Any time the screensaver kicked on I found myself watching it like TV, wondering what the next webcam would look like. I worked on a few other features before I moved on. One was the ability to grab short video streams from MJPEG webcams. It wasn't hard to grab these video streams, but I didn't have a way to display them in the screensaver.

I also learned a good bit about security and privacy during this project. While searching for new sources, I found that a lot of people were installing webcams without realizing they were accessible from anywhere in the world. You can find tons of open IP webcams on Google through "inurl" searches. Most of these were traffic cams or convenience stores, but I also saw some home cameras that were in the open (e.g., pet cams for people to check up on their pets during the day). This hits on a big problem with consumer security these days- how do you know your devices aren't doing something you'd be upset to hear about? Better home gateways would be a start. ISP analysis would be even better, assuming people didn't freak out about a third party monitoring their connections.

Verifying C++ Compile-Time Hashing

2016-01-16 Sat
cpp code

The other day, while writing some RPC code for a network library, I started wondering if I could use C++11 features to manage my function handlers. Specifically, I was wondering if I could do some compile-time hashing to uniquely identify each RPC by an integer value. Stack Overflow had a great post about how to do it, but my initial tests found the hashes didn't always happen at compile time the way I had naively hoped they would. Using nm and objdump, I figured out how to make my specific use case work.

The Problem with RPC Names

One of the nitpicky things that bugs me when I write remote procedure calls (RPCs) for a C++ network library of ours is deciding on a way to uniquely label each RPC with an integer value. Some network libraries I've used just punt and tell you to pound define each identifier, which sooner or later results in a collision when you merge code from different people. Other libraries use string labels and figure out how to translate strings to integers at run time (possibly as part of connection handshaking). These are a pain as well, as there's extra work that has to happen at init time.

What I'd really like is to use string labels for RPCs, but have the labels all hash to integer values at compile time. I'd like the hash value to be a constant associated with an RPC class that doesn't need to be set every time you create a new instance of the RPC. The hash can be weak, because I can detect collisions at init time when each RPC is registered and tell users to come up with a different name (or hash).

Prior to C++11 (or maybe C++0x), I don't think it was possible to have the compiler hash a string at compile time. You could plug in macros to hash the name, but as far as I know, you'd get code that at best executed the hash at run time every time you wanted the hash.

Compile-time Hash on Stack Overflow

I'm not the only one who wanted to do compile-time hashes. This Stack Overflow question asked exactly what I'd been wondering- can C++11 be used to hash a string at compile time? The answer was yes- under certain circumstances, you can craft a constexpr function that does the work. Specifically, one of the examples implemented a simple hash by recursively stepping through the string:

//From stack overflow 2111667
unsigned constexpr const_hash(char const *input) {
  return *input ?
           static_cast<unsigned>(*input) + 33 * const_hash(input + 1) :
           5381;
}

People seemed to agree that this should work, but I wanted proof that the hashing code was being done at compile time (especially since the hash now uses a recursive function). I wrote some simple tests and used objdump to look at the assembly g++ 4.8.2 was generating.

First Failed Test

My first test looked at what happened if you just plugged a string into the hash and printed it. For example:

#include <iostream>
using namespace std;

int main(){
  unsigned x = const_hash("bozo");
  cout << hex << x << endl;
}

The code prints out the hash 7c9c033f, so the hash is being computed. However, nm shows that the hash function is still in there:

$ g++ -std=c++11 bozo1.cpp -o bozo1
$ ./bozo1
7c9c033f

$ nm -g -C bozo1 | grep hash
000000000040098e W const_hash(char const*)

$ nm -g  bozo1 | grep hash
000000000040098e W _Z10const_hashPKc

Looking at the assembly for main, you can see the hash function is getting executed at runtime:

00000000004007e0 <main>:
  4007e0:	55                   	push   %rbp
  4007e1:	48 89 e5             	mov    %rsp,%rbp
  4007e4:	48 83 ec 10          	sub    $0x10,%rsp
  4007e8:	bf 9b 0a 40 00       	mov    $0x400a9b,%edi
  4007ed:	e8 9c 01 00 00       	callq  40098e <_Z10const_hashPKc>

Declaring the string as a const or constexpr didn't help.

Progress with a Switch Example

Since it didn't look like I could just naively use the hash function to get what I wanted, I started looking at the original use case the Stack Overflow poster was after. The poster asked about compile-time hashes because they wanted to use them for cases in a switch statement. It's a useful thing to do when you write state machines- often you'd like to give each state a printable label, but at the same time, you'd like to boil the labels down to hash values at compile time to speed things up. I updated my example to use the hashes in a switch:

#include <iostream>
#include <string>
using namespace std;

int testit(string s){
  switch(const_hash(s.c_str())){
  case const_hash("bozo1"): return 1;
  case const_hash("bozo2"): return 2;
  default:
    cout << "Unknown" << endl;
    return 0;
  }
}
int main(){
  cout << hex << const_hash("bozo1") << endl;
  cout << hex << const_hash("bozo2") << endl;
  return testit("bozo1");
}

When running the program you see the bozo1 and bozo2 hashes are b886f27 and b889adf. Looking at the assembly for the testit function you see it calls the hash function on the input parameter (because the input could be anything), but each case operation's comparison has been hardwired to use the actual hash values that were generated at compile time:

0000000000400ae0 <_Z6testitSs>:
  400ae0:	55                   	push   %rbp
  400ae1:	48 89 e5             	mov    %rsp,%rbp
  400ae4:	48 83 ec 10          	sub    $0x10,%rsp
  400ae8:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  400aec:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  400af0:	48 89 c7             	mov    %rax,%rdi
  400af3:	e8 d8 fd ff ff       	callq  4008d0 <_ZNKSs5c_strEv@plt>
  400af8:	48 89 c7             	mov    %rax,%rdi
  400afb:	e8 8e 02 00 00       	callq  400d8e <_Z10const_hashPKc>
  400b00:	3d 27 6f 88 0b       	cmp    $0xb886f27,%eax
  400b05:	74 09                	je     400b10 <_Z6testitSs+0x30>
  400b07:	3d df 9a 88 0b       	cmp    $0xb889adf,%eax
  400b0c:	74 09                	je     400b17 <_Z6testitSs+0x37>
  ...

Use as an ID in an RPC Class

The final step for me was just to plug this trick into something that resembled the RPC code I wanted. In this code, I have a base class to define a generic RPC, and derived classes for each RPC. All I want is for each derived class to be able to hand back a static integer identifier when asked, without anything hashing at runtime.

#include <iostream>
using namespace std;

class A {
public:
  virtual ~A() {}
  virtual unsigned GetID() const = 0;
};

class B : public A {
public:
  unsigned GetID() const { return my_hash_id; }
  const static unsigned my_hash_id;
};
const unsigned B::my_hash_id = const_hash("MyThing");

int main(){
  A *b = new B();
  cout << hex << b->GetID()    << endl;
  cout << hex << B::my_hash_id << endl;
  delete b;
}

Grepping through nm's output, I saw that const_hash wasn't included in the executable. Looking at the assembly, I see GetID is a function that returns the hash value and B::my_hash_id is hardwired to the hash value. This behavior is sufficient for what I need. I'll use the static my_hash_id when I want to register a new RPC during initialization (which will also check for hash collisions), and then use GetID() if I need to figure out which RPC I'm actually holding when all I have is a base class pointer. It really is overkill for what I need, but it's nice to be able to reference the RPCs without having to generate unique integer IDs myself.

A Long-Running Flight Scraper on AWS

2015-04-12 Sun
planes gis code

A year ago I set up a long-running data scraper that would fetch flight info from the website FlightRadar24.com. I created a free Amazon VM that ran the scraper every six minutes for about eight months last year. In terms of raw data, it pulled about 200MB a day and collected a total of 56GB over the eight months. Now that I've retired the scraper, I thought I'd comment on how it worked.

In Search of Data

As I've written before, I started getting interested in geospatial tracks last year when I read about how different websites let you see where planes and ships are located in real time. Thinking that it'd be fun to get into geospatial data analysis, I started looking around for open datasets I could use for my own experiments. I quickly realized that there aren't that many datasets available, and that data owners like the FAA only provide their feeds to companies that have a legitimate business need for the data (and are willing to pay a subscription fee). Luckily, I stumbled upon a question posted on Stack Overflow where someone else wanted to know where they could get data. Buried in the replies was a comment from someone who noted that FlightRadar24.com aggregates crowd-sourced airline data and that you could get a current listing of plane locations in a JSON format just by querying a URL. I tried it out and was surprised to find a single wget operation returned a JSON file with the locations and stats of thousands of airplanes. The simplicity of it all was the kind of thing scrapers dream about.

Cleaning the Data

I set up a simple cron job on my desktop at home to retrieve the JSON data every six minutes, and let it run from 7am-11pm over a long weekend. The fields weren't labeled in the data, so I had to do a lot of comparisons to figure out what everything meant. Fortunately, FR24's website has a nice graphical mode that lets you click on a plane and see its instantaneous stats. I grabbed some JSON data, picked out a specific flight to work with, and then compared the scraped data to the labeled fields the website GUI was reporting. It was a little tricky since the stats on the website are continuously changing as the plane moves, but it was enough to identify each field in the JSON data array.

The next challenge was converting the instantaneous data into actual tracks (i.e., instead of sorting by time, I wanted to sort by plane ID). It was a perfect use case for Go: I wrote something that read in a day's worth of data, parsed the JSON using an existing library, did the regrouping, and then dumped the output into a new format that was easier for me to use (e.g., one plane per line, with track points listed in a WKT linestring). The conversion reduced the 200MB of a day's data down to about 68MB of (ASCII) track data. These track files are convenient because I can use command line tools (grep, awk, or python) to filter, group, and plot the things I want without having to do much thinking.
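
The Go converter isn't shown here, but the regrouping step is straightforward. Here's the same idea sketched in Python; the lat/lon positions within FR24's arrays are an assumption from memory, not a documented format:

# The original converter was written in Go; this is the same regrouping idea
# sketched in Python. The lat/lon positions in the FR24 arrays (fields[1] and
# fields[2]) are an assumption, not a documented format.
import bz2, json, sys
from collections import defaultdict
from pathlib import Path

tracks = defaultdict(list)   # plane ID -> list of (lon, lat) points in time order

for snap in sorted(Path(sys.argv[1]).glob("*.json.bz2")):   # one day of grabs
    data = json.loads(bz2.open(snap).read())
    for plane_id, fields in data.items():
        if not isinstance(fields, list):
            continue   # skip non-plane entries (counts, version info, etc.)
        lat, lon = fields[1], fields[2]
        tracks[plane_id].append((lon, lat))

# One plane per line, with its track as a WKT linestring
for plane_id, pts in tracks.items():
    wkt = ", ".join(f"{lon} {lat}" for lon, lat in pts)
    print(f"{plane_id}\tLINESTRING({wkt})")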

Running in Amazon for Free

Amazon sounded like the right place to run the scraper for longer runs. Since I hadn't signed up for AWS before, I found I qualified for their free usage tier, which lets you run a micro instance continuously for a year for free. The main limitation of the micro instance for me was storage- it only had about 5GB, so I had to be careful about compressing the data and remembering to retrieve it off Amazon every few weeks. The latter wound up becoming a problem in November: I got tied up with Thanksgiving week and forgot to move data off the instance, so it ran out of space and dropped data for a few weeks (during the busiest and most interesting time of year for US air travel, unfortunately). On the bright side, I was able to get the previous data out of the system and restart it all without much trouble.

Below is the script I used in the instance to go fetch data. After I noticed the initial grabber was slipping by a few seconds every run, I added something to correct the sleep interval by the fetch delay.

#!/bin/bash 
MIN_SECONDS_DELAY=360
URL="http://www.flightradar24.com/zones/full_all.json"

while true; do
  oldsec=`date +%s`
  mytime=`date +%F/%F-%H%M`
  mydir=`date +%F`
  wget -q $URL

  if [ ! -d "$mydir" ]; then
       mkdir $mydir
  fi
  mv full_all.json $mytime.json
  bzip2 -9 $mytime.json

  # Find how long this took to grab, then subtract from sleep interval
  newsec=`date +%s`
  gap=$((newsec - oldsec))
  left=$((MIN_SECONDS_DELAY - gap))
  if [ "$left" -gt 0 ]; then
       sleep $left
  fi
done

Migrating Data Off Amazon

The next thing I wrote was a script to repack the data on the instance. It looked at a single day and converted the data from a series of bzip'd files to a bzip'd tar of uncompressed files. Repacking gave better compression for the download and was more convenient for later use. I ran this process by hand every few weeks. Below is the script I wrote. It was a big help using date's built-in day math- I wish I'd realized earlier that you can use it to walk through date ranges in bash.

#!/bin/bash

CURRENT=$(date +%Y-%m-%d --date="1 week ago") 
END=$(date +%Y-%m-%d) 

# Ask the user for the beginning date
read -p "Start Date [$CURRENT] " 
if [ "$REPLY" != "" ]; then
    CURRENT=$REPLY 
fi

# Loop over all days since then
while [ "$END" != "$CURRENT" ]; do
    echo $CURRENT
    if [[ ! -e "out/$CURRENT.tar" ]]; then
        echo "packaging"
        tar -cvf out/$CURRENT.tar $CURRENT
    else
        echo "skipping"
    fi
    CURRENT=$(date +%Y-%m-%d -d "$CURRENT +1 day")
done

Finally, I had to write something for my desktop to go out and pull the archives off the instance. Amazon gives you an ssh key file for logging into your instance, so all I had to do was just scp the files I needed.

#!/bin/bash

HOST=ec2-user@my-long-instance-name.compute.amazonaws.com
KEY=../my-aws-key.pem

# Set range to be from a week ago, stopping before today
CURRENT=$(date +%Y-%m-%d --date="1 week ago")
END=$(date +%Y-%m-%d)

read -p "Start Date [$CURRENT] "
if [ "$REPLY" != "" ]; then
    CURRENT=$REPLY
fi

# Grab each day
while [ "$END" != "$CURRENT" ]; do
    echo $CURRENT
    if [[ ! -d "$CURRENT" ]]; then
        echo "Downloading $CURRENT"
        scp -i $KEY \
            $HOST:data/out/$CURRENT.tar downloads/
        tar xf downloads/$CURRENT.tar
        bunzip2 $CURRENT/*.bz2
        tar cjf bups/$CURRENT.tar.bz2 $CURRENT
    fi

    CURRENT=$(date +%Y-%m-%d -d "$CURRENT +1 day")
done

While my approach meant that I had to manually check on things periodically, it worked pretty well for what I needed. By compressing the data, I found I only had to check the instance about once a month.

End of the Scraper

I turned the scraper off back in February for two reasons: FR24 started shutting down its API and my free AWS instance was expiring soon. The FR24 shutdown was kind of interesting. On January 20th the interface still worked, but all the flight IDs became "F2424F" and one of the fields said "UPDATE-YOUR-FR24-APP". Around February 3 the API stopped working altogether. Given that FR24 still has a webapp, I'll bet you can still retrieve data from them somehow. However, I'm going to respect their interface and not dig into it.

Examining Bad Flight Data from the Logger

2015-03-08 Sun
tracks gis code planes

One of the problems of capturing your own ADS-B airplane data is that there are always bad values mixed in with the good. As I started looking through my data, I realized that every so often there'd be lon/lats that were nowhere near my location. Initially I thought I might be getting lucky with ionospheric reflections. However, a closer look at where these points are shows that something else is probably going on.

I wrote some Pylab to take all of my points and plot them on a world map (well, without the actual map). I marked off a rectangle to bound where Livermore is and then plotted each point I received. The points were colored blue if they were within the Livermore area and red if they were outside of it. I then extended the lon/lat boundaries for the Livermore area to help see where the far-away points were relative to Livermore's lon/lat box.
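
The plotting code boiled down to a bounding-box test and two scatter calls. A sketch with matplotlib's pyplot (which is what Pylab wraps); the Livermore box coordinates here are rough values for illustration:

# Sketch of the point-coloring plot. The bounding box values are approximate.
import matplotlib.pyplot as plt

LON_MIN, LON_MAX = -122.2, -121.4   # rough Livermore-area box (illustrative)
LAT_MIN, LAT_MAX = 37.4, 37.9

def in_box(lon, lat):
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX

def plot_points(points, outfile="points.png"):   # points: list of (lon, lat)
    near = [(lon, lat) for lon, lat in points if in_box(lon, lat)]
    far  = [(lon, lat) for lon, lat in points if not in_box(lon, lat)]
    if near:
        plt.scatter(*zip(*near), s=2, c="blue", label="inside Livermore box")
    if far:
        plt.scatter(*zip(*far), s=2, c="red", label="outside")
    plt.xlim(-180, 180); plt.ylim(-90, 90)
    plt.xlabel("Longitude"); plt.ylabel("Latitude")
    plt.legend()
    plt.savefig(outfile, dpi=150)
    plt.close()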

The first thing I noticed was that a whole slew of dots were spread out horizontally but fell within Livermore's lat range. It's possible that this could be due to bit errors in the lon field. The next thing I noticed was that there were two columns of bad points, one at about -160 degrees, the other around 35. Since both of these columns had data spread across all lats, I realized it probably wasn't from an ionospheric reflection. The right column happens to be at about the same value as you'd get if lon and lat were swapped (drawn as a pink bar). However, I don't think that's what happened, as the dots are distributed all the way across the vertical column.

Individual Offenders

Since I didn't have a good explanation for the bad values, I did some more work on the dataset to pull out the individual offenders. Of the 2092 unique planes, only 16 were giving me problems. I plotted each plane's points individually below using the same plotter as before.
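
Pulling out the offenders just meant grouping points by hex ID and flagging any plane with points outside the box. A small sketch, reusing the in_box() helper from the plot above:

# Sketch of finding the offenders; reuses in_box() from the plotting sketch.
def find_offenders(tracks):      # tracks: dict of hex ID -> list of (lon, lat)
    bad = {}
    for hex_id, pts in tracks.items():
        if any(not in_box(lon, lat) for lon, lat in pts):
            bad[hex_id] = pts
    return bad

# Plot each offender separately with the same plotter, e.g.:
# for hex_id, pts in find_offenders(tracks).items():
#     plot_points(pts, outfile=f"{hex_id}.png")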

To me, these breakdowns indicate that the problem planes exhibit a few types of bad data. Three of them have purely horizontal data, while about 12 of the rest have some vertical problem. The 8678C0 case doesn't have enough points to tell what it's thinking. Interestingly, the vertical cases all seem to have at least a few points near Livermore. This makes me wonder if their GPS lost sync at some point in the flight and started reporting partially incorrect data. In any case there seem to be some common failure patterns.

Plane Info

Out of curiosity I went and looked up all 16 of these flights by hand to see what they were. It's interesting that all three of the planes with horizontal errors weren't small planes (one old 747 and two new 777s). All the vertical errors seem to be from smaller planes (though one was a US Airways Express). Here's the rundown, including the number of days that I saw a particular plane in February:

#ID    Days Flight  Info
4248D9 1    VQ-BMS  Private 747 (1979) Las Vegas Sands Corp
8678C0 1    JA715A  Nippon Airways 777 
868EC0 1    JA779A  Nippon Airways 777 
A23C2E 4    N243LR  USAirways Express 
A3286B 1    N302TB  Private Beechcraft 400xp 
A346D0 1    N310    Private Gulfstream 
A40C4B 2    N360    Private (San Francisco) 
A5E921 7    N480FL  Private Beechcraft 
A7D68B 1    N604EM  Private Bombardier
A7E28D 1    N607PH  Private Post Foods Inc
A8053D 1    N616CC  Private Gulfstream
A8DAB9 1    N67PW   Private Falcon50
AA7238 7    N772UA  United Airlines 777
AC6316 1    N898AK  Private Red Line Air
AC70DC 2    N900TG  Private (Foster City) 
AD853E 2    N970SJ  Private Gulfstream 

Code and Data

I've put my data and code up on GitHub for anyone that wants to look at it.

github:livermore-arplane-tracks

Flight Data From the Data Logger

2015-03-02 Mon
tracks gis

Now that I've been running the Edison airplane data logger for more than a month, it's time to start looking at the data it's been capturing. I pulled the logs off the SD card, reorganized them into tracks, and then generated daily plots using Mapnik. The image below shows all of the flights the logger captured for each day in February.

The first thing to notice is that the SDR has a pretty good range, even with the stock antenna. I live just southeast of the dot for Livermore and was only expecting to see planes near town. Instead I'm seeing traffic all over the Tri-Valley and some a little bit beyond. I was initially surprised to see anything in either the Bay Area or the Central Valley because of the Pleasanton ridge and the Altamont hills. However, I realized it makes sense- planes fly much higher than the hills, except when they're landing.

Logger Statistics

I wanted to know more about the data I was getting, so I wrote a few scripts to extract some statistics. The first thing I wanted to know was what percentage of the time the logger was running each day. When I started, I decided not to run it around the clock because there just aren't that many flights at night. To help me remember to start and stop the logger each day, I plugged the Edison into the same power strip my home router uses, which I usually turn on when I get up (7am) and turn off when I go to bed (11:30pm). I wrote a Perl script to look through each day's log and find the largest gap of time where there was no data. Since the logger uses UTC, my nightly shutdowns usually appear as a 7-hour gap starting around 7am UTC. The top plot below shows what percentage of the day the logger was up and running. It looks like I was only late turning it on a few times in February.
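
The gap finder was a throwaway Perl script; the same calculation in Python looks something like this (assuming each day's message timestamps have already been parsed into epoch seconds):

# Sketch of the uptime estimate: treat the largest quiet gap in a day's log as
# the nightly shutdown. Assumes timestamps are epoch seconds for one UTC day.
def uptime_fraction(timestamps):
    ts = sorted(timestamps)
    if len(ts) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    # Also count the wrap-around gap from the last message of the day back
    # around to the first one
    gaps.append(86400 - (ts[-1] - ts[0]))
    return 1.0 - max(gaps) / 86400.0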

The next thing I wanted to know was how many flights I was seeing a day. The raw numbers are in green above, but I've also scaled them up using the top chart's data to help normalize it (no, not a fair comparison, as night flights are fewer). The red lines indicate where Sundays began. It looks like there's definitely lighter activity on Sundays. Things are a little skewed, though, since everything is in UTC instead of Pacific (I was lazy and didn't bother to redistribute the days).

Missing IDs

The logger looks for two types of ADS-B messages from dump1090. The first is an occasional ID message that associates the hex ID for a plane with its call sign (often a tail fin). The second is the current location for a particular plane (which only contains the hex ID). Grepping through the data, I see 2195 unique hex IDs in the position messages, but only 2092 unique hex IDs in the ID messages. I checked, and each message stream has some hex IDs that do not appear in the other.
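
The cross-check is just a pair of set differences once the hex IDs are pulled out of each stream; a quick sketch:

# Sketch of the cross-check between the two message streams. Assumes the hex
# IDs have already been grepped out of the logs into two sets.
def compare_streams(position_ids, ident_ids):
    only_pos   = position_ids - ident_ids   # positions with no matching ID message
    only_ident = ident_ids - position_ids   # ID messages with no position fixes
    print(f"{len(position_ids)} position IDs, {len(ident_ids)} ident IDs")
    print(f"{len(only_pos)} only in positions, {len(only_ident)} only in idents")
    return only_pos, only_ident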

What Airlines am I Seeing?

Another stat I was interested in is what airlines show up the most in my data. It isn't too hard to get a crude estimate of the breakdown because (most?) commercial airlines embed their ICAO code in their flight number. Through the power of awk, grep, sed, and uniq, I was able to pull out the number of different flights each provider had over my area (this is unique flight numbers, not total flights). Here are the top 20:

404 UAL  United Airlines
114 VRD  Virgin America
 84 FDX  Federal Express
 72 AAL  American Airlines
 51 DAL  Delta Airlines
 46 JBU  Jet Blue
 45 SKW  Sky West
 38 AWE  US Airways
 29 EJA  Airborne Netjets Aviation ExecJet
 26 UPS  United Parcel Service
 22 RCH  Airborne Air Mobility Command "Reach"
 18 CPA  Cathay Pacific Airways
 17 OPT  Options
 16 EJM  Executive Jet Management "Jet Speed"
 11 TWY  Sunset Aviation, Twilight
 11 HAL  Hawaiian Airlines
 11 CSN  China Southern Airlines
 10 KAL  Korean Air
 10 EVA  EVA (Chinese)
  7 AAR  Asiana Airlines
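
For reference, the same tally is easy to reproduce in Python instead of an awk/sed pipeline. This sketch assumes the call signs have already been collected into a set and that commercial call signs start with a three-letter ICAO code (e.g., "UAL1234"):

# Sketch of the airline tally. Assumes callsigns is the set of unique call
# signs from the ID messages; N-number "fins" and odd entries are skipped.
import re
from collections import Counter

def airline_counts(callsigns, top=20):
    counts = Counter()
    for cs in callsigns:
        m = re.match(r"([A-Z]{3})\d", cs)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(top)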

There are a few things of interest in that breakdown. First, freight airlines like FedEx and UPS show up pretty high in the list. I think people often overlook them, but they occupy a sizable chunk of what's in the air. Second, I didn't see anything from Southwest in the data. They definitely fly over us, so I was surprised that I didn't see any SW or WN fins. Finally, there were a ton of planes that didn't have any info associated with them that would help me ID the owner (e.g., there were 456 N fins). There are websites you can go to look them up (most of the time they just give a private owner), but it's something that sinks a lot of time. Maybe later I'll revisit this and write something to automate the retrieval.