In anticipation of yet another tough winter, I began recording the real-time data into a PostgreSQL database on a personal server, starting in mid-November. With over a million rows stored, representing nearly three months of data, I have started to do some preliminary analysis. One important caveat: while looking through the database, I noticed that there are some gaps. Certain trips do not have the expected amount of data recorded. The reason for this is unknown, but I speculate it may have to do with the GPS device onboard certain trains being broken or misconfigured.
First I looked at an overall measure of lateness: for every trip, the difference between the scheduled arrival time at its terminal station and the actual arrival time. All times are represented as "hh:mm:ss".
|Average Lateness||Maximum Lateness|
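The overall measure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the query actually run against the database: the function names and the `(scheduled, actual)` pair layout are hypothetical, and early arrivals are assumed to count as negative lateness in the average.

```python
from datetime import datetime


def format_hms(seconds):
    """Format a duration in seconds as hh:mm:ss (with a leading '-' if negative)."""
    total = int(round(abs(seconds)))
    sign = "-" if seconds < 0 and total > 0 else ""
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{sign}{hours:02d}:{minutes:02d}:{secs:02d}"


def lateness_summary(trips):
    """Return (average, maximum) lateness as hh:mm:ss strings.

    `trips` is a hypothetical list of (scheduled, actual) datetime pairs,
    one pair per trip, both taken at the trip's terminal station.
    Assumption: an early arrival contributes a negative value to the average.
    """
    lateness = [(actual - scheduled).total_seconds() for scheduled, actual in trips]
    if not lateness:
        raise ValueError("no trips to summarize")
    average = sum(lateness) / len(lateness)
    return format_hms(average), format_hms(max(lateness))
```

Calling `lateness_summary` on the trips of a single line, or of a single day, would give the per-line and per-day figures in the same way.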
Then I broke it down by line:
|Line Name||Average Lateness||Maximum Lateness|
And also by day (using a Zoomable Line Chart showing average and maximum lateness measured in seconds):
The tables and chart above measure the lateness of a trip by comparing its final scheduled station stop against its actual arrival time. But what about all the other stations? It turns out that trains can often make up time en route, so a train may arrive on time or early at its terminal even though it was late to intermediate stations. The following two maps show the average lateness at each station stop along the way, according to the real-time predictions of every train. I have split the data into two parts: the first map displays inbound trips, and the second map displays outbound trips. You can view the average lateness in seconds by clicking on a station marker. The colors are divided into four buckets: green dots average under 1 minute late, yellow dots average under 3 minutes late, purple dots average under 5 minutes late, and red dots average over 5 minutes late.
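The four color buckets amount to a simple threshold lookup on the average lateness in seconds. Here is a minimal sketch; the function name is hypothetical, and the treatment of an average of exactly 5 minutes (counted as red here) is an assumption, since the description does not say which bucket that boundary falls in.

```python
def lateness_color(avg_seconds):
    """Map a station's average lateness (in seconds) to a marker color.

    Thresholds follow the four buckets described in the text.
    Assumption: an average of exactly 300 seconds is treated as red.
    """
    if avg_seconds < 60:
        return "green"   # under 1 minute late (includes early averages)
    if avg_seconds < 180:
        return "yellow"  # under 3 minutes late
    if avg_seconds < 300:
        return "purple"  # under 5 minutes late
    return "red"         # 5 minutes late or more
```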
Average lateness of inbound trips
Average lateness of outbound trips
These maps were created with the help of the Google Fusion (beta) tool. I uploaded my data with a KML field containing the geographic points and lines, and Fusion was able to "geocode" that geometry directly onto a map.
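For readers unfamiliar with KML, the geometry in such a field is a small XML fragment. The sketch below shows one way those fragments could be built; the helper names are hypothetical and the coordinates in the usage note are made-up illustrative values, not actual station locations. Note that KML lists longitude before latitude.

```python
def kml_point(lon, lat):
    """Build a KML Point fragment. KML coordinate order is longitude,latitude."""
    return f"<Point><coordinates>{lon},{lat}</coordinates></Point>"


def kml_line(points):
    """Build a KML LineString fragment from a sequence of (lon, lat) pairs."""
    coords = " ".join(f"{lon},{lat}" for lon, lat in points)
    return f"<LineString><coordinates>{coords}</coordinates></LineString>"
```

For example, `kml_point(-71.06, 42.35)` produces `<Point><coordinates>-71.06,42.35</coordinates></Point>`, which could be stored alongside a station's average lateness in the uploaded data.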
What stands out? The Rockport line is pretty late on average. Trains that arrive in Worcester often do so much earlier than scheduled, even though they may be late to previous stations. The bad average at Porter Square outbound is explained by some extremely late trains (1.5+ hours) in late January. The south side may be getting better on-time performance than the north side. Comparing the two maps, it appears that the Commuter Rail is better at bringing people into Boston than at sending them out.
These numbers and charts are preliminary and highly unofficial. They are based on the Commuter Rail's real-time data feed, which is still in beta testing and should be expected to have gaps and possible mistakes in its output.