Brilliant Boris Bikes Animation

Some of us at CASA can’t get enough of the Barclays Cycle Hire data. We have had Ollie’s hugely successful flow maps, journey time heat maps, and now the Sociable Physicist himself, Martin Austwick, has created this stunning animation of the bikes.

The TFL data release contained the start point, end point, and duration for around 1.4 million bike journeys. An educated guess has been made about routes between stations using OpenStreetMap data and some routing software. The animation shows the scheme’s busiest day (thanks to a tube strike) and provides an amazing insight into the dynamics of Boris Bike users. You can find more info here.
I suspect this animation will be another big PR win for TFL; it is just a shame that it took a freedom of information request to get the underlying data.
Martin’s viz is one of my favourites but there have been a couple of others released that use similar technologies to show urban transport systems. Chris McDowall has produced an animation on a much larger scale by showing Auckland’s public transport system on a typical Monday.

Another great animation was produced by fellow CASA researcher Anil Bawa-Cavia. This shows London’s bus network and it makes for a great comparison to Auckland’s transport system above.


The National Geographic Surname Map has generated a lot of discussion both online and via email. The response has been overwhelmingly positive but some people, unsurprisingly, have suggested improvements. A recent post on the great Junk Charts blog acts as a good summary of the comments I have received. For the purpose of this post I have left out the positives in order that I can address some of the suggested limitations of the map. There is always room for improvement but I thought it would be good to outline some of the logic behind relaxing a couple of Tufte’s classic rules on data visualisation. I have pasted each suggested improvement from Junk Charts below and added my responses beneath.
“They really ought to have used relative popularity rather than absolute popularity. This is another area of improvement for all word clouds. Today, word clouds plot the number of times a specific word appears in a piece of text. We often try to compare several word clouds against each other; and when we do that, the only sensible measure is the proportion (relative frequency) of time a specific word appear. Say, one compares Obama and McCain speeches by comparing two word clouds. If these two speeches differ significantly in length, then comparing the number of times each candidate use “education” words is silly — we have to compare the number of times per length of the speech.”
The use of relative popularity is something I would agree with in most circumstances. The surname map, however, is designed to give a national (rather than state by state) impression of the general distribution of surnames. Had we used a relative measure (such as freq. per million), where would the million come from: the state or the entire US population? If it were the former we would compound the second criticism below. If we wanted a comparison (such as changes over time) we would, of course, have used relative frequencies.
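The choice of denominator can be illustrated with a small sketch in R. The counts and populations below are invented purely for illustration (they are not taken from the Worldnames data), and the column names are my own.

```r
# Hypothetical surname counts, invented for illustration only
surnames <- data.frame(
  state   = c("California", "California", "Montana", "Montana"),
  surname = c("Smith", "Lee", "Smith", "Johnson"),
  count   = c(200000, 90000, 8000, 6000),
  stringsAsFactors = FALSE
)

# Assumed (rounded, illustrative) populations
state_pop    <- c(California = 37000000, Montana = 1000000)
national_pop <- sum(state_pop)

# Option 1: frequency per million of each state's own population
surnames$per_million_state <-
  unname(surnames$count / state_pop[as.character(surnames$state)] * 1e6)

# Option 2: frequency per million of the whole (here, two-state) population
surnames$per_million_national <- surnames$count / national_pop * 1e6

print(surnames)
```

With the state denominator, Smith in the smaller state comes out larger than Smith in the bigger one; with the national denominator the ordering follows the raw counts. Which of the two is "right" depends on the question the map is meant to answer.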
“The cutoff of top 25 names in each state suffers a similar problem. The 26th most popular name in California, a populous state, is of more interest than say the 15th most popular name in Montana (or insert your favorite small state). Instead, a more sensible cutoff would be including names that account for at least 2 percent (say) of a state’s population. By doing this, the more populated states would have more entries than the less populated states.”
As another commenter remarked, the long-tailed nature of the surname distribution would mean there is very little difference between the popularity rank and an equally arbitrary cutoff percentage. I also don’t understand why more populated states would have more surnames at the top of their distribution. It is not necessarily the case that population size correlates with surname frequency.

“Given the above bullets, it is not surprising that the word-size scale has serious problems. Because it is an absolute number and not relative to each state’s population, the big words can only show up in populous states. In other words, the size of the words tells us about the geographical distribution of the U.S. population. As I mentioned before (such as here), this insight is available on pretty much every map used to plot data that has ever been produced. The one thing that all these maps never fail to tell us is the fact that most of the U.S. population is bi-coastal. Unfortunately, the real message of the map — in this case, the geography of surnames — is subsumed.”
The message of the map is that surnames are not randomly distributed across the US. Each wave of migrants moving to the US has had a clear preference (or necessity) as to where they live(d) and this has created the diverse patchwork of surnames shown in the map. I cannot see how this message has been subsumed by not standardising for population. If this was a map of car theft then it would be nonsense to not account for population density (or car density) but in the context of surnames (due to the nature of their distribution throughout the population) the patterns (and message) would have been similar.
“And then, the map invents false data. Notice that there are 1,250 geographic sites on the map (25 names times 50 states). This is a visually prominent feature of the map, and yet there is no rhyme or reason as to where the names are placed, with the exception of respecting state boundaries. The casual reader may think that the appearance of the Chinese name “Lee” in the inner, central part of California implies that Lee-named Chinese-Americans aggregate in those parts of California. Far from the truth!”
This is the biggest limitation of the map- and one I had tried to address in the London Surnames map. We were constrained by the fact that the map was being designed for print. Had it been designed as an interactive map (and not simply a static image) we would not have gone about it this way.

As with all visualisations you can’t please everyone, but I hope I have provided some insight into why the map developed the way that it did.

2010 Muslim Populations By Country

Using the release of the “Muslim Populations By Country” dataset from the Guardian Datastore I have produced a cartogram to visualise the data. The size of the country represents its 2010 Muslim population and the colours indicate how much the population is expected to grow/shrink over the next 20 years. It may not be interactive but it is a really compelling way to show the data. It would be good to get hold of data for other religions. For more cartogram fun check out Worldmapper.


My Week in Maps

This week has been a busy one with the “publication” of a couple of maps I have been involved with alongside the circulation of a few cartographic gems. I thought I would share my mapping highlights.

To have something published in the National Geographic is a great honour. The map of US Surnames has proved hugely popular and was a great project to work on. A real high point in my PhD research so far.

The popularity of a London version of the US Surname Map outstripped all expectations with 10s of thousands of visitors. Cartographically less impressive than the US map but much more detailed, I think the main thing people are most surprised (and perhaps disappointed) about is just how many “Smiths” there are!

I’ve not quite worked out if this map shows anything surprising but I really like the cartography so “Profane Mountains, Polite Plains” gets a shout out here. It shows the frequency of swearwords in people’s Tweets across the US.

This map of scientific collaborations (detailed here) demonstrates nicely the strong academic ties between some countries over others. I think it’s a great map which I hope (although I can’t seem to confirm) was created with R. The map was actually created using MySQL, Java and Photoshop (thanks @beyondmaps).

Mapping London's Surnames

Inspired by the What’s in a Surname? map we helped make with the National Geographic, I have created 15 interactive typographic maps to show the most popular surnames across London. What they lack in cartographic brilliance, I hope they make up for in detail. There are 983 geographic units (Middle Super Output Areas) in each map and across all 15 there are 2379 individual surnames (15,000 surname labels in total). The font size for each surname label has been scaled to give an idea of the number of people who have that surname in each place. The surname frequencies come from the 2001 Electoral Roll and won’t contain everyone living in London but it is one of the best datasets available.
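As a rough illustration of how label sizes might be scaled from counts, here is a minimal sketch in R. The square-root scaling, the point-size limits and the function name `scale_font_size` are all my own assumptions for the example; the post does not say which scaling was actually used for the maps.

```r
# Map surname counts to font sizes between min_pt and max_pt.
# Square-root scaling is an assumption; it stops the most common
# names from dwarfing everything else.
scale_font_size <- function(count, min_pt = 6, max_pt = 36) {
  s   <- sqrt(count)
  rng <- range(s)
  if (diff(rng) == 0) {
    # All counts equal: use the midpoint size for every label
    return(rep((min_pt + max_pt) / 2, length(count)))
  }
  min_pt + (s - rng[1]) / diff(rng) * (max_pt - min_pt)
}

# Hypothetical counts for one area, invented for illustration
counts <- c(Smith = 400, Patel = 100, Jones = 25)
sizes  <- scale_font_size(counts)
print(sizes)
```

The resulting sizes could then be passed to whatever draws the labels, for example the `cex` argument of `text()` after dividing by a base point size.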

London is renowned for being a diverse city but this is barely reflected in the most prevalent surnames- only a few name origins can be discerned from the map. You have to look a little further down the surname rankings for this diversity to become apparent. The surnames shown on all 15 maps can be traced back to one of 38 origins; I have selected unique colours for 10 of the most popular. Surname origins were established using the Onomap classification tool. We are mapping the origins of the surnames, which are not necessarily the same as the origins of the people possessing them. Many people in London have adopted Anglicised surnames.

It is also clear from the maps that the same sorts of surnames tend to cluster together. This is because they often closely reflect the naming preferences of particular groups of people within an area. As you transition through to the less popular surnames things become a little more jumbled and the distinct patterns present in the first map become less distinct.

The final thing that stands out is how surname popularity decreases between the first and second most popular names and every subsequent change after that. You can see this by how quickly the text size reduces until almost all names are written in the smallest font sizes.
The more you study these maps the more interesting, and perhaps complex, they become. My final thoughts therefore appear a little contradictory. The first is that a surprising number of Londoners share the same name (especially with their immediate neighbours). The second is that despite the dominance of relatively few surnames at the top of the rankings, the further down the rankings you get the more you see of London’s population diversity. We are of course only mapping the top 15 surnames in each area of London- there are many thousands more. If you can’t find your surname on these maps, you can see where it is around the world here.
The maps were created as part of my ongoing PhD research using the Worldnames Database compiled by University College London’s Department of Geography. Thanks to Oliver O’Brien from CASA for putting the maps online. A high resolution print version of the map (previewed below) is available on request.

What's in a Surname? (AKA United States of Surnames!)

Map: Mina Liu, Oliver Uberti. Source: James Cheshire, Paul Longley, Pablo Mateos

The typographic map above (click for interactive version) is a collaboration between Oliver Uberti’s design team at National Geographic Magazine and my own research with UCL Geography’s Worldnames database. It shows the top 25 surnames in each US State (totaling 181 unique surnames), their frequency and their country of origin. The text associated with the map goes as follows:

“What’s in a Surname? A new view of the United States based on the distribution of common last names shows centuries of history and echoes some of America’s great immigration sagas. To compile this data, geographers at University College London used phone directories to find the predominant surnames in each state. Software then identified the probable provenances of the 181 names that emerged.
Many of these names came from Great Britain, reflecting the long head start the British had over many other settlers. The low diversity of names in parts of the British Isles also had an impact. Williams, for example, was a common name among Welsh immigrants—and is still among the top names in many American states.
But that’s not the only factor. Slaves often took their owners’ names, so about one in five Americans now named Smith are African American. In addition, many newcomers’ names were anglicized to ease assimilation. The map’s scale matters too. “If we did a map of New York like this,” says project member James Cheshire, “the diversity would be phenomenal”—a testament to that city’s role as a once-and-present gateway to America. —A. R. Williams”
You can see the printed map in the February Edition of National Geographic.

Typographic Maps

I wanted to write this post to provide some context to a couple of very special maps I intend to share over the next few weeks. They say a picture is worth a thousand words and maps to me are always worth many more. Words often appear on maps to label particular features and provide important contextual information- they often provide the depth that can keep you staring at a map for hours. In some cases, however, the words themselves provide the features of interest making the points, lines and polygons that we expect on maps superfluous. Maps with only words, known as “Typographic Maps”, are becoming increasingly popular. I have included my 5 favourites below.
For sheer cartographic brilliance: axismaps’ San Francisco Typographic Map

For a more artistic take on the concept: Stephen Walter’s ‘The Island’.

For promoting world trade: Av Browne’s Trade Mission Typographic Maps.

Lots of typographic maps of the world exist. I think this one from is the best.

Last (but not least). This great map called “Wanderwort” shows the use of German words around the world.

R interface to Google Chart Tools

Hans Rosling eat your heart out! It is now possible to interface R statistics software to Google’s Gapminder inspired Chart Tools. The plots below were produced using the googleVis R package and three datasets from the Gapminder website. The first shows the relationship between income, life expectancy and population for the 20 countries with the highest life expectancy in 1979 and the bottom plot shows the countries with the lowest 1979 life expectancy. Press play to see how the countries have fared over the past 50 years. You can also change the variables represented on each axis, the colours and the variable that controls the size of the bubbles.

Data: all_date, Chart ID: MotionChart_2011-01-10-10-16-25, R version 2.12.1 (2010-12-16)

Data: all_date, Chart ID: MotionChart_2011-01-10-10-10-46, R version 2.12.1 (2010-12-16)

It was a bit fiddly to get the data formatted correctly and I couldn’t manage to get the complete dataset in one plot because my browser kept crashing (Chrome is best). Even with these teething problems it is a great way to get people creating better visualizations with their data. If you want to see Hans Rosling demonstrating these plots with his trademark enthusiasm I thoroughly recommend “The Joy of Stats” a program produced for the BBC. You can watch it here.
For those who want to create their own plots, I’m not proud of the code I used to format the data above so to get you started try this example (provided with the package).
library(googleVis)
M1 <- gvisMotionChart(Fruits, idvar="Fruit", timevar="Year")
Thanks to the Recology blog for promoting this.
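On the fiddly data formatting mentioned above: a motion chart wants one row per id/time combination, so a table with one column per year has to be stacked first. The sketch below shows one way to do that in base R. The wide layout, the country names and the values are assumptions invented for the example, not the Gapminder data used for the plots.

```r
# Hypothetical wide table: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  `1979`  = c(70, 60),
  `2009`  = c(78, 66),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Stack the year columns into a long table: one row per country/year
years <- c("1979", "2009")
long <- do.call(rbind, lapply(years, function(y) {
  data.frame(
    country         = wide$country,
    year            = as.numeric(y),
    life_expectancy = wide[[y]],
    stringsAsFactors = FALSE
  )
}))

# Each country/year pair should appear exactly once
stopifnot(!any(duplicated(long[, c("country", "year")])))
print(long)

# With googleVis installed, the long table could then be passed on, e.g.
# chart <- googleVis::gvisMotionChart(long, idvar = "country", timevar = "year")
```

Further variables (income, population and so on) would each need the same treatment and could then be merged on the country and year columns.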

Boris Bikes/Barclays Cycle Hire Average Journey Times

The visualisation above shows the average relative duration of Boris Bikers’ weekday journeys over a 4 month period at hourly intervals. For each time step the average journey time (in seconds) from each docking station has been calculated. This information is interesting because it shows the preference for short journeys around the City of London, whilst people on the outskirts of the scheme (especially to the west) take longer journeys. I also like the fact that journey times around Soho and the West End are longest around 23:00- perhaps correlating with the number of after-work drinks consumed. In one visualisation you get to see the changes in the cyclists’ behaviour- from the early morning commuters through to the late night cruisers.
The data come from Transport for London’s recent release of 1.4 million Barclays Cycle Hire journeys to their developers area (thanks to this FOI request). The data are said to include all the journeys between 30 July 2010 and 3 November 2010, except those starting between midnight and 6am. In this analysis journeys taking more than one hour are not included (there are relatively few and many were actually the bikes being removed for maintenance) and docking stations with fewer than 10 journeys within each hour across the time period have also been ignored.
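The filtering and averaging described above can be sketched in a few lines of base R. This is not the code used for the maps: the column names and the toy journeys are hypothetical, the weekday filter is left out, and I have read the "fewer than 10 journeys" rule as applying to each station and hour combination, with the threshold lowered to 2 so the toy data has something to show.

```r
# Hypothetical journeys: start station, start hour, duration in seconds
journeys <- data.frame(
  start_station = c("S1", "S1", "S1", "S2", "S2", "S1"),
  start_hour    = c(8, 8, 8, 8, 8, 17),
  duration      = c(600, 900, 4000, 300, 500, 1200),
  stringsAsFactors = FALSE
)

min_journeys <- 2  # the post uses 10; lowered for this toy example

# Drop journeys taking more than one hour
kept <- journeys[journeys$duration <= 3600, ]

# Mean duration and journey count per station and hour
avg <- aggregate(duration ~ start_station + start_hour, data = kept, FUN = mean)
n   <- aggregate(duration ~ start_station + start_hour, data = kept, FUN = length)
names(avg)[3] <- "mean_duration"
names(n)[3]   <- "n_journeys"

# Join the two summaries and drop sparsely used station/hour pairs
result <- merge(avg, n)
result <- result[result$n_journeys >= min_journeys, ]
print(result)
```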
The maps can be improved in many ways- stay tuned for more developments. I will also post something a bit more technical about the methods I used to create the map (a strange cocktail of R and ArcGIS 10).

I also recommend Ollie O’Brien’s (@oobr) brilliant interactive visualisations of these data.

Exporting KML from R

Google Earth has become a popular way of disseminating spatial data. KML is the data format required to do this. It is possible to load almost any type of spatial data format into R and export it as a KML file. In my experience R seems much quicker at doing this than many well-known GIS platforms, such as ArcGIS. The worksheet below explains how.
Data and Package Requirements:
London Cycle Hire Locations. Download.
Install the following packages (if you haven’t already done so):
maptools, rgdal (Mac users may wish to see here first).

Click here to view the tutorial code.
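For readers without the tutorial to hand, here is a minimal sketch of the general idea. It does not use maptools or rgdal as the tutorial does; instead it writes a very simple KML file for point data by hand with base R. The station names, coordinates and helper functions (`xml_escape`, `points_to_kml`) are hypothetical, and the coordinates are assumed to already be WGS84 longitude/latitude, which is what KML expects.

```r
# Hypothetical docking stations (WGS84 longitude/latitude assumed)
stations <- data.frame(
  name = c("Example Dock A", "Example Dock B & Co"),
  lon  = c(-0.1276, -0.0900),
  lat  = c(51.5072, 51.5150),
  stringsAsFactors = FALSE
)

# Escape the characters that are not allowed in XML text
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Build the lines of a minimal KML document, one Placemark per row.
# KML coordinates are written as longitude,latitude.
points_to_kml <- function(df) {
  placemarks <- sprintf(
    "<Placemark><name>%s</name><Point><coordinates>%.6f,%.6f</coordinates></Point></Placemark>",
    xml_escape(df$name), df$lon, df$lat
  )
  c('<?xml version="1.0" encoding="UTF-8"?>',
    '<kml xmlns="http://www.opengis.net/kml/2.2">',
    '<Document>',
    placemarks,
    '</Document>',
    '</kml>')
}

kml_lines <- points_to_kml(stations)
out_file  <- file.path(tempdir(), "stations.kml")
writeLines(kml_lines, out_file)
```

The resulting file can be opened in Google Earth. For lines, polygons or reprojection from another coordinate system, a dedicated spatial package is the more sensible route.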