On Monday I gave a talk about alternative Marxist approaches to the state, and on Wednesday I handed in my reading summary for this month. So yesterday I was able to relax a little and pursue some of my other projects. I had another go at using QGIS.
Peter has this side project of his own, a database of cafes, with a little web service which checks your location and gives you a map with all the nearest cafes. It’s called nearby.cafe - give it a try!
There’s a list somewhere online with all the locations of all the Starbucks cafes in the world. However, for comprehensive and accessible data, the best source really is OpenStreetMap. I wrote a tutorial about extracting data from OSM in 2015, and I’ve put the post online here as part of my ongoing process of republishing old blog posts.
Peter had some issues working with the OSM planet file: he uses a Microsoft Surface, and the full planet file is quite large when it’s completely uncompressed, around 1TB. Thankfully, last year I put an extra 2TB hard drive in my desktop, so the size isn’t really a problem for me.
I used the BBBike mirror - it’s updated weekly, and they provide a file without author, date and version metadata. It’s a bit smaller.
I left my computer running overnight to deal with the download.
We’ll be using the OSM C tools, osmconvert and osmfilter. osmfilter can’t read a PBF file directly, but it will work with the file in O5M format, so convert it first with osmconvert. I don’t really understand the underlying differences; all you need to know is that O5M is still a compact binary format, and the file came out about twice the size of the PBF file.
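The PBF to O5M conversion can be done with osmconvert, one of the OSM C tools. This is only a sketch: the input filename is an assumption about what the BBBike download is called, and the guard around the command is just there so the script skips politely when osmconvert or the file is missing.

```shell
#!/bin/sh
# Assumed input name for the BBBike "nometa" planet download.
PBF="planet-latest-nometa.osm.pbf"
# Derive the output name by swapping the extension.
O5M="${PBF%.osm.pbf}.o5m"

# Only run the conversion when the tool and the input are both present.
if command -v osmconvert >/dev/null 2>&1 && [ -f "$PBF" ]; then
    osmconvert "$PBF" -o="$O5M"
else
    echo "skipping: need osmconvert and $PBF" >&2
fi
```

Expect this to take a while on a full planet file, and make sure there is room on disk for both files.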
Next, filter the file to remove everything but the cafes, and convert the O5M file to uncompressed OSM.
osmfilter planet-latest-nometa.o5m --keep="amenity=cafe" -o=planet_cafe.osm
That results in a 112MB file, which is much more manageable. I could even open it up in a text editor: it was just over 3 million lines of XML and, surprisingly, my editor didn’t crash immediately. I think that’s the largest file I’ve ever successfully opened in plain text.
There are still tags in there, and we don’t care about those. We’re just interested in the raw points, the latitude-longitude coordinates. You can try to get rid of some tags with this:
osmfilter planet_cafe.osm --drop-tags="<bunch of tags here>" -o=planet_cafe_notags.osm
I went even further with sed to filter out all tags.
sed -i -e 's/<tag.*\/>//g' planet_cafe_notags_test.osm
That reduced it further to 44.6MB. The file looked fine, but it broke when I tried importing it into QGIS. In the end I found that reducing the overall size didn’t matter, as QGIS had no trouble dealing with the bigger file.
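As a variation on the sed step, here is a sketch that deletes whole tag lines instead of blanking them out, run against a tiny made-up sample so the result is easy to inspect. It assumes each tag element sits on its own line, and the node IDs, coordinates and cafe name are invented. Whether this avoids the QGIS import problem is untested.

```shell
#!/bin/sh
# Tiny stand-in for the filtered file (invented IDs, coordinates and name).
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
  <node id="1" lat="52.6340" lon="-1.1310">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" lat="52.6360" lon="-1.1390"/>
</osm>
EOF

# Delete every line holding a self-closing <tag .../> element.
# Writing to a variable (not -i) leaves the input file untouched.
STRIPPED="$(sed -e '/<tag [^>]*\/>/d' "$SAMPLE")"
printf '%s\n' "$STRIPPED"
rm -f "$SAMPLE"
```

Deleting the lines outright avoids the runs of blank lines that an in-place substitution leaves behind, and skipping -i means the original file is still there if the result turns out to be unusable.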
About a third of the cafe objects are polygons, and again we just want raw points. QGIS has a tool for determining the centroid of each polygon and creating a new point layer. Merge that point layer with the other points, delete the polygons, and bingo, you get a lot of dots:
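To give a feel for what a centroid calculation involves, here is a small sketch of the standard area-weighted (shoelace) formula for a single closed ring. It is an illustration, not a description of how QGIS does it internally: the function name ring_centroid is made up, and it treats longitude and latitude as flat x and y, which is a simplification that should be tolerable for building-sized polygons.

```shell
#!/bin/sh
# Area-weighted centroid of one closed ring via the shoelace formula.
# Input: one "x y" pair per line, without repeating the first vertex at the end.
ring_centroid() {
    awk '
        { x[NR] = $1; y[NR] = $2 }
        END {
            for (i = 1; i <= NR; i++) {
                j = (i % NR) + 1                  # next vertex, wrapping around
                cross = x[i] * y[j] - x[j] * y[i]
                area2 += cross                    # twice the signed area
                cx += (x[i] + x[j]) * cross
                cy += (y[i] + y[j]) * cross
            }
            if (area2 == 0) exit 1                # degenerate ring, no centroid
            printf "%.4f %.4f\n", cx / (3 * area2), cy / (3 * area2)
        }'
}

# A 2 x 2 square with one corner at the origin.
CENTROID="$(printf '%s\n' '0 0' '2 0' '2 2' '0 2' | ring_centroid)"
echo "$CENTROID"
```

For the square, the result is the middle of the shape at (1, 1), which is the single point that would stand in for the polygon on the map.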
I like how you can clearly see the outline of countries, just by plotting cafes and nothing else.
There are a few methodological problems with this, the main one being that it’s essentially just a population map. Beyond that, it’s also subject to the biases of volunteered geographic information. For example, unauthorised cartography projects are still banned and/or heavily discouraged in China, so a lot of data from there is missing. There are around 3,500 Starbucks cafes in China, and you just won’t find all of them in this dataset.
Someone else pointed out another problem with this map: it’s quite bland.
So I had a go at making it a bit prettier. I used the Stamen Design Toner style as a basemap, made the dots all a delightful shade of purple, and added a heatmap.
Here are all the cafes in central Leicester.
And here’s the same style, with a global landmass shapefile, and some country borders, and major populated areas in light grey (you can’t really see these). I’m going for something like the DEFCON map aesthetic.
I think that looks jazzy enough, although not the sort of thing you could use in an academic context. If I want something more meaningful, I’ll need a choropleth map with cafes counted against population density. That’s my next goal.
For the moment this has just been a fun exercise, which got me familiarised with QGIS and back into the habit of working with OSM data.