This is a boring long read about this site’s usage statistics. I’ve been watching these numbers all year, and I finally have a few free hours near the end of December to write about them.

To explain: I use Amazon Web Services and Cloudfront, which mirrors all the files for my site to ‘edge caches’ in Amazon data centres all around the world. My S3 bucket is somewhere in London, eu-west-2. The information collected by Cloudfront comes purely from connection logs. There’s no tracking, no cookies, no dodgy scripts. I don’t see any individual IP addresses, although fingerprinting individual visitors from the logs is uncannily easy to do.

Here’s the outwards data transfer for the year, plotted in gnuplot. (data / gnuplot script)

usage

Cloudfront only keeps the past two months of records, and I forgot to collect the usage reports from most of July through November, so there’s a big blank gap in that period. Whoops.

I set the upper range of the graph to cap out at 150MB. I’ve used my private S3 bucket for hosting large files, mostly videos of meetings, and that’s likely what caused the huge spikes in January. Amazon S3 isn’t really made to work as a cyberlocker, but hey, Bayfiles is back now, so I have somewhere for that ‘cheap/free, easily accessible, temporary, large file storage.’

Between all the Amazon services, I’ve paid between 20p and 50p per month this year. I don’t expect that cost to rise, as this blog isn’t going to get popular anytime soon. AWS is easily the best value web hosting for my purposes; the cheapest GoDaddy hosting package is £3 per month.

Next, let’s see how much I’ve written this year.

I’ve run a per-file word count on my posts directory:

find . -maxdepth 1 -type f -exec wc -w {} \;

All the post filenames are prefixed with the date in ISO format. Here’s the regular expression for selecting (and deleting) the remaining characters in the filename:

sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2})-.*/\1/'

This results in a list containing post date in one column and word count in the other. So, here’s a plot of posts and words per month. (data / gnuplot script)
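Put together, the word count and date steps might look something like the sketch below. It is not the exact script behind the plot. It assumes filenames of the form `2019-01-05-some-title.md` (the `.md` extension is a guess), and the function names `date_and_words` and `monthly_totals` are placeholders made up for illustration. The sed step uses a capture group to keep the date, and an awk step groups the results by month and keeps a running total.

```shell
#!/bin/sh
# Sketch: turn "wc -w" output into per-month post counts and word totals.
# Assumes post filenames start with an ISO date, e.g. ./2019-01-05-title.md
# (the .md extension and both function names are assumptions).

# Reads lines like "  123 ./2019-01-05-title.md" on stdin and prints
# "date words", e.g. "2019-01-05 123". Lines without a date prefix are dropped.
date_and_words() {
    sed -n -E 's#^ *([0-9]+) +(\./)?([0-9]{4}-[0-9]{2}-[0-9]{2})-.*#\3 \1#p'
}

# Reads "date words" lines and prints "month posts words cumulative_words".
monthly_totals() {
    sort | awk '{
        month = substr($1, 1, 7)
        if (month != last && last != "") print last, posts, words, total
        if (month != last) { posts = 0; words = 0 }
        posts++; words += $2; total += $2; last = month
    }
    END { if (last != "") print last, posts, words, total }'
}

# Usage, run from the posts directory:
#   find . -maxdepth 1 -type f -exec wc -w {} \; | date_and_words | monthly_totals
```

The output has one row per month, which is a convenient shape for gnuplot: the second and third columns give posts and words per month, and the fourth gives the cumulative word count.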

posts and word count

The second line shows the cumulative word count over the course of the year.

A friend at the university told me that if you add up all the essays you have to write, the average student knocks out around 20,000 words per year. Along with the dissertation I wrote (and re-wrote) this year, I figure I’ve easily written over 60,000 words’ worth of published stuff. Not bad.

Aside from words, here are the media files added to the site per month.

media files per month

In total, over the course of 2019, the blog has grown by:

That’s not including the older posts I re-added from previous years, or this post here. The most popular objects are (predictably) the main CSS files, my contact page, the RSS feed, and the blog home directory. Nothing out of the ordinary there.

Counting files per post gives a general idea of page views, since there are multiple requests for each page loaded. Given that there are on average 11 media files per post, plus 2 more requests for the base document and the CSS file, plus hits from the RSS feed, dividing the number of requests by about 15 gives a (very approximate) indication of how many page views I’m getting.
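As a worked example of that arithmetic, here is a minimal sketch. The function name `estimate_page_views` and the request count of 3000 are made up for illustration; only the divisor of 15 comes from the reasoning above.

```shell
#!/bin/sh
# Rough page-view estimate from a raw request count.
# Per page view: ~11 media files + 1 base document + 1 CSS file = 13 requests,
# rounded up to about 15 to allow for RSS feed hits.
REQUESTS_PER_VIEW=15

# Prints the approximate number of page views for a given request count
# (integer division, so the result is rounded down).
estimate_page_views() {
    echo $(( $1 / REQUESTS_PER_VIEW ))
}

# Example with a made-up request count (not a real figure from the logs):
estimate_page_views 3000   # prints 200
```

Since the divisor is only a guess, the result is best read as an order of magnitude rather than a count.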

Next up, there’s some information from the user-agent strings, which tells us about the site visitors. I generated these pie charts in LibreOffice Calc, because those can’t really be ‘plotted’, and gnuplot is fiddly enough as it is.

Here are the operating system numbers. (data)

operating systems

The most popular OS is GNU/Linux, followed by Windows. Some of those Windows visitors were still using Windows 7 - so here’s a tip, if you’re fed up running an aging system, how about bumping up those Linux figures? 😉

Here are the most popular browsers. (data)

browsers

I removed the requests identified as ‘bot’ because I’m only interested in ostensibly ‘human’ visitors. If I did count bots they would make up just under a third of all requests, and I don’t know what that’s all about.
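The filtering could be sketched roughly as below. The input format is an assumption: one `name,requests` line per browser, with automated clients labelled `bot`. The real report may use different columns, and the function names `bot_share` and `without_bots` are hypothetical.

```shell
#!/bin/sh
# Assumed input: "name,requests" lines on stdin, with automated clients
# named "bot" (hypothetical format; adjust the field numbers to match
# whatever the real export looks like).

# Prints the share of all requests attributed to bots, as a percentage.
bot_share() {
    awk -F, '{ total += $2; if (tolower($1) ~ /bot/) bots += $2 }
             END { if (total > 0) printf "%.1f\n", 100 * bots / total }'
}

# Prints only the rows that are not identified as bots.
without_bots() {
    awk -F, 'tolower($1) !~ /bot/'
}
```

With this, `bot_share < browsers.csv` would give the percentage quoted above, and `without_bots < browsers.csv` would give the rows that go into the pie chart (`browsers.csv` being a placeholder filename).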

From the browsers you can tell what the split is like between desktop and mobile users. Here is how the devices are broken down. (data)

devices graph

There’s definitely a pattern of requests from someone who uses Firefox on Ubuntu. And just to confirm my suspicions, the latest Steam survey numbers show Linux installed on 0.81% of PCs, and at the moment Firefox has around 4% of the browser market share. It doesn’t take much guesswork to understand that the most prominent visitor to this site is… me.

Still, I’m curious about all those other real humans who use Chrome and Mac OS.

It’s too much effort to create a world map of where the requests are coming from. In short, the majority come from Britain, the second-most from the USA, and then Germany.

There are a lot of referrals from the direct root URL, so it looks like there are a few people (or bots) who just drop in on the site from time to time without being linked here specifically from somewhere else. The RSS feed does get some hits, although I suspect a lot of that is just me subscribing to my own blog. I don’t know many other people who still use feed readers. I also get some hits from Twitter, which probably come from me tweeting out new posts.

I’ve always understood this blog as operating in a condition of semi-public anonymity, and the best way I can explain that is that you assume nobody reads this, until someone does.

On a few occasions I noticed this blog was getting visits from Lithuania. I know who that is, and even if we don’t speak anymore, it’s still unusually comforting to know that person checks in on here sometimes. Similarly, an old friend mentioned to me that when we were both (much) younger he used to read my old blog, and maybe he still reads this one. Hi Ben! 👋

It’s getting pretty rare for someone to have their own space on the internet, though maybe there’s an indieweb revival going on: everyone in my family now has their own site.

In other news, I heard today that the president of the USA was impeached.