One of my side projects while in confinement was to make a regularly-updated graph of virus numbers in Oxfordshire. I was inspired by a series of video animations by Tim Pokart charting the outbreak in China and comparing the actual number of cases with predicted growth curves. I haven’t done anything as complicated as that; the best I could do was fit the numbers to a Bézier curve.
The graph existed on a separate page on this site for around two months. I’ve decided to stop it now because Oxfordshire is down to one or two new cases per day and, barring the possibility of a second wave, I’m no longer checking the figures daily.
Here’s the final graph (as PNG/SVG), and you can click on the image for an animated version.
I also made a graph of the cumulative cases in Oxfordshire; this one is animated too.
The animations were made by generating 100 separate graphs in gnuplot, each one with an extra line of data. It was a headache to pipe together gnuplot and FFmpeg, and I’m not sure the animation was really worth it in the end. Maybe I learned something along the way about incrementing numbers in loops and reading files line-by-line. I also created a table from the final CSV following this tutorial.
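For anyone curious about the frame-by-frame idea, here’s a minimal sketch in Python rather than the bash I actually used. It only shows the looping part: read a CSV line-by-line and build one data set per frame, each with one more row than the last. The sample figures, the `frame_NNN.png` naming, and both function names are made up for illustration; the real script handed each data set to gnuplot and then stitched the images together with FFmpeg.

```python
import csv
import io

# Made-up sample data, NOT the real Oxfordshire figures.
SAMPLE_CSV = """date,cases
2020-04-01,3
2020-04-02,5
2020-04-03,4
"""


def frame_filename(n):
    """Zero-padded name (hypothetical scheme) so the frames sort in order."""
    return f"frame_{n:03d}.png"


def cumulative_frames(csv_text):
    """Yield (filename, rows) pairs; each frame has one more data row than the last."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # keep the header out of the row count
    rows = []
    for n, row in enumerate(reader, start=1):
        rows.append(row)
        # Build a fresh list so earlier frames are not changed by later appends.
        yield frame_filename(n), [header] + rows


frames = list(cumulative_frames(SAMPLE_CSV))
for name, rows in frames:
    print(name, len(rows) - 1, "data rows")
```

Zero-padding the frame number matters because a numbered input pattern such as `frame_%03d.png` expects the files to be named consistently.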
Overall, this was supposed to be playing to the strengths of gnuplot (and FFmpeg) as tools which are designed to be run automatically. And it worked! My script downloaded the latest data, generated a new graph every day, and updated the page on this site, all without any manual intervention on my part. Here’s the gnuplot file and the bash script which tied everything together.
I’m thinking of using this as a test case for displaying the connection logs from this site, perhaps showing the number of hits I’ve had in the last month.
The data on coronavirus cases all came from this government website, and the graph update script ran every 9 hours.
There were a couple of caveats to the latest data:
- The source data was often one or two days late.
- The source would often show the latest figure as ‘zero’, which was then increased afterwards.
- CloudFront caches everything for 24 hours (I think), so without invalidating the cache, edge locations are going to see delayed information.
- The virus in the real world takes a couple of days to show symptoms.
- The source data only shows lab-confirmed cases, which doesn’t necessarily give a full picture of the situation.
So, you couldn’t really rely on this for proper analysis. Nevertheless, I looked up the popular objects in the S3 logs and the virus graph page was getting almost as many hits as the blog index. Even if it wasn’t meant as a serious service, did anyone actually use it? Bear in mind that these were only the numbers for Oxfordshire.
The media has a relentless focus on reporting the total deaths/cases. But this isn’t helpful, or at least it’s far less helpful than the daily change. What you want to know is how many cases there were this week relative to last week. Is the rate of reproduction going up or down, and if it’s going up, how many days are left until we run out of available hospital beds?
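To make “this week relative to last week” concrete, here’s a rough sketch in Python. It groups daily counts into 7-day blocks and reports the fractional change from one block to the next. The numbers are invented, the function names are my own, and this is a simple week-on-week ratio, not an estimate of the reproduction number itself.

```python
def weekly_totals(daily):
    """Sum daily counts into consecutive 7-day blocks, dropping an incomplete final block."""
    full = len(daily) - len(daily) % 7
    return [sum(daily[i:i + 7]) for i in range(0, full, 7)]


def week_on_week(daily):
    """Fractional change between consecutive weekly totals (None if the earlier week was zero)."""
    totals = weekly_totals(daily)
    return [
        (cur - prev) / prev if prev else None
        for prev, cur in zip(totals, totals[1:])
    ]


# Invented figures: 10 cases a day for a week, then 5 a day, plus 2 stray days.
daily_cases = [10] * 7 + [5] * 7 + [4, 3]

print(weekly_totals(daily_cases))
print(week_on_week(daily_cases))
```

A negative value means fewer cases than the previous week, which is the direction you’re hoping for.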
Here is the sort of information I’d like to have seen coming out of the daily briefings:
- New cases vs. recoveries.
- Mortality rate and hospitalisation rate.
- Specific outbreaks or localised hotspots.
As a final example, here’s a graph of total cases by region in England.
London is the worst-hit region, probably soon to be overtaken by the North West. Now compare with the rate of infection per 100,000 people.
Counting cases per-capita gives us a very different picture. And however you count, you’re better off staying in the South West than you would be in the North West.
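The per-capita figure is just cases divided by population, scaled to 100,000 people. Here’s a small Python sketch showing how the ranking can flip between the two ways of counting. The region names, case counts, and populations are placeholders I’ve made up, not the real figures behind the graphs.

```python
def per_100k(cases, population):
    """Cases per 100,000 people."""
    return cases * 100_000 / population


# Placeholder numbers for illustration only.
regions = {
    "Region A": {"cases": 20_000, "population": 8_000_000},
    "Region B": {"cases": 15_000, "population": 5_000_000},
}

by_total = sorted(regions, key=lambda r: regions[r]["cases"], reverse=True)
by_rate = sorted(
    regions,
    key=lambda r: per_100k(regions[r]["cases"], regions[r]["population"]),
    reverse=True,
)

print("Worst by total cases:", by_total[0])
print("Worst by rate per 100,000:", by_rate[0])
```

In this made-up example, Region A has more cases overall, but Region B has the higher rate once population is taken into account.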
Up the NHS!