I’ve made a Twitter bot which tweets out the front page of the Morning Star newspaper every day. Give it a follow :)

The Morning Star puts pdfs online and links to them in a predictable date format. One of the first things I did with this was to iterate on the date and download every possible edition going back to around August 2017. I now have those all saved on my computer, and… that back-catalogue might come in useful one day.

So, we get today’s date, format it as ISO 8601 (YYYY-MM-DD), and insert it into the URL used to download today’s pdf. To keep things easy, the download is saved as star.pdf, overwriting yesterday’s file. I’m not keeping an archive here; we only ever need the latest edition.
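As a rough illustration of the date step, here is a minimal sketch in Node.js. The base URL is a placeholder (the real location of the pdfs is deliberately not given in this post), and the function names isoDate and editionUrl are made up for the example.

```javascript
// Placeholder only: this is NOT the real location of the pdfs.
const BASE_URL = 'https://example.com/editions';

// Format a Date as an ISO 8601 calendar date (YYYY-MM-DD) in local time.
// Date.prototype.toISOString() would use UTC instead, which can give the
// wrong day if the script runs close to midnight.
function isoDate(date) {
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, '0');
  const day = String(date.getDate()).padStart(2, '0');
  return `${year}-${month}-${day}`;
}

// Build the (hypothetical) download URL for a given day's edition.
function editionUrl(date) {
  return `${BASE_URL}/${isoDate(date)}.pdf`;
}

console.log(editionUrl(new Date()));
```

The same two functions would also cover the back-catalogue download: start from a date in August 2017, add one day at a time, and request editionUrl for each date.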

This ImageMagick command selects the first page of the pdf (index zero), renders it at 270 dpi, and converts it to a jpeg image.

convert -density 270 star.pdf[0] -strip -define jpeg:extent=4500kb star.jpg

The jpeg:extent setting adjusts image quality so that the image doesn’t exceed 4.5MB. The size limit for images on Twitter is 5MB and in practice that’s more than enough, but let’s set the limit anyway just to be sure.

Next, we’re going to use Node.js to post our front page image with a tweet. The main thing I’m relying on here is the twit library. The first two steps, downloading the pdf and then converting it to an image, are possible from within Node.js. However, for the sake of simplicity I used wget + convert in two lines in a bash script.
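For anyone curious what the Node.js route might look like, here is a hedged sketch using only built-in modules. It is not what the bot runs. The helper names download, convertArgs and convertFirstPage are invented for this example, it assumes ImageMagick’s convert is on the PATH, and it does not follow HTTP redirects.

```javascript
const https = require('https');
const fs = require('fs');
const { execFile } = require('child_process');

// Download a URL to a local file, overwriting whatever is already there.
function download(url, dest) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the response body
        reject(new Error(`Download failed with status ${res.statusCode}`));
        return;
      }
      const file = fs.createWriteStream(dest);
      res.pipe(file);
      file.on('finish', resolve);
      file.on('error', reject);
    }).on('error', reject);
  });
}

// The same arguments as the convert command above, as an array.
function convertArgs(pdfPath, jpgPath) {
  return ['-density', '270', `${pdfPath}[0]`, '-strip',
    '-define', 'jpeg:extent=4500kb', jpgPath];
}

// Run ImageMagick's convert on the first page of the pdf.
function convertFirstPage(pdfPath, jpgPath) {
  return new Promise((resolve, reject) => {
    execFile('convert', convertArgs(pdfPath, jpgPath), (err) => {
      if (err) reject(err);
      else resolve();
    });
  });
}

// Usage, with pdfUrl standing in for the real edition URL:
// download(pdfUrl, 'star.pdf')
//   .then(() => convertFirstPage('star.pdf', 'star.jpg'))
//   .catch(console.error);
```

Passing the arguments as an array to execFile avoids shell quoting problems with the [0] page selector. The bot itself sticks with the two-line bash script, so the only JavaScript it needs is the part that tweets.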

Here’s what the JavaScript looks like.

const fs = require('fs');
const cheerio = require('cheerio');
const request = require('request');
const Twit = require('twit');

// config.js holds the Twitter API keys; the ./ makes Node load the local file
const T = new Twit(require('./config.js'));

// Scrape the first h1 on the Morning Star homepage and use it as the tweet text
function getTitle() {
  request({
    method: 'GET',
    url: 'https://morningstaronline.co.uk'
  }, (err, res, body) => {
    if (err) return console.error(err);
    const $ = cheerio.load(body);
    const todaysHeadline = $('h1').first().text().trim();
    sendTweet(todaysHeadline);
  });
}

// Upload star.jpg, attach alt text, then post the tweet with the headline
function sendTweet(headline) {
  const b64content = fs.readFileSync('star.jpg', { encoding: 'base64' });

  T.post('media/upload', { media_data: b64content }, (err, data) => {
    if (err) return console.error(err);
    const mediaIdStr = data.media_id_string;
    const altText = "today's headline";
    const metaParams = { media_id: mediaIdStr, alt_text: { text: altText } };

    T.post('media/metadata/create', metaParams, (err) => {
      if (err) return console.error(err);
      const params = { status: headline, media_ids: [mediaIdStr] };

      T.post('statuses/update', params, (err) => {
        if (err) console.error(err);
      });
    });
  });
}

getTitle();

It’s probably not the most elegant way of doing it, and I don’t understand all of it, yet it works!

As you can see from getTitle, this code also fetches the Morning Star homepage, selects whatever is in the first h1 tag, and puts that in the tweet text. This is a compromise, as the headline of the leading article on the website doesn’t always match the headline on the printed paper.

Is this illegal?

Twitter bots are controversial enough; here’s Tom Scott commenting on all the problems with open APIs.

When I’m automatically downloading a pdf every day, I’m also bypassing the payment system. Last year I did the responsible thing and mentioned the fact that this was possible to the editor Ben Chacko. He’s aware of the problem, and maybe it’ll get fixed in future.

I’m not going to explain it, I’m sure that anyone who wants to exploit this ‘vulnerability’ will stumble across it easily enough. For everyone else, you should pay for the online edition, because the paper needs money to stay in print and journalists need to eat. You can donate to the paper’s Fighting Fund here.

There’s another question about whether posting the front pages of a newspaper counts as copyright infringement. This little project was partly inspired by a BBC journalist called Neil Henderson posting advance front pages on Twitter under #tomorrowspaperstoday. I don’t know if he gets permission before doing this, but if a BBC journalist is doing it openly, I take that as a sign that front pages are generally safe to share.

The headlines are always free to glance over in the newsagents, and the front page is designed to grab your attention and get you reading the rest. If this bot can get a bit of publicity for the Star then I’ll consider it a success.

Future plans

At the moment the bot fires off shortly after 7am every day (except Sundays). There has already been a case where the online edition came out late, which resulted in the bot incorrectly posting the previous day’s edition.

I could add a check to make sure the pdf has changed before posting, and if it hasn’t, sleep half an hour, return to the beginning of the loop, and repeat until the latest edition comes out. This could be combined with running the bot much earlier in the morning.
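Here is one way that check might look, as a sketch rather than anything the bot does yet. It compares a SHA-256 hash of the downloaded pdf against the hash of the previous one and retries with a delay. The names fileHash and waitForNewEdition, the half-hour delay and the cap of eight attempts are all assumptions for the example.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// SHA-256 fingerprint of a file's contents, as a hex string.
function fileHash(path) {
  return crypto.createHash('sha256').update(fs.readFileSync(path)).digest('hex');
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call fetchEdition() until the hash it returns differs from previousHash.
// fetchEdition is any async function that downloads the pdf and returns
// its hash. Resolves to the new hash, or null if every attempt matched.
async function waitForNewEdition(fetchEdition, previousHash, options = {}) {
  const { delayMs = 30 * 60 * 1000, maxAttempts = 8 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const hash = await fetchEdition();
    if (hash !== previousHash) return hash;
    if (attempt < maxAttempts) await sleep(delayMs);
  }
  return null;
}
```

In use, the script would record fileHash('star.pdf') for yesterday’s file before downloading, pass in a fetchEdition that re-downloads star.pdf and returns its new hash, and only tweet if the result is not null. The cap on attempts stops the bot from looping all day on Sundays, when no new edition is expected. On a first run with no star.pdf present, fileHash would throw, so that case needs handling separately.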