Cambridge Analytica, big data, poor journalism and possible Electoral Commission probe

The sloppy (and regrettably dominant) media coverage of Cambridge Analytica has been of the form “it worked for Trump, it worked for Brexit, so their clever big data targeting is powerful”.

It’s sloppy for a range of reasons. One is that any sensible piece of coverage should start from the knowledge that almost all similar coverage of other firms in the past has been massively over-hyped.

It’s possible, of course, that despite so many journalists and pundits uncritically serving up hype before, this time they might be right. Possible, but that track record means the sensible starting point is to be very, very careful. Especially given the absence of evidence – even in interviews with Cambridge Analytica employees – that what they do works.

Then there is the minor matter of “Cambridge Analytica worked for Trump”. True, but before Cambridge Analytica worked for Trump, it worked for Ted Cruz. Cruz didn’t win. Rather, he got crushed by a Donald Trump campaign that wasn’t then using Cambridge Analytica:

The story of the Republican primaries is actually that Cambridge Analytica’s flashy data science team got beaten by a dude with a thousand-dollar website. To turn that into this breathtaking story of an unbeatable voodoo-science outfit, powering Trump inexorably to victory, is quite a stretch.

So at the very best, its record in the US is 1-1. And that is only if you credit Trump’s victory over Clinton to its contributions rather than to factors such as Clinton being the most unpopular Democratic candidate for President in decades, and being so before Cambridge Analytica entered the fray.

Now let’s turn to Brexit, with a detour via the 2015 general election. Remember all the big-data-clever-targeting-dark-social-media-posts stuff about the Conservative campaign? Aside from the reasons to be a little sceptical of that too, it’s worth noting that the team behind it went on to work on the European referendum. On the losing Remain side.

So the referendum had one clever-big-secret-data-and-digital-operation on each side. One won, one lost – and the one that lost had won the time before. That is another good reason to be sceptical about how magical these things are, especially as campaigns previously hailed for doing super-clever-secret-stuff have crashed and burned, such as Rick Perry’s bid to become US President.

But finally, there’s the question of what Cambridge Analytica actually did, either in the US or in the European referendum campaign here in the UK. There’s growing doubt in the US about what was really done, as shown by one of the better pieces of journalism about the firm’s activities, from the New York Times:

Cambridge executives now concede that the company never used psychographics in the Trump campaign. The technology — prominently featured in the firm’s sales materials and in media reports that cast Cambridge as a master of the dark campaign arts — remains unproved.

What’s more, to return to another piece of (good) coverage:

The authors tell us that Cambridge Analytica were using some combination of survey data, content scraped from social media, and traditional marketing data. They’re then doing some kind of sentiment analysis to build a ‘five traits’ profile of millions of Americans (and Britons, in the case of the Brexit campaign).

The five traits model is a real thing in psychology, sure, and it may have some predictive power for things like mortality. It’s important to note that it’s not undisputed or unflawed, however. It’s also true that you can take demographic data and correlate it with political leaning with a reasonable amount of success – we know that votes in the EU referendum tended to correlate to education, for example. That’s the big grain of truth at the heart of the story.

But let’s just think this through. Firstly, this is data that’s available to every other major data science campaign outfit. It’s not some secret buried hard drive they found. Second, this usage would go far beyond anything that any published science can support. OCEAN personality traits would normally be assessed through a questionnaire. To establish them from someone’s Facebook feed is at best an untested piece of science. Is their feed representative? Is it even public or available to you? Is your algorithm 100 per cent confident or (more likely) only 75 per cent?

Then there’s the challenge of bringing all this data together with any degree of accuracy. How confidently can you match a given Facebook account to a given record on the electoral roll? You might get lucky and find some location information that can match you to the only person with a specific name in a given town, and you might then be able to match that to a credit report or other bit of data.

What you end up with is a series of steps that individually sound plausible, but collectively turn to mush.
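The way plausible steps turn to mush can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, that each step in the chain succeeds or fails independently, so the chance of a profile being right at every step is the product of the individual accuracies. The 75 per cent figure echoes the example in the passage above; the two record-matching rates are purely hypothetical, and `chain_accuracy` is an illustrative helper, not anything Cambridge Analytica is known to have used.

```python
from math import prod


def chain_accuracy(step_accuracies):
    """Chance that every step in a pipeline is right, assuming the steps
    fail independently of one another (a simplifying assumption)."""
    accuracies = list(step_accuracies)
    for p in accuracies:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"accuracy must be between 0 and 1, got {p}")
    # With independent steps, the overall success rate is the product.
    return prod(accuracies)


# Illustrative figures only: 0.75 echoes the trait-inference example in the
# text; the two matching accuracies are hypothetical.
steps = {
    "personality traits inferred from a Facebook feed": 0.75,
    "Facebook account matched to an electoral roll record": 0.80,
    "electoral roll record matched to a credit report": 0.80,
}

overall = chain_accuracy(steps.values())
print(f"Chance every step is right: {overall:.0%}")  # prints: Chance every step is right: 48%
```

Even with these fairly generous assumed rates, fewer than half of the resulting profiles would be right at every step. Real errors may well be correlated rather than independent, which could push the true figure up or down, but each extra step still brings another chance of being wrong.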

Whilst the evidence seems to point towards the Trump campaign using Cambridge Analytica to successfully target people with little interest in politics, the idea that such people exist or should be targeted is hardly new. Reaching beyond the media outlets that usually cover politics, for example, has long been a mainstay of the media playbook for new party leaders.

Moreover, the sort of data Cambridge Analytica holds has in some cases turned out to be pretty mainstream. Examples include the details David Carroll obtained about the data held on him, and the conference talk given by Matthew Oczkowski, Head of Product and Data Team Lead for the Trump Presidential Campaign, which showed the campaign’s reliance on standard consumer data.

Meanwhile in the UK, whilst people speaking on its behalf had talked up Cambridge Analytica’s role in the referendum, the official expense returns required by law downplay its role.

Hence this news:

Stephen Kinnock, a pro-remain Labour MP, has asked the UK’s elections and referendums regulator to urgently “investigate whether it breached provisions in the Political Parties Elections and Referendums Act 2000”.

“The market rate for a donation of this kind could amount to hundreds of thousands of pounds, based on the previous experience of referendum campaigns and political parties for analytical tools,” Kinnock told Claire Bassett, Electoral Commission chief executive, in a letter seen by the Guardian. “Yet Leave.eu has not declared this donation-in-kind at any point in their returns to the Electoral Commission.”…

[But] Leave.EU has denied any wrongdoing. Its communications director, Andy Wigmore, said: “[CA] did no work for us formally and if they had it would have been way before you had to report expenditure … We never employed CA and they never gave us anything in kind”…

[Yet] Brittany Kaiser, senior Cambridge Analytica executive, featured on the panel at Leave.EU’s press launch in October 2015 and told the audience: “We are going to be running a bottom-up campaign. We are going to be running large scale research throughout the nation to really understand why people are interested in staying in or getting out of the EU and the answers to that will help inform our policy and communications.” [The Guardian]

By the way, in the US too Cambridge Analytica has walked back stories of its involvement in the Trump campaign, as in this official statement from the firm:

Cambridge Analytica does not use data from Facebook … Psychographics was hardly used at all.

As the story from which that statement is taken concludes:

If you step right back and look at all this, what do we see? We see a data science firm with Steve Bannon on the board, bigly claims about its powers, whose exact methodology is unclear to us. We see a candidate, Donald Trump, who used the same successful strategy right the way through his campaign whether he was employing Cambridge Analytica or a random dude with HTML skills. We have another candidate, Ted Cruz, who used the same firm and tanked. We have another candidate, Hillary Clinton, who used something very similar to Cambridge Analytica and also lost.

How exactly do you turn all that into the story of an unstoppable data science behemoth?

UPDATE: After an extensive investigation, the Information Commissioner’s Office found “no further evidence to change my earlier view that SCL/CA were not involved in the EU referendum campaign in the UK – beyond some initial enquiries made by SCL/CA in relation to UKIP data in the early stages of the referendum process”.

UPDATE 2: For a more recent study of whether Cambridge Analytica’s methodology worked, see the work by Sander van der Linden.
