Big Data is Watching You!
“What data storage and ‘mining’ capabilities exist, and what can they do with your digital footprint? Who profits from analyzing your digital footprint, and should you care?”
George Orwell’s 1984 depicts an authoritarian society in which everything people think and do is watched over by the all-powerful “Big Brother” (Orwell 1949). Orwell’s dystopian novel was long regarded as fantasy, yet by 2020 we have, strangely enough, ended up in a situation in which a great deal of what people think and do is watched over, with their consent, by “big data”. This essay examines this unusual phenomenon by first explaining data mining; it then discusses how the resulting information is used and considers how serious an issue this is: are we really living in a world analogous to 1984, or is it all a storm in a teacup? I argue that big data really is watching us in ways most people are unaware of, and that this can be dangerous when the information is used for nefarious purposes.
First, it is necessary to define what we are talking about. The concept of “big data” gained momentum in the early 2000s, when industry analyst Doug Laney defined it in terms of the three Vs: volume, velocity, and variety (SAS n.d.). The modern world produces huge volumes of data, often in real time and in many different forms. Facebook is a good example: every second it receives colossal volumes of data from the roughly 2.7 billion active users of its platform (Statista 2020).
All the data that a person transmits about themselves collectively makes up their “digital footprint”. Throughout the day most people transmit masses of data about their movements (via GPS), their purchases (via credit cards), their interests (via social media) and much more. Something as simple as buying a coffee with a credit card, logging on to the cafe’s Wi-Fi and liking a friend’s photo all adds to your digital footprint.
How is all this data used? The answer is through “data mining”, which is “the use of machine-learning algorithms to find faint patterns of relationship between data elements in large, noisy, and messy data sets, which can lead to actions to increase benefit in some form (diagnosis, profit, detection, etc.)” (Nisbet, Miner, and Yale 2018). Data mining is one stage in the wider process of “knowledge extraction”, in which data is gathered, processed, analysed and evaluated to produce knowledge (Rathod and Valmik 2014). Common data mining techniques include classification (for example, using decision trees) and clustering. Increasingly, data mining capabilities are becoming part of standard office tools, e.g. the XLMiner add-in for Microsoft Excel (Nisbet, Miner, and Yale 2018).
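To make clustering less abstract, the following is a minimal Python sketch of k-means, one widely used clustering algorithm. The users, the two features (weekly cafe card purchases and daily hours on social media) and the helper names kmeans and assign are all invented for illustration; they do not come from XLMiner or any other real data-mining product. The sketch only shows how an algorithm can sort people into groups from patterns in their footprint without being told in advance what those groups mean.

```python
import math

# Hypothetical digital-footprint data (invented for illustration):
# (card purchases at cafes per week, hours per day on social media)
users = {
    "ana": (9.0, 1.0),
    "ben": (8.0, 1.5),
    "cai": (10.0, 0.5),
    "dev": (1.0, 5.0),
    "eli": (0.5, 6.0),
    "fay": (2.0, 4.5),
}


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, rounds=10):
    """Basic k-means: alternately assign points to the nearest centroid and
    move each centroid to the mean of its points, for a fixed number of rounds."""
    centroids = list(points[:k])  # simple deterministic start: the first k points
    for _ in range(rounds):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # leave an empty cluster's centroid where it is
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids


names = list(users)
points = [users[n] for n in names]
groups = assign(points, kmeans(points, k=2))
for name, group in zip(names, groups):
    print(name, "-> group", group)
```

On this toy data the three frequent cafe customers end up in one group and the three heavy social-media users in the other. The algorithm never names the groups; deciding that one of them is worth targeting with, say, coffee adverts is the “action to increase benefit” in Nisbet, Miner and Yale’s definition.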
Big data is also big business. The five largest US technology companies are all data-driven, and together they increased in value by more than $1.5 trillion in 2020 (Wakabayashi 2020). What is striking about some of them, notably Facebook and Google (which owns YouTube), is that they offer their services to people for free. What they get back from their users is data, which is worth a great deal of money because advertisers can use it to target exactly the people they want to reach with specific campaigns, based on everything those users have revealed about themselves to Facebook, YouTube or Google. Andrew Lewis coined an apt saying about this: “If you are not paying for it, you’re not the customer; you’re the product being sold” (Quote Investigator 2017).
But what is the problem with all this? People voluntarily choose to use free services that they enjoy, and as a result the people who built those services make a great deal of money; why should anyone care? To answer this, consider the big data companies in social media. This essay argues that there are three main problems: the way social media companies use data, whom they share data with, and how those third parties use the data.
The first problem concerns the social media companies themselves. Their primary aim is to keep their “users” addicted to their platforms, and to do this they use the data they hold about them to feed people a constant diet of content they expect them to like. As a result, people’s existing views are reinforced, trapping them in a social media bubble in which everyone seems to think the same way. The impact of this is reflected in statistics on the increasing polarisation of people’s political views since the dawn of the social media age, which many see as dangerous for democracy (Chuck 2019).
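The feedback loop described above can be illustrated with a deliberately simple toy model in Python. Everything in it is an assumption made for illustration, not a description of any real platform’s ranking system: the hypothetical function feed_shares fills a ten-item feed by weighting each of two viewpoints by its past clicks raised to a power (amplify, standing in for a ranker that favours whatever already performs best), and the user is assumed to click on everything shown.

```python
def feed_shares(favoured_clicks, other_clicks, rounds=5, slots=10, amplify=2):
    """Toy engagement loop (hypothetical): each round, split `slots` feed items
    between two viewpoints in proportion to past clicks ** amplify, and assume
    the user clicks every item shown. Returns the favoured viewpoint's share
    of the feed in each round."""
    shares = []
    for _ in range(rounds):
        w_fav = favoured_clicks ** amplify
        w_other = other_clicks ** amplify
        shown_fav = round(slots * w_fav / (w_fav + w_other))
        shown_other = slots - shown_fav
        # This round's clicks feed into next round's weights.
        favoured_clicks += shown_fav
        other_clicks += shown_other
        shares.append(shown_fav / slots)
    return shares


# A user whose past clicks lean 60/40 towards one viewpoint.
print(feed_shares(6, 4))  # [0.7, 0.8, 0.8, 0.9, 0.9]
```

Under these assumptions a 60/40 lean in the user’s history becomes a feed that is 90 per cent one viewpoint within five rounds, whereas with amplify=1 (no amplification) the share stays at 0.6 every round. In this toy model, then, it is the ranking’s preference for what already performs well, not the user’s initial lean alone, that narrows the feed.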
The second problem concerns the organisations that social media companies share information with. People normally have no idea who receives their data. Despite having agreed to consent terms, they usually do not realise that their data is passed on to third parties, or that some of these organisations exist solely to gather data on people and sell it to anyone who might be interested (Stirista 2020). From an ethical perspective, this appears to conflict with people’s right to privacy, because their data is available to organisations whose very existence they are unaware of.
The third problem concerns how those who gain access to people’s data use it. Perhaps the biggest concern here is targeted misinformation campaigns intended to achieve political ends. Misinformation that plays on voters’ prejudices and fears, as revealed by big data, has contributed to such major political controversies as the rise of Donald Trump and the Brexit vote in the UK, with particular attention paid to the role of data mining companies such as Cambridge Analytica (Rawnsley 2018).
In conclusion, we live in an era in which unprecedented amounts of data are generated by us all. Most people are completely unaware of how this data is used by the companies that collect it, whom those companies share it with, and how third parties then use it, sometimes for nefarious purposes. We are not truly in the world of 1984, not least because people consent to sharing their data, but we are in a dangerous place, because big data means big knowledge, and knowledge is power that can easily be misused.
Chuck, Elizabeth. 2019. “‘Greatest Propaganda Machine in History.’” NBC, November 22, 2019.
Nisbet, Robert, Gary Miner, and Ken Yale. 2018. Handbook of Statistical Analysis and Data Mining Applications. Cambridge: Academic Press.
Orwell, George. 1949. Nineteen Eighty-Four. London: Secker and Warburg.
Quote Investigator. 2017. “You’re Not the Customer, You’re the Product.” Accessed November 23, 2020. https://quoteinvestigator.com/2017/07/16/product/
Rathod, Dipeeka K., and Neha Valmik. 2014. “Overview of Data Mining Techniques.” Advances in Computer Science and Information Technology 1 (2): 74-76. https://www.krishisanskriti.org/vol_image/02Jul201510075517%20%20%20Dipeeka%20k.%20Rathod%20%20%20%20%20%20%2074-76.pdf
Rawnsley, Andrew. 2018. “Politicians Can’t Control the Digital Giants with Rules Drawn Up for the Analogue Era.” The Guardian, March 25, 2018. https://www.theguardian.com/commentisfree/2018/mar/25/we-cant-control-digital-giants-with-analogue-rules
SAS. n.d. “Big Data.” Accessed November 20, 2020.
Statista. 2020. “Facebook: number of daily active users worldwide 2011-2020.” Accessed November 21, 2020. https://www.statista.com/statistics/346167/facebook-global-dau/
Stirista. 2020. “Third Party Data.” Accessed November 22, 2020.
Wakabayashi, Daisuke. 2020. “Big Tech Continues Its Surge Ahead of the Rest of the Economy.” New York Times, October 29, 2020. https://www.nytimes.com/2020/10/29/technology/apple-alphabet-facebook-amazon-google-earnings.html