Ethical Issues in Learning Analytics

Stephen Downes

3

In the previous section we hope to have established that there is a wide range of uses for learning analytics and AI in education, from tools institutions can use to manage resources and optimize offerings through to tools individuals can use to learn more effectively and quickly. If there were no benefits to be had from analytics, then there would be no ethical issues. But in part because there are benefits, there are ethical issues. No tool that is used for anything is immune from ethical implications.

As a result, as de Bruijn (2020) writes, “There is widespread demand for applied AI ethics. This is perhaps unsurprising in relation to government, academia, regulators and the subjects of algorithmic decision-making. However, for industry, a deluge of negative press over the last year could be seen as evidence of the reverse – a disregard for the ethical implications of AI-driven products and services. Yet this would be an oversimplification of reality.”

Indeed. There are many documents extant devoted to identifying and tracking these issues. These issues are captured not only in the criticisms arising from the application of analytics, but also in the principles and codes of conduct developed in response to these criticisms. While what follows is by no means an authoritatively complete catalogue of the issues that have been raised, it is a reasonably comprehensive listing, intended not only to identify the most oft-cited and common concerns, but also to dig more deeply into the ethical implications of this technology.

The ethics of analytics is particularly complex because issues arise both when it works, and when it doesn’t. Consequently, in an approach we will follow, Narayan (2019) classifies these issues under three headings: issues that arise when analytics works, issues that arise because analytics are not yet reliable, and issues that arise in cases where the use of analytics seems fundamentally wrong. To these three sets of issues we will add a fourth describing wider social and cultural issues that arise with the use of analytics and AI.

When Analytics Works

Modern AI and analytics work. As Mark Liberman observes, “Modern AI (almost) works because of machine learning techniques that find patterns in training data, rather than relying on human programming of explicit rules.” This is in sharp contrast to earlier rule-based approaches that “generally never even got off the ground at all.”

As we have seen, analytics can be used for a wide range of tasks, some involving simple recognition, some involving deeper diagnostics, some making predictions, and some generating new forms of content or even making determinations about what should or ought to be done.

In such cases, it is the accuracy of analytics that raises ethical issues. In many cases there is a virtue in not knowing something or not being able to do something that is challenged when analytics reveals everything. The following sections consider a few examples.

Surveillance

Analytics and AI require data above all, and so in order to support this need institutions and industries often depend on surveillance. “Along with private entities, law enforcement and other government agencies are among the first actors to deploy automated surveillance systems. However, even though automated surveillance systems promise to bring various benefits like enhanced security, when in wrong hands, these systems can violate civil liberties.” (UC Berkeley, 2019)

Surveillance, however, is not an all-or-nothing proposition. “We can and must have both effective law enforcement and rigorous privacy protections. Eternal vigilance will be required to secure our fundamental rights, including the right to privacy in relation to all public spaces, including those found online and in other virtual spaces” (Cavoukian, 2013:22). Moreover, as seen above, analytics and AI produce numerous benefits. But it’s a question of degree. We’re happy to have the police officer watch over the public square; however, “there’s no place for the state in the bedrooms of the nation” (CBC, 1967).

That’s easier to say than to practice. Once surveillance becomes normal, its use expands. In San Diego, for example, smart street lights are experiencing ‘mission creep’. “The San Diego Police Department has said it accesses smart streetlight footage only to help solve the most serious, violent crimes. But a closer look at the data shows that investigators have also used the streetlights in cases related to vandalism, illegal dumping and destruction of city property” (Marx, 2020). Private actors, as well, employ surveillance for their own purposes. For example, Amazon-owned Whole Foods is tracking its employees with a heat map tool that ranks stores most at risk of unionizing (Peterson, 2020). And a company called Splunk helps companies monitor people working remotely (Tully, 2020).

And once surveillance becomes normal – so normal it’s in your street lights – it can have an impact on rights and freedoms. People who know they are on camera behave differently than they would in private. According to security expert Bruce Schneier, “The fact that you won’t do things, that you will self-censor, are the worst effects of pervasive surveillance…. The idea is that if you don’t know where the line is, and the penalty for crossing it is severe, you will stay far away from it.” (Shaw, 2017)

Tracking

As Cavoukian (2013:23) writes, “it is one thing to be seen in public. It is another to be tracked by the state.” The same could be said about tracking by advertising agencies, bill collectors, competitors, and stalkers. Analytics, however, makes tracking accessible to everyone. “Miniature surveillance drones, unseen digital recognition systems, and surreptitious geolocational monitoring are readily available, making long-term surveillance relatively easy and cheap” (Ibid).

Critics argue that tracking is “creepy” and can be used against unsuspecting victims. “The trick is in taking this data and shacking up with third-parties to help them come up with new ways to convince you to spend money, sign up for services, and give up more information. It would be fine if you decided to give up this information for a tangible benefit, but you may never see a benefit aside from an ad, and no one’s including you in the decision” (Princiya, 2018). According to Kelly (2019), Google, ClassDojo and E-Hallpass say in their privacy policies that student data is not shared with third-party companies for marketing or advertising. But the Cambridge Analytica scandal (Cadwalladr and Graham-Harrison, 2018) shows that bad actors may distribute this data nonetheless.

Tracking is also described as “intrusive” and critics argue that (say) workplace tracking violates a person’s right to privacy. The Canadian Privacy Commissioner says “If an employer’s collection, use or disclosure of personal information about employees is not one that a reasonable person would consider appropriate in the circumstances, it will violate the norms set out in the legislation” (OPCC, 2008). But tools like cookies and RFID chips are always present and can fade into the background, creating the opportunity to unobtrusively collect much more data than needed.

Anonymity

It is arguable that anonymity is a virtue, yet many applications of analytics reduce or eliminate it. It is widely argued, for example, that “anonymity helps support the fundamental rights of privacy and freedom of expression.” (Bodle, 2013) And it is valued. One report finds, for example, that “86% of internet users have taken steps online to remove or mask their digital footprints” and “55% of internet users have taken steps to avoid observation by specific people.” (Raine, et.al., 2013)

Analytics impacts on anonymity in two ways. First, it makes it difficult to be anonymous. According to a Pew survey, “59% of internet users do not believe it is possible to be completely anonymous online, while 37% of them believe it is possible.” This is partially because of spying and tracking, and partially because data about individuals can be cross-referenced. “When Facebook acts as a third-party tracker, they can know your identity as long as you’ve created a Facebook account and are logged in — and perhaps even if you aren’t logged in. It is also possible for a tracker to de-anonymize a user by algorithmically exploiting the statistical similarity between their browsing history and their social media profile” (Princiya, 2018).
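The cross-referencing described above can be illustrated with a minimal sketch. It is not the method used by any actual tracker: the profile names, the domains, and the choice of Jaccard similarity as the measure of "statistical similarity" are all assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Share of items common to both sets (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(history: set, profiles: dict) -> tuple:
    """Return (name, score) for the profile most similar to the history."""
    name = max(profiles, key=lambda n: jaccard(history, profiles[n]))
    return name, jaccard(history, profiles[name])


# Hypothetical data: domains seen in an 'anonymous' browsing history, and
# domains linked from three public social media profiles.
anonymous_history = {
    "chess.example", "knitting.example", "localnews.example", "birds.example",
}
public_profiles = {
    "alice": {"chess.example", "knitting.example", "birds.example", "recipes.example"},
    "bob": {"football.example", "localnews.example", "cars.example"},
    "carol": {"travel.example", "recipes.example"},
}

match, score = best_match(anonymous_history, public_profiles)
print(match, round(score, 2))
```

Even with four domains, one profile stands out clearly from the others. No name or login is needed for the match, which is why removing identifiers alone may not preserve anonymity.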

Second, analytics arguably creates a social need to eliminate anonymity. As Bodle argues, “A consensus is growing among governments and entertainment companies about the mutual benefits of tracking people online.” Hence, provisions against anonymity, he argues, are being built into things like trade agreements and contracts.

All of that said, anonymity is not unambiguously good. For example, John Suler describes the ‘Online Disinhibition Effect’, which includes “dissociative anonymity” as a factor (Suler, 2004), and while it helps students feel safe and secure and helps them ‘come out of their shell’, it has also been identified as a factor in online bullying and abuse (O’Leary and Murphy, 2019).

Facial Recognition

Facial recognition is a technology that carries with it a whole class of ethical objections and concerns.

There is a direct bearing on anonymity, as apps like Clearview make clear. “What if a stranger could snap your picture on the sidewalk then use an app to quickly discover your name, address and other details? A startup called Clearview AI has made that possible.” (Moyer, 2020) Clearview is one of a number of applications that are using publicly available image data to identify people. “Technically, it scrapped(sic) off billions of images from Facebook, YouTube, Venmo, and other sites to utilize it as its own database. And, when there’s match for your face, it becomes easy to get your information.” (Das, 2020)

Mark Andrejevic & Neil Selwyn (2019) outline a number of additional ethical concerns involving facial recognition technology in schools:

The dehumanising nature of facially focused schooling
The foregrounding of students’ gender and race
The inescapable nature of school-based facial recognition
The elimination of obscurity
The increased authoritarian nature of schooling
The cascading logic of automation (i.e., it gets used for more and more things)
The future oppression of marginalised groups within schools

Additionally, as Microsoft’s Michael Karimian noted, “technology companies are now being forced to take on much greater responsibilities, such as considering how facial recognition software might infringe on the right to assembly in certain countries. This highlighted the indirect social impact of AI systems and the affect(sic) this might have on Children” (UNICEF, 2019).

There have been campaigns to ban facial recognition (Conger, et.al. 2019; Samuel, 2019). Arguably, however, these efforts miss the point. It’s just one identification technology among many. “People can be identified at a distance by their heart beat or by their gait, using a laser-based system. Cameras are so good that they can read fingerprints and iris patterns from meters away. And even without any of these technologies, we can always be identified because our smartphones broadcast unique numbers called MAC addresses” (Schneier, 2020).

Privacy Generally

The previous sections each raise their own issues, but all touch on the issue of privacy generally. “Ethical and privacy issues in learning analytics include the conditions for the collection or aggregation of data, informed consent, de-identification of data, transparency, data security, interpretation of data, as well as data classification and management” (Griffiths, et.al., 2016:6). But what is privacy, and how is it an ethical concern?

An Ontario court ruling states, “Personal privacy is about more than secrecy and confidentiality. Privacy is about being left alone by the state and not being liable to be called to account for anything and everything one does, says or thinks” (Cavoukian, 2013:18). We might say people should be able to live their lives in ‘quiet enjoyment’ of their possessions, property and relationships (Andresi, 2019).

Privacy is central to numerous statutes and ethical codes. “It includes protection against unlawful interference with an individual’s privacy, family, home or correspondence, and to unlawful attacks against his or her honor or reputation… the Human Rights Committee specifies that ‘the gathering and holding of personal information on computers, data banks and other devices, whether by public authorities or private individuals or bodies, must be regulated by law.’” (UC Berkeley, 2019)

But privacy is not unambiguously good. As Griffiths, et.al. (2016:5) note, “many employers take steps to be able to monitor all of their worker’s Internet use. Employers can also buy software that enables them to rate their employees on the basis of their browsing history crosslinked with a database of thousands of web sites categorised as ‘productive’, ‘unproductive’ or ‘neutral’.” They believe that “because they own the computer, they have the right to read the e-mail it produces” (Ibid:5).

In many cases it may well be argued that people have a right to access data. Privacy protects the powerful, at the expense of the weaker. For example, Shelton (2017) argues that “limiting access to property (or any other kind of) data prevents any large scale analysis of these processes by citizens,” thus “disempowering them by curtailing their ability to couch their claims in the necessary language of data.” And there is little consensus on the scope of the issue or the remediation that is appropriate, as the analysis by Fjeld, et.al. (2020:21) clearly shows.

Privacy Failures

As the Cambridge Analytica scandal mentioned above shows, even when privacy is desired and expected, it can be violated or break down. This is a case where analytics works as designed, but the supporting systems and environment do not. There have been numerous instances of failure over the years.

One class of failure occurs when the company in question breaches what is normally considered to be the ethical acquisition of data. For example, in one of the most high-profile cases, the inBloom service was closed down on April 21, 2014 (Kharif, 2014). In another case, the data from rented tablets with pre-installed apps was used without the informed consent of the parents (Griffiths, et.al., 2016:4).

The second class of breach occurs when the collection of data itself is of questionable ethical value. This is the case in the Cambridge Analytica breach (Cadwalladr and Graham-Harrison, 2018), where the data described not only the user but also their friends and contacts. A similar case involved the credit-monitoring firm Equifax, which compiles data on people whether they want it to or not (Fruhlinger, 2019). In Holland, a System Risk Indication (SyRI) was designed by the Dutch government to identify people likely to commit benefits fraud. In a recent case, the District Court of The Hague ruled that it “failed to strike a balance between the right to privacy and the public interest” (Privacy International, 2020).

Assessment Issues

Analytics used for assessment can score student work with accuracy and precision. Students recognize this. But students have mixed feelings about such systems, preferring “comments from teachers or peers rather than computers.” (Roscoe, et.al., 2017) It is arguable that students prefer human assessment because they feel more likely to be seen as an individual with individual flair, rather than as erroneously deviating from the expectations of the analytics engine. As one college official says, “Everyone makes snap judgments on students, on applicants, when first meeting them. But what worries me about AI is AI can’t tell the heart of a person and the drive a person has.”

A significant ethical issue arises when assessments are made based on predictive data, rather than actual practice. For example, “The growing use of AI in the criminal justice system risks interfering with rights to be free from interferences with personal liberty. One example is in recidivism risk-scoring software used across the U.S. criminal justice system to inform detainment decisions at nearly every stage, from assigning bail to criminal sentencing” (Access Now, 2018:19).

In the case of predictive analytics in learning technology, systems can identify factors statistically correlated with worse performance. This allows institutions to minimize their own risk, at the expense of students. “Institutions can then treat ‘high risk’ individuals differently, with the aim of ensuring they do not end up counting as negative statistics for completion” (Scholes, 2016). Schools may respond with enrolment limitations or assignment of additional work, but this may in some cases harm, rather than help, individual students.

Lack of Discretion

Humans often use discretion when applying the rules. “Organizational actors establish and re-negotiate trust under messy and uncertain analytic conditions” (Passi and Jackson, 2018). In the case of learning analytics, Zeide (2019) writes that a human instructor might overlook a student’s error “if she notices, for example, that the student clearly has a bad cold.” By contrast, “Tools that collect information, particularly based on online interactions, don’t always grasp the nuances.”

The impact of a lack of discretion is magnified by uncertainties in the data that might be recognized by a human but overlooked by the machine. Passi and Jackson (2018) “describe how four common tensions in corporate data science work – (un)equivocal numbers, (counter)intuitive knowledge, (in)credible data, and (in)scrutable models – raise problems of trust, and show the practices of skepticism, assessment, and credibility by which organizational actors establish and re-negotiate trust under uncertain analytic conditions: work that is simultaneously calculative and collaborative.”

By contrast, it could be argued that it is humans that lack the discretion shown by machines. The assessments made by AIs may be much more fine-grained than those made by humans. An analytics engine given student health data may well more consistently and reliably make allowances than a teacher could. It could avoid the sort of discretion that introduces biases and unfairness in grading, evaluations, and interactions based on knowledge of irrelevant information about the students (Malouff & Thorsteinsson, 2016).

Lack of Appeal

There is widespread aversion to being subject to decisions made by machines without possibility of appeal. This was the first question raised in a recent French debate on ethics in AI: “Will the prestige and trust placed in machines, often assumed to be ‘neutral’ and fail-proof tempt us to hand over to machines the burden of responsibility, judgment and decision-making?” (Demiaux and Abdallah, 2017)

What is required, according to many, is an ability to appeal, “the possibility that an individual who is the subject of a decision made by an AI could challenge that decision” (Fjeld, et.al., 2020:32). The Access Now report calls for “a human in the loop in important automated decision-making systems, which adds a layer of accountability” (Access Now, 2018). There is additionally a need for a principle of “remedy for automated decision” that is “fundamentally a recognition that as AI technology is deployed in increasingly critical contexts, its decisions will have real consequences, and that remedies should be available just as they are for the consequences of human actions” (Fjeld, et.al., 2020:33).

Content Manipulation

Analytics is used to create misleading images and videos (a.k.a. Deepfakes, as described above). While human lies and deception are nothing new, arguably technologies like Deepfakes are “a looming challenge for privacy, democracy and national security.” Chesney and Citron (2018:1760) write, “To take a prominent example, researchers at the University of Washington have created a neural network tool that alters videos so speakers say something different from what they originally said. They demonstrated the technology with a video of former President Barack Obama (for whom plentiful video footage was available to train the network) that made it appear that he said things that he had not.” This creates doubt about the veracity of video evidence, and undermines the public’s ability to rely on evidence.

What makes this type of technology a special case is that it is accessible to everyone. “The capacity to generate deep fakes is sure to diffuse rapidly no matter what efforts are made to safeguard it” (Chesney and Citron, 2018:1763). Indeed, this diffusion has already begun. Moreover, content platforms have shown little willingness (or ability) to filter or block content that is not obviously illegal, which creates an audience for the misleading content (Friedberg and Donovan, 2019). Despite some high-profile examples, such as the delisting of Alex Jones (Martineau, 2019), the longstanding problem of ‘catfishing’ – the use of fake profiles to swindle victims – is more illustrative of the potential scope of the problem (Couros and Hildebrandt, 2016).

There are numerous unethical uses of content manipulation, including exploitation, sabotage, harm to society, distortion of discourse, manipulation of elections, erosion of trust, exacerbation of divisions, undermining of public safety, and undermining of journalism (Chesney and Citron, 2018:1772-1786).

Manipulation of the User

A number of recent high-profile cases have raised the possibility of analytics being used to (illegitimately?) manipulate the thoughts, feelings and emotions of users. For example, one study experimented on Facebook users (without their knowledge or consent) to show that “emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness.” (Kramer, Guillory & Hancock, 2014) And a study from Mozilla raises the question, “What happens when we can increase the number of people who believe something simply by changing the voice that it is read in?” (Kaye, 2020).

An article from RAND suggests, “Artificial intelligence–driven social bots that are now sending you car or skin care advertisements could start chatting with you—actually, experimenting on you—to test what content will elicit the strongest reactions…. Whoever is first to develop and employ such systems could easily prey on wide swaths of the public for years to come” (Paul and Posard, 2020).

Manipulation of the user can be used for beneficial purposes, as described above. However, it becomes ethically problematic when the institution, rather than the user, benefits. As Kleber (2018) writes, “Casual applications like Microsoft’s XiaoIce, Google Assistant, or Amazon’s Alexa use social and emotional cues for a less altruistic purpose — their aim is to secure users’ loyalty by acting like new AI BFFs. Futurist Richard van Hooijdonk quips: ‘If a marketer can get you to cry, he can get you to buy.’”

Moreover, Kleber continues, “The discussion around addictive technology is starting to examine the intentions behind voice assistants. What does it mean for users if personal assistants are hooked up to advertisers? In a leaked Facebook memo, for example, the social media company boasted to advertisers that it could detect, and subsequently target, teens’ feelings of ‘worthlessness’ and ‘insecurity,’ among other emotions (Levin, 2017)”.

Microtargeting

To ‘microtarget’ is to use data analytics to identify small and very specific population groups and target advertisements specifically to them. According to critics, granular targeting of voters undermines the political process because people do not have a shared understanding of what a political party represents or stands for. The party can be anything to anyone, and there is no mechanism to criticize, or even see, their claims. “Ad targeting reached new heights in the 2016 US presidential election, where the Trump campaign served 5.9 million ad variations (Bell, 2020) in just six months” (Sharpe, 2020).

Discrimination

Schneier (2020) writes, “The point is that it doesn’t matter which technology is used to identify people… The whole purpose of this process is for companies — and governments — to treat individuals differently.” In many cases, differential treatment is acceptable. However, in many other cases, it becomes subject to ethical concerns.

The accuracy of analytics creates an advantage for companies in a way that is arguably unfair to consumers. For example, the use of analytics data to adjust health insurance rates (Davenport & Harris, 2007) works in favour of insurance companies, and thereby, arguably, to the disadvantage of their customers. Analytics are used similarly in academics, sometimes before the fact, and sometimes after.

Additionally, “We are shown different ads on the internet and receive different offers for credit cards. Smart billboards display different advertisements based on who we are.” (Schneier, 2020) Perhaps this is appropriate if the differentiation is based on interests or affiliations, but it becomes problematic if based on gender, age or race. For example, predictive analytics may turn potential students away from programs based on prejudice. “Algorithms might be reinforcing historical inequities, funneling low-income students or students of color into easier majors” (Barshay and Aslanian, 2019)

We are already facing a potential worst-case scenario in Facebook. This case was made by Sacha Baron Cohen in a recent address. He “calls the platforms created by Facebook, Google, Twitter, and other companies ‘the greatest propaganda machine in history’ and blasts them for allowing hate, bigotry, and anti-Semitism to flourish on these services” (Baron Cohen, 2019; ADL, 2019).

A significant impact of discrimination is that AI and analytics may be used to deny basic human rights. For example, Access Now (2018) writes, “Looking forward: If AI technology is used for health and reproductive screening, and some people are found to be unlikely to have children, screening could prevent them from marrying, or from marrying a certain person if the couple is deemed unlikely to conceive. Similarly, AI-powered DNA and genetics testing could be used in efforts to produce children with only desired qualities.”

It is not a stretch at all to see more fine-grained discrimination applied to the allocation of learning opportunities, limitations on employment, and other impacts. For example, in a case where failure was determined by predicted learning events, the “Mount St. Mary’s University… president used a survey tool to predict which freshman wouldn’t be successful in college and kicked them out to improve retention rates” (Foresman, 2020).

Fear and Anxiety

Even when analytics works properly, and even when it is being used with the best of intentions for obviously benign purposes, it may still have a negative impact. These effects have already been seen in biomedical analytics.

On the one hand, the result of the feedback might be fear and panic. For example, Mittelstadt (2019) reports, “John Owens and Alan Cribb argue that personal health devices such as the ‘FitBit’, which claim to help users live healthier lives by monitoring behaviour and feeding back information to promote healthy decisions, may instead expose users to risks of anxiety, stigma, and reinforcement of health inequalities.”

Also, the constant exposure to analytics may create a sense of dependency in the user. “Nils-Frederic Wagner introduces the notion of ‘patiency’ as a correlate to user agency. Health-monitoring devices are often thought to persuade or nudge users paternalistically towards health-promoting behaviours, which would seem to undermine the user’s agency and autonomy” (Ibid).

Learning analytics could have similar effects in education and schools. Constant measurement and feedback can produce ‘test anxiety’. And exposure to such feedback may push the learner toward educational providers, even in cases where their support might not be needed.

When Analytics Fails

Artificial Intelligence and analytics often work and, as we’ve seen above, can produce significant benefits. On the other hand, as Mark Liberman comments (2019), AI is brittle. When the data are limited or unrepresentative, it can fail to respond to contextual factors or outlier events. It can contain and replicate errors, be unreliable, be misrepresented, or even be defrauded. In the case of learning analytics, the results can range from poor performance, bad pedagogy and untrustworthy recommendations to (perhaps worst of all) nothing at all.

Error

Analytics can fail because of error, and this raises ethical concerns. “Analytics results are always based on the data available and the outputs and predictions obtained may be imperfect or incorrect. Questions arise about who is responsible for the consequences of an error, which may include ineffective or misdirected educational interventions” (Griffiths, et.al., 2016:4).

The concern about error is large enough that many argue in favour of a corresponding ‘Right to Rectification’ – that is, “the right of data subjects to amend or modify information held by a data controller if it is incorrect or incomplete” (Fjeld, et.al., 2020:24). We find it, for example, in article 16 of the European General Data Protection Regulation (EU, 2016). Even so, errors may go undetected in data, they may be detected but ignored by practitioners, or they may be detected by people who are denied permission to rectify them. Consequently, ethical questions arise around the need for oversight and error-prevention.

Analytics systems can be based on good data but produce incorrect results. For example, in a case where the Metropolitan Police in London deployed facial recognition cameras to detect wanted suspects, the system was accurate only 70% of the time, according to the department. “But an independent review of six of these deployments, using different methodology, found that only eight out of 42 matches were ‘verifiably correct.’” Either way (and the veracity of published reports raises ethical concerns of its own) the cameras were frequently wrong (Shaw, 2020; Fussey and Murray, 2019).
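The gap between the two figures quoted above is easier to see as arithmetic. The sketch below treats the independent review's numbers as a simple proportion of correct matches; that reading is an assumption, and because the two sources used different methodologies the results may not be directly comparable.

```python
def proportion_correct(correct: int, total: int) -> float:
    """Fraction of reported matches that were verified as correct."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


reported_by_department = 0.70                  # figure quoted by the department
independent_review = proportion_correct(8, 42)  # eight of 42 matches

print(f"department: {reported_by_department:.0%}")
print(f"independent review: {independent_review:.0%}")
```

On this reading the independent review found roughly one match in five to be verifiably correct, which is why the choice of measure matters as much as the measurement.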

Reliability

Analytics requires reliable data, “as distinguished from suspicion, rumor, gossip, or other unreliable evidence” (Emory University Libraries, 2019). Meanwhile, a ‘reliable’ system of analytics is one without error and which can be predicted to perform consistently, or in other words, “an AI experiment ought to ‘exhibit the same behavior when repeated under the same conditions’ and provide sufficient detail about its operations that it may be validated” (Fjeld, et.al., 2020:29). Both amount to a requirement of “verifiability and replicability” of both data and process.
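One narrow, technical reading of "the same behavior when repeated under the same conditions" is that an experiment with random elements should record its random seed so that a rerun gives identical results. The sketch below illustrates only that reading; the function name, the simulated scores, and their distribution are hypothetical.

```python
import random


def run_experiment(seed: int, n: int = 1000) -> float:
    """Simulate n student scores and return their mean.

    The seed is the recorded 'condition': a private generator is created
    from it, so no other code can disturb the sequence of random numbers.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(70, 10) for _ in range(n)]
    return sum(scores) / n


first = run_experiment(seed=42)
second = run_experiment(seed=42)

# Same seed, same code, same interpreter: the runs agree exactly.
print(first == second)
```

Replicability in the broader sense discussed here also covers data and process, which a seed alone does not capture.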

Additionally, the reliability of models and algorithms used in analytics “concerns the capacity of the models to avoid failures or malfunction, either because of edge cases or because of malicious intentions. The main vulnerabilities of AI models have to be identified, and technical solutions have to be implemented to make sure that autonomous systems will not fail or be manipulated by an adversary” (Hamon, Junklewitz & Sanchez, 2020, p.2).

Reliability precludes not only accidental inconsistency but also deliberate manipulation of the system. That’s why, for example, the German AI Strategy highlights that a verifiable AI system should be able to “effectively prevent distortion, discrimination, manipulation and other forms of improper use.” (German Federal Ministry of Education and Research, 2018)

Reliability requires auditing and feedback. “The ‘evaluation and auditing requirement’ principle articulates the importance of not only building technologies that are capable of being audited, but also to use the learnings from evaluations to feed back into a system and to ensure that it is continually improved” (Fjeld, et.al., 2020:31).

It is not yet clear that learning analytics are reliable. “Students can benefit from learning analytics, although the research evidence is equivocal on their reliability and the conditions under which they are most effective” (Contact North, 2018). The ICDE Ethics in Learning Analytics guidelines speak specifically about “reliability of data” (Slade and Tait, 2019) (more on that below). The JRC Guidelines recommend ensuring “the validity and reliability of tools, and whether they are employed effectively in specific contexts” (Ferguson, et.al., 2016).

Consistency Failure

Many analytics systems operate over distributed networks. As such, there may be cases where part of the network fails. This creates the possibility of a consistency failure, that is, when the state recorded by one part of the network is different from the state recorded by the other part of the network. The issue lies in how to resolve consistency failures. The CAP Theorem, also known as Brewer’s conjecture, asserts that “it is impossible for a web service to provide the following three guarantees: consistency, availability, and partition-tolerance” (Gilbert & Lynch, 2002).

Consistency is handled in distributed systems with things like ‘warranties’ and ‘promises’. A ‘warranty’ is an assertion by a subsystem that a certain value will not change before a specified time (Liu, et.al., 2014). A ‘promise’ is an assertion by a subsystem that the outcome of an operation is pending, but will become available at a future time (Mozilla, 2020). As the names of these terms imply, a distributed system must be capable of fulfilling warranties and promises; ethical questions arise when warranties and promises are, for whatever reason, unfulfilled.
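The two ideas can be sketched in a few lines. This is only an illustration under our own assumptions: the `Warranty` class and `read_under_warranty` function are hypothetical names, not the mechanism described by Liu, et.al., and Python's `concurrent.futures.Future` stands in for the JavaScript promise that the Mozilla documentation describes.

```python
import time
from concurrent.futures import Future
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Warranty:
    """An assertion that `value` will not change before `expires_at`."""
    value: Any
    expires_at: float  # a reading of time.monotonic(), in seconds

    def holds(self, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        return current < self.expires_at


def read_under_warranty(warranty: Warranty, now: Optional[float] = None) -> Any:
    """Return the warranted value, or refuse once the warranty has lapsed."""
    if not warranty.holds(now):
        raise LookupError("warranty expired: the value must be fetched again")
    return warranty.value


# A subsystem warrants a (hypothetical) analytics score for 30 seconds.
issued_at = time.monotonic()
score = Warranty(value=0.82, expires_at=issued_at + 30.0)
fresh = read_under_warranty(score, now=issued_at + 1.0)  # still warranted

# A promise: the outcome is pending until the subsystem fulfils it.
promise: Future = Future()
pending_before = not promise.done()
promise.set_result("report ready")
fulfilled_value = promise.result()
```

Reading the score after the warranty lapses raises an error rather than silently returning a stale value, which is the point at which an unfulfilled warranty becomes visible to whoever depends on it.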

Inconsistency can magnify ethical issues, especially in real-time analytics. “‘When the facts change, I change my mind’ can be a reasonable defence: but in order to avoid less defensible forms of inconsistency, changing your mind about one thing may require changing it about others also” (Boyd, 2019). Moral uncertainty, in other words, can have a cascading effect. Hence ethical analytics require mechanisms supporting warranties and promises.

Bias

The subject of bias in analytics is wide and deep. In one sense, it is merely a specific way analytics can be in error or unreliable. But more broadly, the problem of bias pervades analytics: it may be in the data, in the collection of the data, in the management of the data, in the analysis, and in the application of the analysis.

The outcome of bias is reflected in misrepresentation and prejudice. For example, “the AI system was more likely to associate European American names with pleasant words such as ‘gift’ or ‘happy’, while African American names were more commonly associated with unpleasant words.” (Devlin, 2017) “The tales of bias are legion: online ads that show men higher-paying jobs; delivery services that skip poor neighborhoods; facial recognition systems that fail people of color; recruitment tools that invisibly filter out women” (Powles and Nissenbaum, 2018).

One cause of bias lies in the data being used to train analytical engines. “Machine learning algorithms are picking up deeply ingrained race and gender prejudices concealed within the patterns of language use, scientists say.” (Devlin, 2017)

Another cause is inadequate data. For example, Feast (2019) writes of ‘omitted variable bias’, which “occurs when an algorithm lacks sufficient input information to make a truly informed prediction about someone, and learns instead to rely on available but inadequate proxy variables.” For example, “if a system was asked to predict a person’s future educational achievement, but lacked input information that captured their intelligence, studiousness, persistence, or access to supportive resources, it might learn to use their postal code as a proxy variable for these things. The results would be manifestly unfair to intelligent, studious, persistent people who happened to live in poorer areas” (Eckersley, et.al., 2017).
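A toy calculation makes the proxy problem concrete. The records below are invented purely for illustration, and the "model" is deliberately the simplest one possible: it sees only a postal area and predicts that area's average score.

```python
from statistics import mean

# Hypothetical records: (postal_area, studiousness 0-1, achievement score).
# In this toy sample achievement tracks studiousness exactly; postal area
# is merely correlated with it.
records = [
    ("A", 0.9, 90), ("A", 0.8, 80), ("A", 0.7, 70), ("A", 0.3, 30),
    ("B", 0.9, 90), ("B", 0.4, 40), ("B", 0.3, 30), ("B", 0.2, 20),
]


def fit_proxy_model(rows):
    """'Train' a model that only sees postal area: predict the area mean."""
    by_area = {}
    for area, _studiousness, score in rows:
        by_area.setdefault(area, []).append(score)
    return {area: mean(scores) for area, scores in by_area.items()}


model = fit_proxy_model(records)

# Two equally studious students (0.9), both of whom actually scored 90.
pred_a = model["A"]
pred_b = model["B"]
print(f"Predicted for the student in area A: {pred_a}")
print(f"Predicted for the student in area B: {pred_b}")
```

Because the omitted variable (studiousness) is invisible to the model, the two identical students receive different predictions, and the student from area B is marked down for where they live.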

A third cause of bias is found in the use of labels in data collection and output. “The vast majority of commercial AI systems use supervised machine learning, meaning that the training data is labeled in order to teach the model how to behave. More often than not, humans come up with these labels” (Feast, 2019).

It may be argued that we have always faced the problem of bias. “A problematic self-righteousness surrounds these reports: Through quantification, of course we see the world we already inhabit” (Powles and Nissenbaum, 2018). It is true that discrimination and prejudice have a long history. However, applying analytics to them exaggerates the problem. “AI is not only replicating existing patterns of bias, but also has the potential to significantly scale discrimination and to discriminate in unforeseen ways” (Fjeld, et.al., 2020:48).

Misinterpretation

Because analytical engines don’t actually know what they are watching, they may see one thing and interpret it as something else. For example, looking someone in the eyes is taken as a sign of paying attention, and so that is how an AI interprets someone looking straight at it. But the same signal might just be the result of a student fooling the system: because the AI is opaque, people learn to create a false impression, just in case the AI is watching. For example, students being interviewed by AI are told to “raise their laptop to be eye level with the camera so it appears they’re maintaining eye contact, even though there isn’t a human on the other side of the lens” (Metz, 2020). The result is that the AI misinterprets laptop placement as ‘paying attention’.

Misrepresentation

AI and analytics can be fraudulently used by pretending that they are able to do something they are not. One example in the literature is the Scientific Content Analysis, or SCAN, the creator of which says the tool can identify deception. However, a scientific review of the system found the opposite. As ProPublica reports, “The review devoted just one paragraph to SCAN. Its synopsis was short but withering. SCAN ‘is widely employed in spite of a lack of supporting research,’ the review said” (Armstrong and Sheckler, 2019; Brandon, et.al., 2019).

This is arguably a widespread problem. “Unfortunately research scientists win prestigious EU funding and other source of income by telling myths about the potential of AI (and robots) that are in fact false. Huge EU funded projects are now promoting unfounded mythologies about the capabilities of AI. When these projects fail or do not deliver the results, the researchers as for more resources and often get the resources. This means that our resources get ploughed into mythical research avenues based on hyperbole rather than real human good” (Richardson and Mahnič, 2017).

Distortion

People can be gradually led into supporting more and more extreme views; this is a well-known effect of some recommendation engines. It is also a well-known phenomenon that people who have taken a position on an issue will, when questioned, entrench their views and interpret evidence in such a way as to favour that position (Mercier & Sperber, 2017, p. 121).

An oft-made critique of algorithms like the YouTube recommendation system is that it tends to lead from relatively benign content to increasingly negative, polarizing or radical content. “It seems as if you are never ‘hard core’ enough for YouTube’s recommendation algorithm. It promotes, recommends and disseminates videos in a manner that appears to constantly up the stakes.” As an advertising engine, YouTube needs to maximize views, which leads to increasingly radical content. “Negative and novel information grab[s] our attention as human beings and [] cause[s] us to want to share that information with others—we’re attentive to novel threats and especially attentive to negative threats.” (Chesney and Citron, 2018:1753; Meyer, 2018)

Radicalization seems clearly to be undesirable and to pose an ethical problem for recommendation algorithms. But what if we misled people about the positions they had actually taken, and did so in a more positive direction? In a recent study, “By making people believe that they wrote down different responses moments earlier, we were able to make them endorse and express less polarized political views” (Strandberg, et.al., 2020). That sounds great, but is it ethical?

Bad Pedagogy

There is a risk, writes Ilkka Tuomi (2018), “that AI might be used to scale up bad pedagogical practices. If AI is the new electricity, it will have a broad impact in society, economy, and education, but it needs to be treated with care.” For example, badly constructed analytics may lead to evaluation errors. “Evaluation can be ineffective and even harmful if naively done ‘by rule’ rather than ‘by thought’” (Dringus, 2012).

Even more concerning is how poorly designed analytics could result in poorly defined pedagogy. Citing Bowker and Star (1999), Buckingham Shum and Deakin Crick (2012) argue that “a marker of the health of the learning analytics field will be the quality of debate around what the technology renders visible and leaves invisible, and the pedagogical implications of design decisions.” In particular, they focus on “the challenge of designing learning analytics that render visible learning dispositions and the transferable competencies associated with skillful learning in diverse contexts.”

Irrelevance

For all the time and resources invested, it may be that analytics have no impact on learning outcomes. For example, Tuomi (2018) argues that while “IBM’s Watson Classroom promises cognitive solutions that help educators gain insights into the learning styles, preferences, and aptitudes of each student, ‘bringing personalized learning to a whole new level’ … it is, however, not obvious that such objectives would be beneficial or relevant for learning.” Analytics can be irrelevant if the data collected has no bearing on whether an individual learned or not. “For example, in the discussion forum, last access data that shows current or inactive participation is insufficient in revealing the true status of a student’s progress” (Dringus, 2012).

Bad Actors

Bad actors are people or organizations that attempt to subvert analytics systems. They may be acting for their own benefit or to the detriment of the analytics organizations or their sponsors. The prototypical bad actor is the hacker, a person who uses software and infiltration techniques to intrude into computer systems. Bad actors create ethical issues for analytics because they demonstrate the potential to leverage these systems to cause harm.

Conspiracy Theorists

A conspiracy theorist is a person or group who promotes an alternative narrative alleging a coordinated campaign of disinformation, usually on the part of recognized authorities or institutions. Conspiracy theorists often replicate analytical methods and dissemination, and sometimes subvert existing analytics for their own purposes.

During the recent Covid-19 pandemic, conspiracy theories abounded. One such was a campaign organized around the #FilmYourHospital hashtag, alleging that because hospital parking lots were empty, assertions that there was a pandemic in progress must be fake (Gruzd, 2020).

Stalkers

Collusion

We recognize collusion as the behaviour of bad actors. Members of a price cartel, for example, operate in concert to artificially inflate prices. Or scientific authors operating in conspiracy may collude to give each other favourable reviews, even if their work would normally be rejected.

Analytics engines working in concert can become bad actors in their own right. For example, Calvano, et.al. (2020) showed that “algorithms powered by Artificial Intelligence (Q-learning) in a workhorse oligopoly model of repeated price competition…. consistently learn to charge supracompetitive prices, without communicating with one another. The high prices are sustained by collusive strategies with a finite phase of punishment followed by a gradual return to cooperation. This finding is robust to asymmetries in cost or demand, changes in the number of players, and various forms of uncertainty.”

When Analytics is Fundamentally Dubious

Narayan (2019) describes the following “fundamentally dubious” uses of learning analytics: predicting criminal recidivism, policing, terrorist risk, at-risk kids, and predicting job performance. “These are all about predicting social outcomes,” he says, “so AI is especially ill-suited for this.” There are good examples of analytics failing in such cases; Narayan cites a study that shows “commercial software that is widely used to predict recidivism is no more accurate or fair than the predictions of people with little to no criminal justice expertise” (Dressel and Farid, 2018).

It is arguable that the ethical issue with such employments of analytics is not that they will be inaccurate, but rather, that analytics shouldn’t be used in this way for any number of reasons. The complexity surrounding social outcomes is one factor, but so is the impact on individual lives of decisions about (say) recidivism or future job performance. Even if analytics gets it right, there is an argument to be made that it should not be applied in such cases or applied in this way.

Predicting Criminals

A news report in 2020 revealed that a county police department in Florida uses data from the local school district to keep “a secret list of kids it thinks could ‘fall into a life of crime’ based on factors like whether they’ve been abused or gotten a D or an F in school.” The story reports, “In its intelligence manual, the Pasco Sheriff’s Office says most police departments have no way of knowing if kids have ‘low intelligence’ or come from ‘broken homes’ — factors that can predict whether they’ll break the law. ‘Fortunately,’ it continues, ‘these records are available to us.’” (Bedi & McGrory, 2020)

The Intelligence-Led Policing Manual states, “the system takes into account a student’s grades, attendance, and behavior. Through DCF’s Florida Safe Families Network (FSFN), we are able to identify juveniles who have had adverse childhood experiences (ACEs)… Last, our records management system can identify predictors of criminal behavior such as arrests at an early age, arrests for certain offenses, frequently running away, and a juvenile’s social network. We combine the results of these three systems to identify those juveniles who are most at-risk to fall into a life of crime.” (Pasco Sheriff’s Office, 2018, p. 13)

Critics argue “the existence of the list may represent an illegal use of student data. Regardless of legality, they say, the list puts students and families at risk of being unfairly targeted before they’ve done anything wrong.” They argue that “If a student’s presence on this list ends up in their records long-term, it could affect their ability to be admitted to college or hired by employers.” As well, “Police officers could also inadvertently use their perceptions of students who appear on the list to make decisions about how to adjudicate a crime that takes place” (Lieberman, 2020).

Racial Profiling

It is arguable that there is no ethical application of analytics to identify specific races for special treatment. As one commenter on a BoingBoing article suggested, “Imagine a billboard that alternated between advertising Cabernet Sauvignon or Malt Liquor depending on the skin tone of the person looking at it.” The article itself described an ‘ethnicity-detection camera’ that could be used to identify Uyghurs for special treatment (Beschizza, 2019).

Analytics can erroneously attribute to race outcomes that have other causes. For example, an analytics engine may find “a city’s crime data reflect the historical policing and surveillance in minority and low-income communities.” In fact, however, the outcome may reflect the racism of police officers rather than the race of the citizens. In one case, “Researchers looked at over 10 years of Charlotte’s data to find patterns of abuses. They found that the most significant predictor of inappropriate interactions were the officers themselves” (Arthur, 2016; Ekowo and Palmer, 2016). A similar study by ProPublica reached similar conclusions (Angwin, et.al., 2016). “Blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend. It makes the opposite mistake among whites.”
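The kind of disparity described in that last quotation is usually expressed as a difference in false positive rates, that is, the share of people who did not re-offend but were nonetheless labelled higher risk. The sketch below shows the calculation; the counts are hypothetical, chosen only to illustrate an "almost twice as likely" ratio, and are not the figures from the study.

```python
def false_positive_rate(false_pos: int, true_neg: int) -> float:
    """Share of people who did NOT re-offend but were labelled higher risk."""
    return false_pos / (false_pos + true_neg)


# Hypothetical counts among people who did not re-offend, per group.
groups = {
    "group_1": {"false_pos": 45, "true_neg": 55},
    "group_2": {"false_pos": 23, "true_neg": 77},
}

rates = {name: false_positive_rate(**counts) for name, counts in groups.items()}
ratio = rates["group_1"] / rates["group_2"]

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} wrongly labelled higher risk")
print(f"ratio: {ratio:.2f}")
```

A system can have similar overall accuracy for both groups and still distribute this particular kind of error very unevenly, which is why the error rates have to be examined group by group.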

It is also arguable that using race as a label is itself ethically questionable, first, because it fosters and promotes racial discrimination, second, because the categories defined by race are not significant (for example, there is no principled distinction between ‘black’ and ‘white’, especially in a population that may have elements of both), and third, because the usual categories of race (white, black, hispanic, Asian) reflect a colonial perspective.

Identity Graphs

Data ostensibly collected for one purpose may be used to create comprehensive user profiles. Todd Carpenter (2020, October 13) writes of a service “being used by Elsevier is called ThreatMetrix (that) is owned and provided by the LexisNexis arm of the RELX holding company, which also owns Elsevier.” He writes, “It is not that LexisNexis is specifically tracking the download of this paper or that on the Elsevier system, but that it is using these behavioral data, in particular its identification of people and their devices, to build a profile of an ever larger segment of the population to track those citizens.”

Created by such services, ‘identity graphs’ are “tools that online advertisers use to link together your behavior across your phone, laptop, work laptop, Xbox and work Xbox” and are widely used in the advertising industry (Heaton, 2017). While “there is probably a strict set of self-regulatory non-binding guidelines discouraging them from abusing it,” identity graphs are nonetheless ripe for abuse. These profiles could in turn be used for a variety of reasons: for access control, for service provision, for hiring and employment decisions, for marketing, and to trigger further investigation.

Hamel (2016) asks “Is it legal and ethical for 3rd parties to build consumer profiles from your social and online presence, merge it with their own internal data, credit scores and any other data sources they can find, and potentially sell back the enriched data to avid marketers?” It does appear to be legal. “Popular services like SalesForce, Marketo, Nimble, HubSpot, Rapportive and tons of others in the mar-tech space allows marketers to learn amazing details about individual people and tweak their marketing message.”

The issue, he writes, is that “this is generally done without your knowledge, without your consent and without the ability to review the collected data.” And the result is an outcome that at a minimum places the individual at a relative disadvantage vis-à-vis the advertiser, and at worst, places them in danger. Though the identity graph may be anonymized, marketers have long since been able to ‘de-anonymize’ the data (Ohm, 2010).

“Professor Peter McOwan… told us that AI systems have become better at automatically combining separate datasets, and can piece together much more information about us than we might realise we have provided. He cited as an example ‘pleaserobme.com’, a short-lived demonstration website, which showed how publicly accessible social media data could be combined to automatically highlight home addresses of people who were on holiday.” (Clement-Jones, et.al., 2018).
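How little is needed to combine datasets in this way can be shown with a minimal sketch. Everything here is hypothetical: the records, the names, and the choice of postal prefix and birth year as the linking fields are our own assumptions, used only to illustrate how an ‘anonymized’ profile can be matched to a named person.

```python
# Hypothetical 'anonymized' profiles: no names, but quasi-identifiers remain.
anonymized = [
    {"postal": "K1A", "birth_year": 1985, "interest": "loans"},
    {"postal": "M5V", "birth_year": 1990, "interest": "travel"},
]

# Hypothetical public directory containing the same quasi-identifiers.
public = [
    {"name": "Alex Example", "postal": "K1A", "birth_year": 1985},
    {"name": "Sam Sample", "postal": "M5V", "birth_year": 1990},
    {"name": "Pat Placeholder", "postal": "M5V", "birth_year": 1972},
]


def link(anon_rows, public_rows, keys=("postal", "birth_year")):
    """Attach a name to each anonymous row that matches exactly one person."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    linked = []
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the profile
            linked.append({**row, "name": candidates[0]})
    return linked


reidentified = link(anonymized, public)
for row in reidentified:
    print(row["name"], "->", row["interest"])
```

Removing names does not anonymize a record if the remaining fields are enough to single out one person in some other dataset.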

Autonomous Weapons

It seems common sense to place the class of ‘autonomous weapons’ into the category of ‘fundamentally dubious’ applications of analytics. Even the U.S. Department of Defense agrees that there should be a “human-in-the-loop” approach to automated weapons systems. It also proposes policies to ensure that the weapons function as designed, and that force is applied only following approved rules of engagement (DoD, 2012). It should go without saying that the use of autonomous weapons raises a range of ethical issues.

The application of autonomous weapons may appear to have limited scope in learning analytics; however, even the DoD Directive allows for a class of “autonomous or semi-autonomous cyberspace systems for cyberspace operations” (Ibid), and these are not governed by the Directive cited above. And it is not unreasonable to speculate about the deployment of cybernetic systems of enforcement, punishment, or deterrent in AI-enabled learning management systems, for example, to prohibit cheating, enforce copyright regulations, or regulate unauthorized access to learning materials. Such systems could include, but are not limited to, the automated public disclosure of information or retaliatory measures such as malware or viruses.

When We Don’t Know What the Consequences Will Be

It’s true that for any application of any technology there may be unintended consequences. However, it is arguable that there are cases where there is a knowable risk of unintended consequences and where these consequences may be particularly harmful.

A good example of this, perhaps, is the suggestion that colleges should put smart speakers in student dormitories. As Miles (2019) reports, “When it comes to deploying listening devices where sensitive conversations occur, we simply have no idea what long-term effect having conversations recorded and kept by Amazon might have on their futures—even, quite possibly, on their health and well-being.”

Instead, she continues, quoting Emerson professor Russell Newman, “We need a safe way to experiment with these technologies and understand the consequences of their use instead of just continuing a blind march towards surveillance for the purpose of profit-making. These are sophisticated applications with lifelong consequences for the individuals who are analyzed by them, to ends as yet unknown. We all need to be really judicious and thoughtful here.”

Social and Cultural Issues

This is a class of issues that addresses the social and cultural infrastructure that builds up around analytics. These are not issues with analytics itself, but with the way analytics changes our society, our culture, and the way we learn.

Opacity and Transparency

Analytics is ethically problematic in society when it is not transparent. When a decision-making system is opaque, it is not possible to evaluate whether it is making the right decision. We desire transparency in analytics. “The principle of ‘transparency’ is the assertion that AI systems should be designed and implemented in such a way that oversight of their operations are possible” (Fjeld, et.al., 2020:42).

Proponents argue, in particular, that people should be aware of when analytics is employed in a decision-making capacity. For example, “the Electronic Frontier Foundation told us of the need for transparency regarding the use of AI for dynamic or variable pricing systems, which allow businesses to vary their prices in real time.” (Eckersley, et.al., 2017).

This is expressed as the ‘principle of notification’. As the Select Committee on Artificial Intelligence writes, “The definition of the principle of ‘notification when an AI system makes a decision about an individual’ is facially fairly clear: where an AI has been employed, the person to whom it was subject should know. The AI in UK document stresses the importance of this principle to allow individuals to ‘experience the advantages of AI, as well as to opt out of using such products should they have concerns.’” (Fjeld, et.al., 2020:45)

Additionally, transparency applies to the model or algorithm applied in analytics. “Transparency of models: it relates to the documentation of the AI processing chain, including the technical principles of the model, and the description of the data used for the conception of the model. This also encompasses elements that provide a good understanding of the model, and related to the interpretability and explainability of models” (Hamon, Junklewitz & Sanchez, 2020, p.2).

That is why the Montreal Declaration describes the use of open source software and open data sets as a “socially equitable objective” (University of Montreal, 2018). Additionally, the ICCPR Human Rights Committee states that “every individual should have the right to ascertain in an intelligible form, whether, and if so, what personal data is stored in automatic data files, and for what purposes.” (UC Berkeley, 2019)

Alienation

Artificial intelligence and analytics impose themselves as a barrier between one person and another, or between one person and necessary access to jobs, services, and other social, economic and cultural needs. Consider the case of a person applying for work where analytics-enabled job applicant screening is being used. “La difficulté, pour les candidats pris dans les rets de ces systèmes de tris automatisés, est d’en sortir, c’est-à-dire se battre contre les bots, ces gardiens algorithmiques, pour atteindre une personne réelle capable de décider de son sort (The difficulty for candidates caught in the nets of these automated sorting systems is to get out of them, that is, to fight against the bots, those algorithmic gatekeepers, to reach a real person capable of deciding their fate)” (Guillaud, 2020).

The process can be depersonalizing and demeaning. For example, Jeffrey Johnson describes recent experiences with such systems. “He got emails prompting him to take an online test seconds after he submitted an application, a sure sign no human had reviewed his résumé. Some were repeats of tests he’d already taken. He found them demeaning. ‘You’re kind of being a jackass by making me prove, repeatedly, that I can type when I have two writing-heavy advanced degrees,’ Johnson said, ‘and you are not willing to even have someone at your firm look at my résumé to see that’” (Keppler, 2020).

The eventual consequence may be disengagement and alienation. “Will Hayter, Project Director of the Competition and Markets Authority, agreed: ‘ … the pessimistic scenario is that the technology makes things difficult to navigate and makes the market more opaque, and perhaps consumers lose trust and disengage from markets’” (Clement-Jones, et.al, 2018:para 52).

Explainability

Explainability is closely related to transparency. “Explainability is particularly important for systems that might ‘cause harm,’ have ‘a significant effect on individuals,’ or impact ‘a person’s life, quality of life, or reputation.’… if an AI system has a ‘substantial impact on an individual’s life’ and cannot provide ‘full and satisfactory explanation’ for its decisions, then the system should not be deployed.” (Fjeld, et.al., 2020:43)

In the case of analytics, explainability seems to be inherently difficult. Zeide (2019) writes, “Unpacking what is occurring within AI systems is very difficult because they are dealing with so many variables at such a complex level. The whole point is to have computers do things that are not possible for human cognition. So trying to break that down ends up creating very crude explanations of what is happening and why.”

This is unsatisfactory. That is why, for example, the EU has added a “right to explanation” within the GDPR. “The Article 29 Working Party [28] state that human intervention implies that the human-in-the-loop should refer to someone with the appropriate authority and capability to change the decision,” write Hamon, Junklewitz & Sanchez (2020, p.8). “It is clear how the requirement of explainability is relevant for the envisaged safeguards. Human supervision can only be effective if the person reviewing the process can be in a position to assess the algorithmic processing carried out.”

But GDPR implementation should be viewed as a regulatory experiment (Eckersley, et.al., 2017). “That might mean adopting clearer and stronger incentives for explainability, if the GDPR rules appear to be bearing fruit in terms of high-quality explanatory technologies, or it might mean moving in the direction of different types of rules for explainability, if that technical research program appears unsuccessful.”

But we’re not sure whether we’ll be able to provide explanations. As Eckersley, et.al. (2017) say, “Providing good explanations of what machine learning systems are doing is an open research question; in cases where those systems are complex neural networks, we don’t yet know what the trade-offs between accurate prediction and accurate explanation of predictions will look like.”

Accountability

Numerous agencies have announced efforts to ensure that automated decisions are ‘accountable.’ As Rieke, Bogen & Robinson (2018) write, “Advocates, policymakers, and technologists have begun demanding that these automated decisions be explained, justified, and audited. There is a growing desire to ‘open the black box’ of complex algorithms and hold the institutions using them accountable.”

The need for accountability comes in contrast to the idea that analytics and artificial intelligence systems may be autonomous. We trust autonomous systems, and may do so reasonably, even if they sometimes fail. The question of accountability thus becomes one of whether we over-trust or under-trust such systems; “The unfortunate accidents caused by autonomous vehicles can be seen as cases of over-trust: in each case the human driver falsely believed that the automated system in control of the driving was capable of performing at a level at which it was not capable of. Thus, our aim could be to encourage appropriate levels of trust in AI, with accountability regimes taking the nuances of over and under trust into account.” (Millar, et.al., 2018; see also IEEE 2016)

But the nature of AI might make accountability impossible. “Suppose every single mortgage applicant of a given race is denied their loan, but the Machine Learning engine driving that decision is structured in such a way that the relevant engineers know exactly which features are driving such classifications. Further suppose that none of these are race-related. What is the company to do at this point?” (Danzig, 2020).

Social Cohesion and Filter Bubbles

The UK House of Lords Select Committee notes that “The use of sophisticated data analytics for increasingly targeted political campaigns has attracted considerable attention in recent years, and a number of our witnesses were particularly concerned about the possible use of AI for turbo-charging this approach” (Clement-Jones, et.al, 2018:para 260). One example is the use of bot Twitter accounts to sow division during the Covid-19 pandemic. “More than 100 types of inaccurate COVID-19 stories have been identified, such as those about potential cures. But bots are also dominating conversations about ending stay-at-home orders and ‘reopening America,’ according to a report from Carnegie Mellon” (Young, 2020).

In a digital environment we are deluged with information with no real way to make sense of it all, creating what Benkler (2006) calls the Babel objection: “individuals must have access to some mechanism that sifts through the universe of information, knowledge, and cultural moves in order to whittle them down into manageable and usable scope.” Using data from our previous reading or viewing behaviour, data analytics identifies patterns that we do not detect and do not know about before they are mined (Ekbia, et.al., 2014; Chakrabarti, 2009) and feeds these back to us through recommender systems.

This creates a cycle that augments and reinforces these patterns, putting people in “filter bubbles” (Pariser, 2012) whereby over time they see only content from a point of view consistent with their own. For example, “In a study of Facebook users, researchers found that individuals reading fact-checking articles had not originally consumed the fake news at issue, and those who consumed fake news in the first place almost never read a fact-check that might debunk it.” (Chesney and Citron, 2018:1768)

An ethical issue here arises because “information is filtered before reaching the user, and this occurs silently. The criteria on which filtering occurs are unknown; the personalization algorithms are not transparent” (Bozdag & Timmermans, 2011). Additionally, “We have different identities, depending on the context, which is ignored by the current personalization algorithms” (Ibid). Moreover, algorithms that drive filter bubbles may be influenced by ideological or commercial considerations (Introna & Nissenbaum, 2000:177).

Feedback Effects

The phenomenon of the self-fulfilling prophecy is well known. It is essentially the idea that the prediction of an event makes the event more likely to occur. This is the result of a feedback loop, where the nature of the prediction becomes known or evident to people or circumstances that might cause or prevent the effect in the first place.

A good example of this is the use of polls in elections. If the polls predict a certain outcome, this may have the result of influencing people to stay home, thus ensuring the outcome that had been predicted (Ansolabehere and Iyengar, 1994). This becomes an issue for analytics when polls, trends and other factors are used to project outcomes; “these forecasts aggregate polling data into a concise probability of winning, providing far more conclusive information about the state of a race” (Westwood, et.al., 2020).

At a certain point, these models can be deployed to change behaviour. For example, during the Covid-19 outbreak, projections were employed to encourage the population to follow physical distancing protocols and to influence political decisions. As epidemiologist Ashleigh Tuite says, “The point of a model like this is not to try to predict the future but to help people understand why we may need to change our behaviors or restrict our movements, and also to give people a sense of the sort of effect these changes can have” (Kristof & Thompson, 2020). But as Diakopoulos (2020) argues, “As predictions grow into and beyond their journalistic roots in elections, transparency, uncertainty communication, and careful consideration of the social dynamics of predictive information will be essential to their ethical use.”

Inclusion

There are ethical issues around the question of inclusion and exclusion in analytics. Most often, these are put in the form of concerns about biased algorithms. But arguably, the question of inclusion in analytics ought to be posed more broadly. For example, Emily Ackerman (2019) reports having been in a wheelchair and blocked from exiting an intersection by a delivery robot waiting on the ramp. This isn’t algorithmic bias per se, but clearly the use of the robot excluded Ackerman from an equal use of the sidewalk.

New types of artificial intelligence lead to new types of interaction. In such cases, it is of particular importance to look at the impact on traditionally disadvantaged groups. “There is increasing recognition that harnessing technologies such as AI to address problems identified by working with a minority group is an important means to create mainstream innovations. Rather than considering these outcomes as incidental, we can argue that inclusive research and innovation should be the norm” (Coughlan, et.al., 2019a: 88).

Consent

What is consent? “Broadly, ‘consent’ principles reference the notion that a person’s data should not be used without their knowledge and permission.” (Fjeld, et.al., 2020) Related to ‘consent’ is ‘informed consent’, “which requires individuals be informed of risks, benefits, and alternatives.” (Ibid). Consent is viewed as foundational for other rights, including not only the right to refuse access to one’s data, but also the right to correct that data, erase that data, or control the use of that data.

The principle of consent has been violated in some well-known cases. In one such, researchers manipulated Facebook users’ emotions by covertly varying their news feeds (Kramer, et.al., 2014). Google revealed its ‘Project Nightingale’ after being accused of secretly gathering personal health records (Griggs, 2019); Google also offers a ‘Classroom’ application and questions have been raised about its data collection practices on that platform (Singer, 2017). In yet another case, a Georgia Tech professor built a robot tutor that students believed was a human (Eicher, Polepeddi & Goel, 2018). Each of these touches the issue of consent in different ways, varying from malicious uses of technology to benign and even helpful.

Numerous ethical statements enshrine a principle of consent, as we shall see below. However, the refusal to consent to the use of analytics might also be unethical. For example, when an AI scores higher than humans on a diagnostics test used to certify physicians (Bresnick, 2018), the refusal to use analytics may be indefensible. Similarly, by 2053 “surgical jobs could be the exclusive purview of AI tools” (Bresnick, 2017). This again makes withholding consent questionable.

There is precedent. The refusal of vaccines today may be viewed as unethical and even illegal (Horan and DePetro, 2019). It may be unethical, because “those who refuse vaccination yet benefit from herd immunity can be considered free-riders who are acting against the principle of fairness and, therefore, acting unethically.” And personal autonomy is not absolute. “Autonomy is just one value and it is not the only one that should be considered. Nor does it necessarily outweigh every other important value.”

Consent applies not only to the recipient of services; it also applies to the provider. We see this in other disciplines. For example, health care providers, for reasons of conscience, may wish to refuse to perform abortions, or to offer blood transfusions (Kemp, 2013). In some cases this may be acceptable, but in other cases it may seem obviously wrong, as for example should a person withhold consent because the recipient is an ethnic minority or a member of the LGBTQ community (McMurtree, 2000). Would it be unethical for an educator to refuse to use analytics? Would it be unethical for an educator to use the results of analytics to refuse to offer services?

Surveillance Culture

Above, we discussed the ethics of surveillance itself. Here, we address the wider question of the surveillance culture. This refers not only to specific technologies, but the creation of a new social reality. “Focusing on one particular identification method misconstrues the nature of the surveillance society we’re in the process of building. Ubiquitous mass surveillance is increasingly the norm” (Schneier, 2020). Whether in China, where the infrastructure is being built by the government, or the west, where it’s being built by corporations, the outcome is the same.

Surveillance becomes data which then becomes a mechanism that disadvantages the surveilled. Insurance companies use surveillance data to adjust rates, penalizing those they feel are a higher risk (Davenport & Harris, 2007; Manulife, 2020; Allstate, 2020). Cafe chains are using facial recognition to bill customers (Sullivan & Suri, 2019). “A man tries to avoid the cameras, covering his face by pulling up his fleece. He is stopped by the police and forced to have his photo taken. He is then fined £90 for ‘disorderly behaviour’” (Lyon, 2017). “The Republican National Committee and the Trump campaign have reportedly compiled an average of 3,000 data points on every voter in America,” enabling it, arguably, “to wage an untraceable whisper campaign by text message” (Coppins, 2020).

What we are finding with surveillance culture is the ‘elasticity’ of analytics ethics (Hamel, 2016) as each step of surveillance stretches what we are willing to accept a bit and makes the next step more inevitable. The uses of streetlight surveillance are allowed to grow (Marx, 2020). Surveillance becomes so pervasive it becomes impossible to escape its reach (Malik, 2019). And nowhere is this more true than in schools and learning. The goal is “to connect assessment, enrollment, gradebook, professional learning and special education data services to its flagship student information system” (Wan, 2019). Or, as Peter Greene (2019) says, “PowerSchool is working on micromanagement and data mining in order to make things easier for the bosses. Big brother just keeps getting bigger, but mostly what that does is make a world in which the people who actually do the work just look smaller and smaller.”

Audrey Watters captures the issue of surveillance culture quite well. It’s not just that we are being watched, it’s that everything we do is being turned into data for someone else’s use – often against us. She says “These products — plagiarism detection, automated essay grading, and writing assistance software — are built using algorithms that are in turn built on students’ work (and often too the writing we all stick up somewhere on the Internet). It is taken without our consent. Scholarship — both the content and the structure — is reduced to data, to a raw material used to produce a product sold back to the very institutions where scholars teach and learn.” (Watters, 2019).

She continues (Ibid), “In her book The Age of Surveillance Capitalism, Shoshana Zuboff calls this ‘rendition,’ the dispossession of human thoughts, emotions, and experiences by software companies, the reduction of the complexities and richness of human life to data, and the use of this data to build algorithms that shape and predict human behavior.”

Power and Control

People often respond to concerns about surveillance by saying “I don’t mind because I have nothing to hide.” However, a cynical response to this might be, “you don’t mind what the consequences of surveillance are, so long as they happen to other people.” As Gellman and Adler-Bell (2017) write, “Universalist arguments obscure the topography of power. Surveillance is not at all the same thing at higher and lower elevations on the contour map of privilege.”

This is an argument frequently made by Edward Snowden. In a video made with Jean-Michel Jarre (2016) he says, “Saying that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about freedom of speech because you have nothing to say.” And as he posted on Twitter (2015), “Ask yourself: at every point in history, who suffers the most from unjustified surveillance? It is not the privileged, but the vulnerable. Surveillance is not about safety, it’s about power. It’s about control.” Desmond Cole makes a similar point (Neal, 2020), noting that armed police are never brought into rich white private schools, even though drug dealing and everything else may happen there, only the poor schools with minority populations.

The use of surveillance, analytics and artificial intelligence to exercise discretionary control is not a hypothetical. The University of Toronto’s digital surveillance and human-rights watchdog, Citizen Lab, reports on spy software such as Pegasus (Marczak, 2018) and has publicly identified companies such as NSO Group as “bad actors” in world affairs. It reports having been targeted itself by companies using surveillance to exercise control (Farr, 2020).

An Oppressive Economy

First, everything we do – our work, our data, our thoughts and emotions – is grist for the analytics engine. As Watters writes (Watters, 2019), “In her book The Age of Surveillance Capitalism, Shoshana Zuboff calls this ‘rendition,’ the dispossession of human thoughts, emotions, and experiences by software companies, the reduction of the complexities and richness of human life to data, and the use of this data to build algorithms that shape and predict human behavior.”

The products that depend on analytics engines — plagiarism detection, automated essay grading, and writing assistance software — are built using algorithms that are in turn built on students’ work. And this work is often taken without consent, or (as in the lawsuit affirming TurnItIn’s right to use student essays) with consent demanded as an educational requirement (Masnick, 2008).

Second, “Scholarship — both the content and the structure — is reduced to data, to a raw material used to produce a product sold back to the very institutions where scholars teach and learn.” (Watters, 2019) And in a wider sense, everything is reduced to data, and the value of everything becomes the value of that data. People no longer simply create videos, they are “influencers”. Courses are no longer locations for discussion and learning, they produce “outcomes”.

Watters argues that teachers and administrators should not uncritically advocate data-driven products, but it is arguable that the danger to our culture and society is wider than whether or not this or that technology is good. How does one advocate, or not advocate, an entire economy?

Loss of Sense of Right and Wrong

There is the sense that analytics and AI cannot reason, cannot understand, and therefore cannot know the weight of their decisions. This, somehow, must be determined. But as Brown (2017) asks, “Who gets to decide what is the right or wrong behaviour for a machine? What would AI with a conscience look like?” On the other hand, perhaps AI can learn the difference between right and wrong for itself. Ambarish Mitra (2018) asks, “What if we could collect data on what each and every person thinks is the right thing to do? … With enough inputs, we could utilize AI to analyze these massive data sets—a monumental, if not Herculean, task—and drive ourselves toward a better system of morality… We can train AI to identify good and evil, and then use it to teach us morality.”

The danger in this is that people may lose the sense of right and wrong, and there are suggestions that this is already happening. Graham Brown-Martin argues, for example, “At the moment within social media platforms we are seeing the results of not having ethics, which is potentially very damaging. You are talking about a question for society to answer in the public domain about what our ethics are. Just because we can do something does not mean that we should do it, and I think we are on that cusp” (Clement-Jones, et.al, 2018:para 247).

Do right and wrong become what the machine allows them to be? This is perhaps the intuition being captured by people who are concerned that AI results in a loss of humanity. And when we depend on analytics to decide on right and wrong, what does that do to our sense of morality? Perhaps, as Pinker (2008) suggests, we are genetically predisposed to have a moral sense. “Morality is not just any old topic in psychology but close to our conception of the meaning of life. Moral goodness is what gives each of us the sense that we are worthy human beings.” What happens if we lose this?

Ownership

A number of issues arise with respect to ownership, analytics and artificial intelligence. One such issue relates to IP protection and analytics: should AI algorithms be patented? Should specific applications, such as “the classification of digital images, videos, audio or speech signals based on low-level features (e.g. edges or pixel attributes for images),” be protected (Iglesias, et.al., 2019:7)?

Additionally, “curated data libraries may or may not deserve IP protection on their own” (Ibid:9). As Kay (2012) notes, “Although raw activity data is unlikely to attract copyright, a type of IPR, collations of such data may attract database rights, another type of IPR, which can restrict uses of substantial amounts of this data; enhanced activity data may itself be in copyright and may well enjoy database rights as well.” Moreover, an additional special case of ownership applies to data scraped from websites, as discussed above (Das, 2020).

We can think of ethical responsibility for analytics and artificial intelligence under two headings: who gets the credit, and who takes the blame? The answers to each are not obvious. With respect to the first question: Who are the creators of AI-generated art — programmers or machines? (Canellis, 2019). We may want to say that it’s the programmers. That’s the precedent that was set when a court ruled that the human owner of a camera, not the monkey that took the photo, owned the copyright (Cullinane, 2018).

But what would we say about this: Damien Riehl and Noah Rubin “designed and wrote a program to record every possible 8-note, 12-beat melody and released the results — all 68+ billion melodies, 2.6 terabytes of data — into the public domain” (Kottke, 2020). What would we say if Disney had done it and copyrighted the melodies? On the other hand, if we do not grant copyright to AI-generated works, then how do we identify them? Michaux (2018) argues that it may be difficult to distinguish works generated by humans and by machines.
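The scale of that dataset follows from simple counting. As a rough check, and assuming (this is our reading of the quoted description, not a statement of how Riehl and Rubin’s program actually works) that each of the 12 positions in a melody can hold any one of 8 pitches, the number of distinct melodies is 8 to the power of 12. The constant and function names below are illustrative.

```python
# Assumption for illustration: 8 candidate pitches, 12 positions per melody,
# with every position chosen independently of the others.
PITCHES = 8
POSITIONS = 12


def count_melodies(pitches=PITCHES, positions=POSITIONS):
    """Count the sequences of `positions` notes drawn from `pitches` options."""
    return pitches ** positions


if __name__ == "__main__":
    print(f"{count_melodies():,}")  # prints 68,719,476,736
```

The result is consistent with the “68+ billion” figure quoted above, and it shows how quickly exhaustive generation becomes feasible: a complete enumeration of this melodic space is a matter of storage, not creativity.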

After analytics and AI master the art of creation, what role does that leave for humans? “Could humans essentially be blocked out of content creation by the pace of AI text generation and the resulting claims of copyright for every possible meaningful text combination? With the expansion of tools, matched with the increasing speed of processing and available storage, such a world isn’t beyond comprehension” (Carpenter, 2020).

As a recent literature review cautions, “Before favouring one solution or another, further economic and legal research is needed to assess to what extent the creation of new rights is needed at all. Who is/will be producing AI-generated goods? How autonomous are inventive/creative machines? What impact might regulation have on the relevant stakeholders, including artistic and cultural workers? What are the consequences of protection or non-protection?” (Iglesias, et.al., 2019).

Responsibility

With respect to the second question, that of who takes the blame, while it may be intuitive to argue that human designers and owners ought to take responsibility for the actions of an AI, arguments have been advanced suggesting that autonomous agents are responsible in their own right, thereby possibly absolving humans of blame. “While other forms of automation and algorithmic decision making have existed for some time, emerging AI technologies can place further distance between the result of an action and the actor who caused it, raising questions about who should be held liable and under what circumstances. These principles call for reliable resolutions to those questions.” (Fjeld, et.al., 2020:34)

The argument from AI autonomy has a variety of forms. One, advanced (tentatively) by the IEEE, draws the distinction between ‘moral agents’ and ‘moral patients’ (or ‘moral subjects’) to suggest that we ought to distinguish between how an outcome occurred and the consequence of that outcome, and suggests that autonomous self-organizing systems may operate independently of the intent of the designer (IEEE, 2016, p. 196). As Bostrom and Yudkowsky (2029) write, “The local, specific behavior of the AI may not be predictable apart from its safety, even if the programmers do everything right.” It may seem unjust to hold designers responsible in such cases.

There is the expectation that systems of the future will be able to sense their environment, plan based on that environment, and act upon that environment with the intent of reaching some task-specific goal (either given to or created by the system) without external control. Beer, et.al. (2014) take this description to define autonomy. “Autonomy is a critical construct related to human-robot interaction (HRI) and varies widely across robot platforms. Levels of robot autonomy (LORA), ranging from teleoperation to fully autonomous systems, influence the way in which humans and robots interact with one another.”

Winner Takes All

It is arguable that the application of analytics and AI will give some individuals and companies an unbeatable advantage in the marketplace. This concern has been raised by the Electronic Frontier Foundation (Eckersley, et.al., 2017). They ask, “How can the data-based monopolies of some large corporations, and the ‘winner-takes-all’ economies associated with them, be addressed? How can data be managed and safeguarded to ensure it contributes to the public good and a well-functioning economy?”

Similarly, the advantage offered by analytics and AI may require that special measures be taken to ensure equitable access to the benefits, and may need to be established as a principle. “The ‘access to technology’ principle represents statements that the broad availability of AI technology, and the benefits thereof, is a vital element of ethical and rights-respecting AI.” (Fjeld, et.al., 2020:61) This is one of the top ethical issues of AI identified by the World Economic Forum. “Individuals who have ownership in AI-driven companies will make all the money…. If we’re truly imagining a post-work society, how do we structure a fair post-labour economy?” (Bossman, 2016).

Environmental Impact

Some technologies, such as blockchain, are already known to have a potentially significant impact on the environment (Hotchkiss, 2019). Analytics and AI could have a similarly detrimental impact as “manufacture of digital devices and other electronics — which go hand-in-hand with development of AI — continues to damage our environment, despite efforts to prevent this” (Meinecke, 2018).

In other cases, no real concern is identified. “While the authors are sensitive to the significant impact AI is having, and will have, on the environment, we did not find a concentration of related concepts in this area that would rise to the level of a theme” (Fjeld, et.al., 2020: 16). Yet, “IA Latam’s principles, for example, stress that the impact of AI systems should not ‘represent a threat for our environment.’ (and) other documents go further, moving from a prohibition on negative ramifications to prescribe that AI technologies must be designed ‘to protect the environment, the climate and natural resources’ or to ‘promote the sustainable development of nature and society.’” (Fjeld, et.al., 2020:31)

Safety

Is AI safe? That may seem like an odd question, but the issue comes to the fore in the case of automated vehicles, and as analytics and AI are used in more and more systems – everything from construction to mechanics to avionics – the question of safety becomes relevant.

Beyond the question of whether we can trust AI is the broader question of whether producers of AI and analytics-based systems are actually concerned about safety. For example, in the U.S. the National Transportation Safety Board (NTSB) said that Uber’s “inadequate safety culture” contributed to a fatal collision between an Uber automated test vehicle and a pedestrian, and that “the vehicle’s factory-installed forward collision warning and automatic emergency braking systems were deactivated during the operation of the automated system” (NTSB, 2019).

In general, there is a concern about the technology industry’s disregard for the potential impact and consequences of their work. The impact on safety could be direct, as in the Uber case, or indirect, as in the case of misleading content (Metz and Blumenthal, 2019) that could, say, lead people into dangerous patterns of behaviour, such as failing to vaccinate (Hoffman, 2019), or violent behaviour, such as vigilante attacks on innocent civilians in India (McLaughlin, 2018).

Analytics can be hacked in ways that are difficult to detect. For example, “engineers were able to fool the world’s top image classification systems into labeling the animal as a gibbon with 99% certainty, despite the fact that these alterations are utterly indiscernible to the human eye…The same technique was later used to fool the neural networks that guide autonomous vehicles into misclassifying a stop sign as a merge sign, also with high certainty” (Danzig, 2020). Even more concerning, in 2018 researchers used a 3D printer “to create a turtle that, regardless of the angle from which it was viewed by leading object recognition systems, was always classified as a rifle” (Ibid.).

The Scope of Ethics in Analytics

In the work above we’ve identified some areas that lie outside most traditional accounts of analytics and ethics. We found we needed to widen the taxonomy of learning analytics to include deontic analytics, in which our systems determine what ought to be done. And we have to extend our description of ethical issues in analytics to include social and cultural issues, which speak to how analytics are used and the impact they have on society.

And it is precisely in these wider accounts of analytics that our relatively narrow statements of ethical principles are lacking. It is possible to apply analytics correctly and yet still reach a conclusion that would violate our moral sense. And it is possible to use analytics correctly and still do social and cultural harm. An understanding of ethics and analytics may begin with ethical principles, but it is far from ending there.

There are some studies, such as Fjeld, et.al. (2020), that suggest that we have reached a consensus on ethics and analytics. I would argue that this is far from the case. The appearance of ‘consensus’ is misleading. For example, in the Fjeld, et.al., survey, though 97% of the studies cite ‘privacy’ as a principle, consensus is much smaller if we look at it in detail (Ibid:21). The same holds if we look at the others, e.g. accountability (Ibid:28).

And these are just studies strictly within the domain of artificial intelligence. When we look outside the field (and outside the background assumptions of the technology industry) much wider conceptions of ethics appear.

License

Ethics, Analytics and the Duty of Care Copyright © by National Research Council Canada is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.