
Predictive analysis – an introduction and considerations

Paul Feild examines the legal and other issues raised by the use by local authorities of predictive analysis, 'Big Data' and machine learning.

This article sets out an introduction to Predictive Analysis, Big Data, Artificial Intelligence, Machine Learning and their potential use by local authorities. It provides working definitions, identifies some of the current challenging issues, such as human rights and the treatment of minorities, and suggests potential strategies to avoid legal intervention.

This article is emphatically not about the potential dystopian world that fear of science may foster in those of a romantic imagination [1] [2]. Systematic consideration of data aided by Artificial Intelligence could turn out to be a very useful tool. The appeal of preventing bad things before they occur is part of basic risk management, that is to say: identify the risk, minimise the prospect of occurrence and take measures to reduce its impact if it should happen.

So, can we tell what will happen from what we know already? Yes, of course: the science of meteorology does it every day with some of the most powerful computers in existence [3]. Forecasters draw on vast quantities of data and use algorithms, a form of statistical modelling, to build the picture. Another example is the use of algorithms in the justice system, an issue the Law Society is currently examining and on which it has been collecting evidence [4].

This process of predicting the future is now being developed using a particular form of algorithm called Predictive Analytics. This technique draws upon big data and applies processing power to it, including Artificial Intelligence (AI), so that predictions about the data subjects can be made. Utilising existing data to extract further information is called data mining.

Such techniques can be used for marketing and, inevitably, without regulation the processing of data relating to human beings can intrude on a person's privacy [5].

Predictive Analysis and welfare

It stands to reason that local authorities, in their care roles, will have access to data which can inform them as to the likelihood that a future event will occur. Authorities are often in possession of, or able to access, a great deal of data about individuals and families, including social care, housing, health and education records. Further, in the mission to tackle safeguarding issues and 'troubled families', the use of MASHs (Multi-Agency Safeguarding Hubs) is becoming a structural organisational orthodoxy for local authorities.

The MASH is set the task of identifying problems which may be many-faceted. The biggest challenge the local authority faces is establishing the capability to carry out assessments. At the same time the volume and complexity of data regarding the clients of the MASH is challenging. It fits the definition of 'Big Data', that is “…high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” [6]

Professional practitioners capable of carrying out such work can be a challenge to recruit. So, options to supplement their practice will inevitably be examined, including the use of data analytics coupled with artificial intelligence to process the Big Data.

The local authority understandably wants to tackle the troubled family in its midst and is uniquely placed to gather the data, which in turn is mined and, with AI-powered predictive analytics, used to identify people for intervention. Without effective mastery there may be a temptation to let the AI do the work rather than expensive housing, social work or youth work professionals.

The most common approach is for the algorithm design to apply what is termed 'regression analysis' to a set of variables such as medical history, school attendance, offending, tenancy behaviour, and work and benefits status. Because the purpose is to build a tool for identifying signifiers, the personal identifying data will be stripped away. Ideally, from a historic collection of cases a picture emerges which identifies how the variables are correlated, and by using these coefficients a score can be created predicting the likelihood of an outcome. This tool is then applied to clients who either present themselves or become known to the local authority and other safeguarding partners. The application of predictive analytics could thus lead to the identification of a potentially troubled person or family.
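By way of illustration only, the short Python sketch below shows the kind of regression-based scoring described above. The variable names, data and weights are entirely invented and do not reflect any real local authority system or vendor product; it simply shows a score being learned from de-identified historic cases and applied to a new one.

# Illustrative sketch only: a regression-based risk score built from
# de-identified historic cases. All variable names and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historic, de-identified cases: each row is [school_attendance_pct,
# prior_referrals, months_of_tenancy_arrears]; 1 = intervention was needed.
X_hist = np.array([
    [95, 0, 0],
    [60, 3, 4],
    [88, 1, 0],
    [55, 4, 6],
    [72, 2, 2],
    [98, 0, 0],
])
y_hist = np.array([0, 1, 0, 1, 1, 0])

model = LogisticRegression().fit(X_hist, y_hist)

# Score a newly presenting case: the output is a probability, not a decision.
new_case = np.array([[70, 2, 3]])
risk = model.predict_proba(new_case)[0, 1]
print(f"Predicted likelihood of the outcome: {risk:.2f}")

# The coefficients are the learned weights on each variable; practitioners
# should be able to say which variables are driving a given score.
print(dict(zip(["attendance", "referrals", "arrears"], model.coef_[0].round(3))))

The point to take from the sketch is that the output is a probability derived from correlations in historic cases, nothing more; it is only as good as the data and assumptions behind it.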

However, it is not quite that simple, because predictive analysis rests on assumptions, including that the historic behaviours of people are likely to repeat themselves in similar circumstances. It is therefore important that the data is as robust, and its collection as rigorous, as possible to ensure the assumptions are sound. Furthermore, unless measures are taken, feedback loops can occur: by that I mean that looking for, say, offending and finding it may be seen as confirmation of the analysis, whereas the true level of offending was never properly measured in the first place.

As an example, let's say the Predictive Analysis identifies a location in a community with a high likelihood of criminal activity. Officers are dispatched and in due course they may find some youths carrying weapons such as knives. So they are stopped and arrested. This event is added to the data used to make future assumptions. As knives are found, further officers are sent, and so on. The assumption becomes reinforced. The consequence is that various people or groups could be stereotyped, and that picture becomes archetypal.
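A toy simulation, purely to illustrate the feedback loop described above, is set out below. All numbers are invented: two areas are given exactly the same underlying rate of incidents, but patrols are only sent to the area with the highest recorded count so far, so whichever area happens to start with one extra record accumulates ever more recorded incidents while the other records none.

# Toy illustration of a predictive-policing feedback loop (invented numbers).
import random

random.seed(1)
TRUE_RATE = 0.3                          # both areas have the same true incident rate
recorded = {"Area A": 1, "Area B": 0}    # Area A happens to start with one extra record

for day in range(200):
    # Patrol is sent to whichever area has the higher recorded count so far.
    patrolled = max(recorded, key=recorded.get)
    # Incidents are only *recorded* where officers actually look.
    if random.random() < TRUE_RATE:
        recorded[patrolled] += 1

print(recorded)   # e.g. Area A accumulates dozens of records, Area B stays at zero

The recorded data ends up wildly skewed even though, by construction, the two areas are identical; that is the feedback loop in miniature.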

A further concern is that Predictive Analysis generates a new form of sensitive personal information of which the data subject may not even be aware. The implication is that they cannot give their consent to its handling, because they do not know it exists.

Coupled with the exponential growth of Big Data is the application of Artificial Intelligence (AI). So, what is AI? At its simplest, it is a machine learning for itself. Stanford University (2018) defined it as:

…a science and a set of computational technologies that are inspired by—but typically operate quite differently from—the ways people use their nervous systems and bodies to sense, learn, reason, and take action.

The Information Commissioner (IC) adopts the following AI definition [7]:

…the analysis of data to model some aspect of the world. Inferences from these models are then used to predict and anticipate possible future events.

The IC makes the point that AI has a further aspect, 'Machine Learning', meaning that the AI learns from its activities and creates new algorithms.
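A minimal sketch of what 'learning from its activities' can mean in practice is given below: an incremental model whose parameters are updated as new outcomes are fed back in, rather than being fixed at the point of commissioning. The data and features are invented and the example is purely illustrative.

# Minimal sketch of incremental ("online") machine learning: the model's
# parameters shift as each new batch of observed outcomes is fed back in.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(random_state=0)

# Initial training on a small historic batch (invented data).
X0 = np.array([[0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.3, 0.7]])
y0 = np.array([0, 1, 0, 1])
model.partial_fit(X0, y0, classes=np.array([0, 1]))

# Later, new cases and their observed outcomes arrive and the model updates itself.
X_new = np.array([[0.4, 0.6], [0.7, 0.3]])
y_new = np.array([1, 0])
model.partial_fit(X_new, y_new)

print(model.coef_)   # the weights, and hence future predictions, have changed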

As some readers may recall, Professor Stephen Hawking was very concerned about AI. He observed:

The development of full artificial intelligence could spell the end of the human race…It would take off on its own, and re-design itself at an ever-increasing rate. Humans, who are limited by slow biological evolution, couldn’t compete and would be superseded.

(Interview with the BBC, December 2014)

There is a further consideration: whose property are the algorithms drawn by ML and AI from Big Data anyway? Data is not property per se, but it is possible to locate some principles via the General Data Protection Regulation, and the ICO has produced guidance (see https://ico.org.uk/media/for-organisations/documents/2013559/big-data-ai-ml-and-data-protection.pdf). A key point about Predictive Analytics is that the Big Data can be used to create valuable new data, that is to say predictive programmes modelled on data stripped of personal identifying features. Local authorities should be very clear about asserting their right to a share of the value created and should ensure that contracts commissioning data analytics products make this clear.

Predictive Analysis and Human Rights

Predictive Analytics activities are potentially intrusive to human rights and would appear to raise implications under the European Convention on Human Rights, more particularly rights under Article 8:

Article 8 - right to respect for private and family life

  1. Everyone has the right to respect for his private and family life, his home and his correspondence.
  2. There shall be no interference by a public authority with the exercise of this right except such as is in accordance with the law and is necessary in a democratic society in the interests of national security, public safety or the economic well-being of the country, for the prevention of disorder or crime, for the protection of health or morals, or for the protection of the rights and freedoms of others.

Article 8.1 is a qualified right [8] and 8.2 qualifies it to give some potential latitude for the prevention of crime and the protection of health. But when this right was drafted computers were in their infancy and AI was science fiction. Furthermore, what is the position with, as it were, 'pre-crime' or potential 'troubled families-to-be'? And what of the movement to classify violent youth and knife crime as a public health matter: does that fall within the qualified exceptions? Maybe.

I have noted of late that several authorities have procured such technology. It makes a great deal of sense to supplement practitioners with a tool which appears to predict what someone of interest to them is likely to do. But here's the thing: the more local authorities buy into the Predictive Analytics product, the stronger and more powerful a paradigm it becomes.

From the above a picture is emerging of a whole [9] new world of Big Data, Predictive Analytics and Machine Learning which offers huge potential but also very serious risks of the local authority being compromised.

Now let's look at some of the risks and concerns. The first thing to observe is that not using Big Data has ceased to be an option. Alexander Babuta, Research Fellow at RUSI [10], giving evidence to the Law Society [11], observed:

“In summary given the sheer volume and complexity of data that large organisations collect on a daily basis the use of algorithms is now essential to draw any meaningful insights from that data…”

This is a statement of the reality; the Big Data genie won’t go back in the bottle. So, we need to take command and that means those that handle the outputs must be able to account for the process.

Professor Lilian Edwards [12] recently addressed the Law Society on the subject of a 'right to an explanation'. Article 21 of the EU General Data Protection Regulation [13] allows an individual to object to being profiled, and Article 22 says a data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her; that right applies only where there is no meaningful human involvement. But it does not say that the data subject has a right to an explanation of how the algorithm came to the output it did. Section 98 of the Data Protection Act 2018 gives a right to knowledge of the reasoning underlying the processing, but only where it involves the intelligence services. So there is no general right in UK statute law [14]. This is again a strong reason for practitioners involved with Predictive Analytics to have complete mastery of it and to be able to give an account of what exactly is going on, as it were.

Nevertheless, while the statute is silent, that does not mean the common law cannot assist. From a legal perspective a decision maker needs to be clear that the data generated was given only its proper weight in the decision and was not relied on exclusively. If a challenge were launched, a potential line of attack would be to ask the local authority how its algorithm(s) came to the conclusion it did. If that cannot be explained, then a Wednesbury unreasonableness challenge becomes a strong possibility: if you cannot explain the process, it is arguable that the decision was irrational and as such can be struck down.
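As a purely illustrative sketch of the kind of account a practitioner might give, a linear scoring model can at least be decomposed into per-variable contributions (weight multiplied by value). The names and numbers below are invented and this is not any particular vendor's product; it simply shows how the contribution of each factor to an output score could be recorded on the file.

# Illustrative only: decomposing a linear risk score into per-variable
# contributions so a decision maker can explain what drove the output.
weights = {"school_attendance": -0.04, "prior_referrals": 0.9, "arrears_months": 0.5}
intercept = 1.0

case = {"school_attendance": 70, "prior_referrals": 2, "arrears_months": 3}

contributions = {k: weights[k] * case[k] for k in weights}
score = intercept + sum(contributions.values())

print("Contribution of each variable to the score:")
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {name}: {value:+.2f}")
print(f"Total (pre-threshold) score: {score:.2f}")

More complex models are harder to decompose in this way, which is precisely why decision makers need to understand what their tool can and cannot explain before relying on it.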

Built-In bias?

This month (February 2019) Liberty published its paper Policing by Machine [15]. It was particularly critical of Durham Police's algorithm, the 'Harm Assessment Risk Tool' (HART). This programme considers 34 pieces of data, 29 of which relate to a subject's past criminal history [16], and is, according to Liberty, supplemented by a product called Mosaic [17] [18]. Inevitably, one factor is where the person lives. It is argued that by taking into account a person's postcode it effectively uses a 'proxy for race'. The ICO (2017, p44) also comments on how the algorithm may contain bias:

Machine learning itself may contain hidden bias. We saw earlier the issue of feedback bias. A common phrase used in the discussion of machine learning is “garbage in garbage out” [19]. Essentially, if the input data contains errors and inaccuracies, so will the output data. While supervised machine learning often involves a pre-processing stage to improve the quality of the input data, the human-labelling of a training dataset can create a further opportunity for inaccuracies or bias to creep in. Hypothetically, a predictive model used in recruitment may achieve an overall accuracy rate of 90%, but this may be because it is 100% accurate for a majority population who make up 90% of applicants but wholly inaccurate for minority groups who make up the other 10%. It would be necessary to test for this and build in corrective measures.
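The ICO's hypothetical is easy to reproduce: overall accuracy can look impressive while being worthless for a minority group. The sketch below uses wholly invented labels purely to show how a subgroup accuracy check of the kind the ICO suggests might be written.

# Illustrative check of accuracy by subgroup (all data invented): a model can
# be about 90% accurate overall yet 0% accurate for a 10% minority group.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = np.where(rng.random(n) < 0.9, "majority", "minority")
y_true = rng.integers(0, 2, n)

# A hypothetical model that is right for the majority and wrong for the minority.
y_pred = np.where(group == "majority", y_true, 1 - y_true)

overall = (y_pred == y_true).mean()
print(f"Overall accuracy: {overall:.0%}")
for g in ("majority", "minority"):
    mask = group == g
    print(f"  {g}: {(y_pred[mask] == y_true[mask]).mean():.0%} ({mask.sum()} cases)")

A single headline accuracy figure therefore tells a commissioner very little; the testing and corrective measures the ICO refers to require the results to be broken down by group.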

Again we revisit the theme that practitioners who use such tools need to be able to explain how they have taken account of bias, because they will clearly be bound by the public sector equality duty under the Equality Act 2010 [20].

Conclusion

This has been a simple introduction for local authority lawyers to the world of Big Data and Predictive Analysis coupled with AI and Machine Learning. People and their families may become earmarked for intervention under statutory powers not just because of what they have done but because a computer says there is a likelihood of something that may happen. Further, while evidence-based decision making will still be needed for the courts, the output of Predictive Analytics will inevitably be a potential influence; but it should never usurp the role of the practitioner in ensuring the paramountcy of the welfare of those we have a duty to protect.

So, while there are clear human rights and equality issues, there is evidence that Predictive Analytics can pick up risks that could otherwise not be picked up [21].

Finally, as Babuta points out, the big data does need to be analysed, but the professionals that use it need to be clear what its limitations are, ensure that bias and discrimination are accounted for, and take ultimate responsibility for the decision making. New ethics protocols on handling predictive analytics are required for the professions involved. If this does not happen then we can surely predict the courts will intervene.

Dr. Paul Feild is a Senior Solicitor working in the Barking & Dagenham Legal Services Governance Team. He researches and writes on governance issues and can be contacted by email.

References

Anderson, L., Human Rights in the Age of Artificial Intelligence, Accessnow.org, 2018

Crawford, K. & Schultz, J., Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms, 55 B.C.L. Rev. 93 (2014), http://lawdigitalcommons.bc.edu/bclr/vol55/iss1/4

Ensign, D., Friedler, S.A., Neville, S., Scheidegger, C. & Venkatasubramanian, S., Runaway Feedback Loops in Predictive Policing, Proceedings of Machine Learning Research 81:1-12, 2018

Humeny Roberts, Y., O'Brien, K. & Pecora, P.J., Considerations for Implementing Predictive Analytics in Child Welfare, Casey Family Programs, April 2018

Information Commissioner's Office, Big Data, Artificial Intelligence, Machine Learning and Data Protection, 2017

Hornby Zeller Associates, Inc., Predictive Risk Modelling Tool Implementation: Process Evaluation, January 2018

Law Society, Summaries from Second Evidence Session - Algorithms in the Justice System, Sessions 1 and 2 and Witness Submissions, 2018

Liberty, Policing by Machine, February 2019

Reynolds, M., Biased policing is made worse by errors in pre-crime algorithms, New Scientist, 4 October 2017 (updated 27 April 2018)

 

[1] Mary Shelley’s Frankenstein (1818) for example.

[2] So (let's get the reference over): in the Steven Spielberg film Minority Report, an alt-future is envisioned with a department of pre-crime, in which Tom Cruise works to prevent crime by anticipating the offence before it is committed. The would-be offender is restrained before they commit the crime.

[3] The Met Office's Cray XC40 ranks in the top fifty supercomputers in the world.

[4] Law Society Technology and the Law Policy Commission.

[5] An example occurred in the United States where a data mining application was run on a retailer's female consumers; their buying behaviour for supplements indicated a likelihood of them being, or seeking to be, pregnant. Such data is valuable in the business of baby care retail, and the details were then used for direct marketing.

[6] ICO definition, drawn from the Gartner IT Glossary, 2016.

[7] Government Office for Science. Artificial intelligence: opportunities and implications for the future of decision making. 9 November 2016.

[8] That means it is subject to a balancing exercise of weighing individual personal freedom against the public interest (see 8.2).

[9] Avoid the use of ‘Brave’!

[10] Royal United Services Institute

[11] 25 July 2018

[12] Professor of E-Governance, University of Strathclyde; Law Society, 25 July 2018.

[13] See https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/individual-rights/rights-related-to-automated-decision-making-including-profiling/

[14] But section 50(4) of the Data Protection Act 2018 allows regulations to be made.

[15] Author: Hannah Couchman; Contributing Researcher: Alessandra Prezepoiski Lemos.

[16] This would be known criminal history of course.

[17] See https://www.experianintact.com/content/uk/documents/productSheets/MosaicConsumerUK.pdf

[18] It classifies families with needs as "Stacey"; see also group K. Why not use the link above, look up your name and see which stereotype you fit into!

[19] Also known as 'rubbish in, rubbish out'.

[20] It means that public bodies must consider all individuals when carrying out their day-to-day work – in shaping policy, in delivering services and in relation to their own employees.

It also requires that public bodies have due regard to the need to: eliminate discrimination; advance equality of opportunity; and foster good relations between different people when carrying out their activities.

[21] See the work in Allegheny County, Pennsylvania, USA, where Predictive Analytics is used to help determine visits.