For thousands of inmates at local, state, and now federal prisons, where they live, how much visitation time they get, and even the length of their sentence might be determined by a computer algorithm. Software developed to help personalize and expedite the rehabilitation and release of prisoners is being turned toward sentencing and even pre-trial assessment across the country. These algorithms can carry a great deal of weight in sentencing thanks to their appearance of objectivity and fairness, but they might be letting some troubling prejudices in the criminal justice system persist under a shiny veneer of data analytics.

Does your state use risk assessments in sentencing, corrections, or parole?

U.S. criminal rehabilitation has used risk assessment for decades. The first references to actuarial post-sentencing risk assessments (in this context, statistical analysis performed by a human) appeared nearly a century ago. What makes the modern application different, and potentially troubling, is a new permissiveness in where and how risk scores are calculated and used.

On July 13, 2016, the Wisconsin Supreme Court ruled in favor of the state in State v. Loomis, allowing judges within the state to use risk assessment scores as part of a sentencing decision. Notably, even the company that developed the software advises against using its product for sentencing. Risk assessment algorithms can provide an avenue for prejudice to slip through in the guise of an innocuous, agendaless number.

As the president launches a data-driven criminal justice initiative and an impressive, bipartisan coalition renews its support for a sentencing reform bill that makes use of risk assessment, it is worth asking whether these algorithms are fair and equitable before they are written into federal law.

Artificial Unintelligence

Myriad tests are in use, with the number of questions ranging from fewer than ten to over 100. The tests ask defendants, prisoners, or parolees (depending on when in the correctional process the test is administered) questions about static (demographic or biographical) and dynamic (behavioral, changeable) predictors of how likely the test taker is to commit another crime within a certain period following release, otherwise known as "recidivism." That prediction comes out as a numerical score, typically between one and ten, that betrays nothing of what went into calculating it. And even though risk assessment tests don't ask about race or income, that demographic information can often be inferred after the fact from answers that correlate strongly with it.
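To make the scoring step concrete, here is a minimal, hypothetical sketch in Python of how a questionnaire-style instrument might collapse a set of coded answers into a single decile score. The predictors, weights, and cut points are invented for illustration and do not describe any real instrument.

# A hypothetical sketch of how a questionnaire-style risk assessment
# might reduce coded answers to a single score. The predictors, weights,
# and cut points below are invented; they do not come from any real tool.

HYPOTHETICAL_WEIGHTS = {
    "prior_arrests": 0.8,           # static predictor
    "age_at_first_offense": -0.05,  # static predictor
    "employment_status": 0.6,       # dynamic predictor (0 = employed, 1 = unemployed)
    "substance_use": 0.7,           # dynamic predictor
}

def raw_score(answers: dict) -> float:
    """Weighted sum of the coded answers."""
    return sum(HYPOTHETICAL_WEIGHTS[k] * answers[k] for k in HYPOTHETICAL_WEIGHTS)

def decile_score(raw: float, cut_points=(1, 2, 3, 4, 5, 6, 7, 8, 9)) -> int:
    """Map the raw score onto a 1-10 scale; the cut points are arbitrary here."""
    return 1 + sum(raw > c for c in cut_points)

answers = {"prior_arrests": 3, "age_at_first_offense": 19,
           "employment_status": 1, "substance_use": 1}
print(decile_score(raw_score(answers)))  # prints a single number; the weights stay hidden

The point of the sketch is the opacity: the test taker and the judge see only the final number, while the weights and cut points that produced it remain out of view.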

Table 1. Risk Assessment Instruments Validated and Implemented in Correctional Settings in the United States
Question Type    Frequency on Tests Studied (N=19)
Attitudes 63%
Associates/Peers 68%
History of Antisocial Behavior 100%
Personality Problems 42%
Relationships 63%
Education & Employment Status 84%
Recreation/Leisure Activities 26%
Substance Use Problems 100%
Mental Health 53%
Housing Status 42%
Note: Risk assessment tests don't ask about race or income, but it's easy enough to figure that information out; the sketch following this table illustrates how. Try the Marshall Project's data tool on the "New Science of Sentencing" to see how demographic information can be deduced.

Source: “Risk Assessment Instruments Validated and Implemented in Correctional Settings in the United States” by Sarah L. Desmarais and Jay P. Singh.
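The note above points to something worth spelling out: a model that is never shown race or income can still recover them from correlated answers. The toy sketch below uses entirely synthetic data (it does not reflect any real instrument or the Marshall Project's dataset); a single proxy feature and a majority-vote rule are enough to guess a protected attribute far better than chance.

import random
from collections import Counter

random.seed(0)

# Synthetic population: attribute A is never given to the "model",
# but a neighborhood code is strongly associated with it.
population = []
for _ in range(10_000):
    a = random.random() < 0.5
    neighborhood = random.choices([0, 1], weights=[0.8, 0.2] if a else [0.2, 0.8])[0]
    population.append((neighborhood, a))

# A one-feature "classifier": the majority value of A within each neighborhood.
majority = {
    n: Counter(a for nb, a in population if nb == n).most_common(1)[0][0]
    for n in (0, 1)
}

accuracy = sum(majority[nb] == a for nb, a in population) / len(population)
print(round(accuracy, 2))  # roughly 0.8, well above the 0.5 expected by chance

Any score built on such proxies inherits the correlation, whether or not the protected attribute ever appears as an input.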

Even more troubling, risk assessments can lend a false sense of mathematical legitimacy to a prejudiced status quo. ProPublica studied one widely used risk assessment algorithm, developed by Northpointe, and found it biased against African-Americans: the algorithm over-scored African-Americans, giving them higher risk scores than their actual recidivism warranted, and under-scored whites. The sketch following Table 2 walks through the arithmetic.

Table 2. Prediction Fails Differently for Black Defendants
White African American
Labeled higher risk, but didn’t re-offend  23.5% 44.9%
Labeled lower risk, but did re-offend 47.7% 28.0%
Note: Overall, Northpointe’s assessment tool correctly predicts recidivism 61 percent of the time. But blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend. It makes the opposite mistake among whites: They are much more likely than blacks to be labeled lower risk but go on to commit other crimes.

Source: ProPublica analysis of data from Broward County, Fla.
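The disparity in Table 2 is a statement about error rates conditioned on actual outcomes. The brief sketch below shows, first with made-up counts and then with the published percentages, how the two rates are defined and how the "almost twice as likely" figure follows; the counts are illustrative and are not ProPublica's underlying data.

# How the Table 2 comparison is computed. The counts in the first half are
# invented for illustration; only the percentages in the second half come
# from the table above.

def false_positive_rate(labeled_high_but_no_reoffense, total_no_reoffense):
    """Of the people who did not re-offend, the share labeled higher risk."""
    return labeled_high_but_no_reoffense / total_no_reoffense

# Invented counts: 230 of 1,000 non-re-offending white defendants and
# 450 of 1,000 non-re-offending black defendants labeled higher risk.
print(false_positive_rate(230, 1000))   # 0.23
print(false_positive_rate(450, 1000))   # 0.45

# Ratios of the published error rates from Table 2:
fpr_white, fpr_black = 0.235, 0.449
fnr_white, fnr_black = 0.477, 0.280
print(round(fpr_black / fpr_white, 2))  # 1.91 -- "almost twice as likely"
print(round(fnr_white / fnr_black, 2))  # 1.7  -- the opposite mistake, favoring whites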

Judges and experts doubt that risk assessment instruments as they exist now, relying on the generalized data that they do, produce reliable or useful scores. Nor are risk assessments consistent between raters using the same test.

The data-driven nature of algorithms gives the false impression that they are incorruptible in a way human analysts are not. The ostensible "mathematical truth" projected by algorithms disguises the potential for large-scale penal and judicial inequalities, which may well be unintentional, by imbuing them with the apparent impartiality of machines.

If the goal of risk assessment algorithms is to predict recidivism, then the ones we’ve got are squeaking by. If they’re meant to prevent biased sentencing, there’s a pretty big glitch.

How Could Risk Assessment Algorithms be Used More Effectively?

Law enforcement agencies at all levels of government use algorithm-driven strategies, and data-driven reform can be genuinely beneficial: it is already changing the justice system and saving states millions of dollars. Yet the system continues to fail defendants, prisoners, and parolees in many of the ways it always has.

Then-Attorney General Eric Holder spoke to the issue of picking the right data set in 2014 at the National Association of Criminal Defense Lawyers Annual Meeting and State Criminal Justice Network Conference:

“In the context of directing law enforcement resources and improving reentry programs, intensive analysis and data-driven solutions can help us achieve significant successes while reducing costs. But…by basing sentencing decisions on static factors and immutable characteristics—like the defendant’s education level, socioeconomic background, or neighborhood—they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”

Holder's statement draws an important distinction between risk assessments done at sentencing, which are based on static factors, and dynamic assessments that occur during the sentence, based on prisoner behavior. Risk assessments performed at sentencing cloak classist and racist assumptions about what sorts of people are "risky" and deliver them as an innocent-looking number.

Turning algorithmic criminal justice into a workable system will require more transparency from corrections facilities about what "risk" entails. In 2015, the Congressional Research Service released a report on "risk and needs assessment" in corrections and criminal justice. Beyond that, outsiders have had little insight into the technical aspects of risk assessment, namely the algorithms used, or into how assessment scores are applied.

One journalist filed Freedom of Information-equivalent requests in all fifty states asking for information on algorithms used in sentencing, and not one returned information on how “criminal justice risk assessments forms were developed or evaluated.” Unfortunately, because the risk assessment algorithms are software, it’s unclear if they’re covered by the Freedom of Information Act (FOIA). Even if they are, private companies design and run the vast majority of these programs, placing risk assessment algorithms largely outside the reach of FOIA.

Understanding the mechanics of the risk assessment algorithms in use is a palliative, not a cure, for a long-term problem: the same problem seen in mandatory sentencing, which carries all the risks of algorithmic bias. The application of uniform, ostensibly objective rules to all cases serves to legitimize practices rooted in prejudice, even when they are not explicitly motivated by it.

Such is the issue in the case of Duane Buck, which the Supreme Court will hear next term. The case centers on the legitimacy of an expert witness's use of Buck's race (Buck is African-American) as a "statistical factor" that could be used to "predict future dangerousness." In a dissent authored in 2013, Justice Sonia Sotomayor wrote that the prosecution had styled its argument "to persuade the jury that Buck's race made him more dangerous and that, in part on this basis, he should be sentenced to death."

The solution to algorithmic bias is likely not a technical one, or at least not entirely. A technical fix will only last as long as risk assessment does; a proactive legal approach must address the underlying issue, the masking of prejudice behind a legitimate-seeming haze of data weighted and analyzed out of public view, rather than whatever particular tool is used to carry it out. Lest we forget, humans write software, so software can only be as unbiased as the people who write it. Better algorithms together with better laws could help improve the correctional system and public safety. But although humans have managed to build machines that are smarter than we are, we have not yet been able to make them any wiser.