Just as the recent Presidential election showed us that the polls saying so doesn’t make it true, the widespread belief that a math equation must be objective isn’t holding up. Algorithms built on Big Data are written and revised by humans, and machine learning algorithms revise themselves based on people’s behavior. Researchers in computer science, ethics and law say algorithms built on Big Data can and do reflect human prejudices and are discriminatory.
“Even if they are not designed with the intent of discriminating against those groups, if they reproduce social preferences even in a completely rational way, they also reproduce those forms of discrimination,” said David Oppenheimer, who teaches discrimination law at the University of California, Berkeley. Where is this happening? Here are just two examples…
Criminal Justice System
ProPublica, a non-profit investigative publisher, studied the Big Data algorithms used by police departments and district attorneys’ offices to predict how likely a criminal is to reoffend. The models predicted that black Americans have a higher probability of reoffending. ProPublica found that these models were assigning false positives to black defendants (a higher predicted probability of reoffending than what actually happens) and false negatives to white defendants. The models’ forecasts were less accurate than a coin flip at predicting whether someone would reoffend. The models are biased, and according to ProPublica, “racist.” These discriminatory Big Data algorithms have serious consequences: judges are using this information to determine everything from bond amounts to sentencing.
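To make the false positive/false negative distinction concrete, here is a minimal sketch in Python using invented toy data (these are not ProPublica’s actual figures). It shows how two groups can face very different error rates from the same model: one group absorbs the false positives (wrongly flagged as high risk), the other the false negatives (wrongly flagged as low risk).

```python
def confusion_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate).

    actual/predicted are lists of 0/1 flags, where 1 means
    'reoffended' (actual) or 'flagged high risk' (predicted).
    """
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    negatives = sum(1 for a in actual if a == 0)  # people who did not reoffend
    positives = sum(1 for a in actual if a == 1)  # people who did reoffend
    return fp / negatives, fn / positives

# Invented toy data for illustration only -- NOT ProPublica's numbers.
group_a_actual    = [0, 0, 0, 0, 1, 1]
group_a_predicted = [1, 1, 0, 0, 1, 1]   # over-flags: false positives
group_b_actual    = [0, 0, 0, 0, 1, 1]
group_b_predicted = [0, 0, 0, 0, 0, 1]   # under-flags: false negatives

fpr_a, fnr_a = confusion_rates(group_a_actual, group_a_predicted)
fpr_b, fnr_b = confusion_rates(group_b_actual, group_b_predicted)
print(f"Group A: FPR={fpr_a:.2f}, FNR={fnr_a:.2f}")  # FPR=0.50, FNR=0.00
print(f"Group B: FPR={fpr_b:.2f}, FNR={fnr_b:.2f}")  # FPR=0.00, FNR=0.50
```

The point of the sketch: both groups see the same overall mix of outcomes, yet the harm lands differently, which is exactly the kind of disparity auditors look for when testing a risk model.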
Auto Insurance Pricing
Cathy O’Neil’s new book, Weapons of Math Destruction, describes a Consumer Reports analysis of more than 2 billion car insurance price quotes over two years. The analysis showed that a person’s driving ability had little to do with how much they paid for insurance. Insurers scored drivers through their credit reports, trying to gauge how likely each driver was to file a claim or shop around for cheaper insurance. Spending characteristics carried more weight than driving records. O’Neil gives an example from Florida: “Drivers with a clean driving record but poor credit paid $1,552 on average more than drivers with excellent credit and a drunken driving conviction.”
What is being done to address this problem?
1. Mathematicians are recognizing this problem and trying to design their Big Data algorithms from the start in a more discrimination-conscious way.
2. Academicians are developing tools to test algorithms for veiled discriminatory results and to fix them when necessary.
3. There is an argument that consumers should receive an alert when their credit records are used to create mathematical profiles. This gives them the opportunity to challenge and correct a profile when the information used to create it is wrong or incomplete.
4. Assessing and addressing discrimination through the legal “disparate impact” theory. Attorneys are using this theory to successfully challenge discriminatory policies, whether or not the policy or procedure was motivated by an intent to discriminate.
5. Creation of new regulations or a regulatory body to govern Big Data algorithms.
It often seems like the online world is out of our control, determining our news feed on Facebook, our dates on Match.com and our Google search results. But the Big Data algorithms that determine what we are approved for, or denied, in hiring, housing or health care need to rise to a higher, more transparent standard.
Executive recruiters Smith Hanley Associates recruit and place data science candidates who work with these algorithms every day. Contact Nancy Darian, Nihar Parikh or Paul Chatlos to talk about hiring or being hired as a specialist who ensures Big Data algorithms are the best they can be.