Outsmarting AI. Brennan Pursell
The constant threat of fraud in millions of electronic payment transactions demands the best tools for cost-effective, automated oversight. AI applications are cutting-edge, but they are built on old math. The square-root rule was discovered by Abraham de Moivre in England in 1718.
See how there’s little that’s new in the math that underlies AI?! This chapter does not get into calculus, derivatives, and linear algebra, but they are even older, stemming from the 1600s. Without them, the algorithms you will meet next would be unthinkable.
The Software of AI
Now for the software—the coded algorithms that perform the math.
You have probably heard the terms machine learning, predictive analytics, deep learning, and neural networks, which refer to groups of algorithms in code. We’re going to go out on a limb here: In artificial intelligence, all four are pretty much the same thing.[10] Okay, “deep learning” algorithms are usually associated with image classification and voice transcription, but they use “neural networks” just like the others can. All four use the prediction rules or models discussed above. All four involve mathematical, statistical algorithms working on data. There’s no need to parse technical jargon flaunted by marketers.
And, just to reiterate, machines cannot “learn” the way a human does; hardware and software are nothing like neurons; and “deep” can mean anything. Computers do not have self-awareness, independent consciousness, feelings, or even thoughts. AI software is a set of data-analysis tools. All code is prone to bugs, and all computer systems crash from time to time. AI is down-to-earth.
One final observation before we get to term definitions: Software is a lot like the law. Both are grounded in reason. Our point is that business software and the law are not natural enemies. Software and the law both classify and perform procedures: the former with numbers, the latter with words.
Indulge a shallow dive into distant history: The same person, Gottfried Wilhelm Leibniz (1646–1716), developed calculus at about the same time as Isaac Newton, devised mechanical calculators, and created the binary number system that computers use today, whereby all numbers are expressed in 1s and 0s. Leibniz also developed a rational, legal “machine,” a code, for classifying disputes (input data) and generating rulings (classification outputs). For him, math and the law complemented each other.
Okay. We’re ready for the software.
Data captured in software must be accurate, reliable, and correctly classified for any of the above procedures to produce useful results. Computer data can come from many sources: keyboard entries, audio recordings, visual images, sonar readings, GPS, document files, spreadsheets, etc. The devices gathering sensory inputs must themselves be of high quality for the sake of accuracy. All the data is reduced to series of numbers and is normally stored in tables with many rows and columns. If the data is flawed in any way, the procedures will be risky at best. If the procedures are inaccurate, then your business may make poor decisions and take the wrong actions.
As we said before, “Garbage in, garbage out” is as true today as it ever was. An AI firm proclaiming that its products or services can take any kind and quality of data and turn it into perfect predictions and decisions is peddling alchemy from the Middle Ages. Many people tried in vain for many centuries to turn common metals such as lead into gold. Even a genius like Isaac Newton poured many hours into that total waste of time and energy.
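To make “garbage in” concrete, here is a minimal sketch of the kind of check that can run before any analysis. The payment records, the field names, and the three flaw rules are all invented for illustration; a real project would define its own checks for its own data.

```python
from typing import Optional

# Hypothetical payment records: each row is one transaction.
rows = [
    {"id": 1, "amount": 25.00, "country": "US"},
    {"id": 2, "amount": None, "country": "US"},    # missing amount
    {"id": 3, "amount": -40.00, "country": "DE"},  # negative amount
    {"id": 4, "amount": 12.50, "country": ""},     # missing country
]


def find_problem(row: dict) -> Optional[str]:
    """Return the first flaw found in a row, or None if the row looks clean."""
    if row["amount"] is None:
        return "missing amount"
    if row["amount"] < 0:
        return "negative amount"
    if not row["country"]:
        return "missing country"
    return None


# Map each flawed row's id to a description of what is wrong with it.
flawed = {}
for row in rows:
    problem = find_problem(row)
    if problem is not None:
        flawed[row["id"]] = problem

# Share of rows that passed every check.
clean_share = 1 - len(flawed) / len(rows)

print(flawed)
print(f"clean rows: {clean_share:.0%}")
```

Checks like these cannot prove that data is accurate. They only catch the flaws someone thought to look for, which is one reason a skeptical human still has to review the data.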
“Garbage in, garbage out” is true of the law as well. If the evidence is faulty or fake, the results of the court proceedings are going to be skewed, distorted, or just dead wrong. The goal is to calculate right determinations and benefit stakeholders. Data science and the law are compatible, indispensable tools in the struggle to make the right move and do the right thing.
Structured data refers to data stored in tables of rows and columns and formatted in a database for queries and analysis. Queries retrieve data, update it, insert more, delete some, and so on. SQL (Structured Query Language), a computer programming language dating from the 1970s, is still widely used, especially by Microsoft and Amazon Web Services (AWS), and there are dozens of others. Structured data has to be clean and complete. It has to be accurate.
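The four kinds of queries just mentioned (insert, update, delete, retrieve) can be sketched in a few lines. This example uses Python's built-in sqlite3 module with a throwaway in-memory database. The payments table and its contents are assumptions made up for illustration.

```python
import sqlite3

# An in-memory database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Insert: add three rows to the table.
conn.executemany(
    "INSERT INTO payments (customer, amount) VALUES (?, ?)",
    [("Acme", 120.0), ("Birch", 75.5), ("Acme", 30.0)],
)

# Update: correct one customer's amount.
conn.execute("UPDATE payments SET amount = 80.0 WHERE customer = 'Birch'")

# Delete: drop small payments.
conn.execute("DELETE FROM payments WHERE amount < 50")

# Retrieve: total the remaining payments per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM payments GROUP BY customer ORDER BY customer"
).fetchall()

print(totals)
conn.close()
```

The same four statements work, with small dialect differences, in most SQL databases.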
Let’s use the 80/20 rule here: 80 percent of the time spent on your AI projects will be spent in data preparation. (More on that in chapter 5.)
AI guru Andrew Ng says that structured data “is driving massive value today and will continue as companies across all industries transform themselves with AI.”[11]
Unstructured data gets more of the media attention, because people are more impressed when a machine can identify objects in pictures and respond to written and spoken language in a human-like way. Unstructured data includes digitized photographs and video, audio recordings, and many kinds of documents that people can readily understand much faster and more thoroughly than a computer can. It took years and many millions of images and dollars to get AI algorithms to identify cats in photos with a high degree of reliability. Your average three-year-old child would get it right for the rest of her life after one or two encounters.
Back to data in general. It is the source of your organization’s knowledge, not the AI. It is a key part of your institution’s historical record, actually. Your data is a cross between a gold mine and a swamp. Working through it can really pay off, but you have to be very, very careful. Always maintain a healthy, skeptical attitude toward your data. How accurate is it, really? How was it compiled? Are there obvious or hidden biases that might skew your analysis and its conclusions?
Historical data presents a big problem for AI applications in the US criminal justice system. More often than not, inmates with lighter skin have in the past received parole at a much higher rate than those whose skin is darker, and it is no surprise that AI predictive models recently put these results into use in similar decisions.[12] To what extent does data about past court decisions reflect poor legal representation in court or base prejudice instead of actual guilt or innocence? There is no easy answer to this loaded question. Drug enforcement efforts quite frequently concentrate on neighborhoods with darker-skinned, poorer inhabitants, although drug trade and use are relatively color-blind and prevalent among all socioeconomic classes. When data collected from these efforts is used to predict the best time and place for the next drug bust, it is virtually guaranteed to continue the trend. The computer certainly hasn’t the foggiest idea that anything could be wrong in the data’s origin or derivation.
The problem is nearly universal. In the United Kingdom, a model was used in Durham, in northern England, to predict whether a person released from prison would commit another crime—until people noticed that to a high degree, it correlated repeat offenders with those who live in poorer neighborhoods. Authorities then removed residential address data from the system, and the resulting predictions became more accurate.[13]
People are just people. We are sometimes rational and sometimes not, depending on the circumstances. If only our legal systems were as rational and reliable as mathematical analysis. The two meet in data and the analysis of it. Our great, shared challenge in this age of AI is to ensure that the machines serve the people, and not the reverse.
Let’s return to our key term definitions for AI software.
Rules in software say what your organization will do with the outputs. You set the operational rules.
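As a minimal sketch of an operational rule, suppose a model outputs a fraud score between 0 and 1 for each payment. The function name, the thresholds, and the three actions below are hypothetical. Setting them is a business decision, not something the model makes for you.

```python
def apply_rule(fraud_score: float, review_at: float = 0.5, block_at: float = 0.9) -> str:
    """Turn a model output (a fraud score) into an action the organization takes.

    The thresholds are illustrative assumptions chosen by people, not by the model.
    """
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError("fraud_score must be between 0 and 1")
    if fraud_score >= block_at:
        return "block"
    if fraud_score >= review_at:
        return "review"
    return "approve"


# Three hypothetical model outputs and the action each one triggers.
decisions = [apply_rule(score) for score in (0.05, 0.62, 0.97)]
print(decisions)
```

Changing a threshold changes what the organization does with exactly the same model output.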