Enterprise AI For Dummies. Zachary Jarvinen
Variety
Many more types of data are available than ever before. Traditionally, companies focused their attention on the data created in their corporate systems. This was mainly structured data — data that follows the same structure for each record and fits neatly into relational databases or spreadsheets.
Today, valuable information is locked up in a broad array of external sources, such as social media, mobile devices, and, increasingly, Internet of Things (IoT) devices and sensors. This data is largely unstructured: It does not conform to set formats in the way that structured data does. This includes blog posts, images, videos, and podcasts. Unstructured data is inherently richer, more ambiguous, and fluid with a broad range of meanings and uses, so it is much more difficult to capture and analyze.
A big-data analytics tool works with both structured and unstructured data to reveal patterns and trends that would be impossible to find with the previous generation of data tools. Of the three Vs of big data, variety is increasingly costly to manage, especially for unstructured data sources.
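The difference between the two kinds of data can be illustrated with a minimal Python sketch. The order records, the sample post, and the function names `total_spent` and `mentioned_products` are all made up for illustration. The sketch only shows why fixed fields are easy to query while free text needs extra work before it yields anything.

```python
import re

# Structured data: every record has the same fields,
# so answering a question is a simple filter over a known field.
orders = [
    {"customer": "Ada", "product": "laptop", "amount": 1200.0},
    {"customer": "Ben", "product": "phone", "amount": 650.0},
    {"customer": "Ada", "product": "monitor", "amount": 300.0},
]


def total_spent(records, customer):
    """Sum the amount field for one customer."""
    return sum(r["amount"] for r in records if r["customer"] == customer)


# Unstructured data: free text with no fixed fields (hypothetical post).
post = "Loved my new laptop, but the phone I bought last month keeps crashing."


def mentioned_products(text, known_products):
    """Naive keyword match: finds product names, not sentiment or meaning."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & set(known_products))


print(total_spent(orders, "Ada"))
print(mentioned_products(post, {"laptop", "phone", "monitor"}))
```

The keyword match finds that a laptop and a phone are mentioned, but it cannot tell that the writer is happy with one and unhappy with the other. That gap is the kind of ambiguity that makes unstructured data harder to analyze.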
Velocity
Data is coming at us faster than ever. Texts, social media status updates, news feeds, podcasts, and videos are all being posted by the always-on, always-connected culture. Even cars and refrigerators and doorbells are data generators. The new Ford GT not only tops out at 216 miles per hour, it also has 50 IoT sensors and 28 microprocessors that can generate up to 100GB of data per hour.
And because it’s coming at us faster, it must be processed faster. A decade ago, it wasn’t uncommon to talk about batch processing data overnight. For a self-driving car, even a half-second delay is too slow.
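To put the 100GB-per-hour figure in perspective, here is a small back-of-the-envelope calculation in Python. It assumes decimal gigabytes (one billion bytes), which the text does not specify, and the function name `bytes_per_second` is invented for this sketch.

```python
# Assumption: 1 GB = 10**9 bytes (decimal units); the source does not say.
GB = 10**9
SECONDS_PER_HOUR = 3600


def bytes_per_second(gb_per_hour):
    """Convert a data rate in gigabytes per hour to bytes per second."""
    return gb_per_hour * GB / SECONDS_PER_HOUR


rate = bytes_per_second(100)                # the 100GB-per-hour figure above
during_half_second_delay = rate * 0.5       # data arriving during a 0.5 s delay

print(f"{rate / 10**6:.1f} MB per second")
print(f"{during_half_second_delay / 10**6:.1f} MB during a half-second delay")
```

Under that assumption, the car produces roughly 27.8 MB every second, so about 13.9 MB of new sensor data piles up during a half-second delay.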
When AI was just starting out, data was scarce. Consequently, the information generated was of limited quality and value. With the advent of big data, the quality of the information to be harvested is unprecedented, as is the value of modern AI initiatives to the enterprise.
Storage
AI requires massive amounts of data, so massive that it uses a repository technology known as a data lake. A data lake can be used to store all the data for an enterprise, including raw copies of source system data and transformed data.
In the decade from 2010 to 2020, data storage changed more in terms of price and availability than during the previous quarter century, and due to Moore’s Law, that trend will continue. Solid-state drives sold as laptop peripherals for hundreds of dollars today have the same capacity as million-dollar hard-drive storage arrays from 20 years ago. Large-scale storage capacity now ranges up to hundreds of petabytes (one petabyte is a million gigabytes) and runs on low-cost commodity servers.
Combined with the advent of more powerful processors, smarter algorithms and readily available data, the arrival of large-scale, low-cost storage set the stage for the AI explosion.
Discovering How It Works
Artificial intelligence is a field of study in computer science. Much like the field of medicine, it encompasses many sub-disciplines, specializations, and techniques.
Semantic networks and symbolic reasoning
Also known as good old-fashioned AI (GOFAI), semantic networks and symbolic reasoning dominated solutions during the first three decades of AI development in the form of rules engines and expert systems.
Semantic networks are a way to organize relationships between words, or more precisely, relationships between concepts as expressed with words. These concepts and relationships are gathered to form a specification of the known entities and relationships in the system, also called an ontology.
The “is a” relationship takes the form “X is a Y” and establishes the basis of a taxonomic hierarchy. For example: A monkey is a primate. A primate is a mammal. A mammal is a vertebrate. A human is a primate. With this information, the system can link human not only with primate, but also with mammal and vertebrate, because each node inherits the properties of the nodes above it.
However, the meaning of monkey as a verb, as in “don’t monkey with that,” has no relationship to primates, and neither does monkey as an adjective, as in monkey bread, monkey wrench, or monkey tree, which aren’t related to each other either. Now you start to get an inkling of the challenge facing data scientists.
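The “is a” hierarchy described above can be sketched in a few lines of Python. The four facts come from the example in the text. The property labels (`has_backbone` and so on) and the function names are hypothetical, added only to show how inheritance works. This is a minimal illustration, not how any particular expert system is built.

```python
# Each concept points to its parent ("X is a Y"), using the facts from the text.
IS_A = {
    "monkey": "primate",
    "human": "primate",
    "primate": "mammal",
    "mammal": "vertebrate",
}

# Properties attached to individual nodes (hypothetical illustration values).
PROPERTIES = {
    "vertebrate": {"has_backbone"},
    "mammal": {"warm_blooded"},
    "primate": {"grasping_hands"},
}


def ancestors(concept):
    """Follow the is-a links upward and return every higher node in order."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def inherited_properties(concept):
    """Collect the concept's own properties plus those of all higher nodes."""
    props = set(PROPERTIES.get(concept, set()))
    for node in ancestors(concept):
        props |= PROPERTIES.get(node, set())
    return props


print(ancestors("human"))
print(sorted(inherited_properties("monkey")))
```

Note that the lookup key is just the string "monkey". Nothing in this structure distinguishes the animal from the verb or from a monkey wrench, which is exactly the ambiguity the paragraph above points out.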
Another relationship, the case relationship, maps out the elements of a sentence based on the verb and the associated subject, object, and recipient, as applicable. Table 1-1 shows a case relationship for the sentence “The boy threw a bone to the dog.”
TABLE 1-1 Case Relationship for a Sentence
Case | Threw
Agent | Boy
Object | Bone
Recipient | Dog
The case relationship for other uses of “threw” won’t necessarily follow the same structure:
The pitcher threw the game.
The car threw a rod.
The toddler threw a tantrum.
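A case relationship like the one in Table 1-1 can be represented as a simple record. The following Python sketch uses a hypothetical `CaseFrame` class whose slots are filled in by hand. It does not parse sentences. It only shows that the other “threw” sentences leave the recipient slot empty, so they do not share the structure of the first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseFrame:
    """Hypothetical record for a verb and the roles attached to it."""
    verb: str
    agent: str
    obj: str
    recipient: Optional[str] = None  # not every sentence has a recipient


# "The boy threw a bone to the dog." (Table 1-1)
table_1_1 = CaseFrame(verb="threw", agent="boy", obj="bone", recipient="dog")

# The other sentences, with roles filled in by hand for illustration.
other_uses = [
    CaseFrame(verb="threw", agent="pitcher", obj="game"),
    CaseFrame(verb="threw", agent="car", obj="rod"),
    CaseFrame(verb="threw", agent="toddler", obj="tantrum"),
]


def has_recipient(frame):
    """True when the frame names someone or something receiving the object."""
    return frame.recipient is not None


print(has_recipient(table_1_1))
print([has_recipient(frame) for frame in other_uses])
```

Even where the slots can be filled, the meaning differs: a pitcher who throws a game is not physically throwing anything. A frame captures sentence structure, not word sense.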
Early iterations of rules engines and expert systems were code-driven, meaning much of the system was built on manually coded algorithms. Consequently, they were cumbersome to maintain and modify and thus lacked scalability. The availability of big data set the stage for the development of data-driven models. Symbolic AI evolved using the combination of machine-learning ontologies and statistical text mining to get the extra oomph that powers the current AI renaissance.
Text and data mining
The information age has produced a super-abundance of data, a kind of potential digital energy that AI scientists mine and refine to power modern commerce, research, government, and other endeavors.
Data mining
Data mining processes structured data, such as that found in corporate enterprise resource planning (ERP) systems or customer databases, and applies modeling functions to produce actionable information. Analytics and business intelligence (BI) platforms can quickly identify and retrieve information from large datasets of structured data and apply the data mining functions described here to create models that enable descriptive, predictive, and prescriptive analytics: