Enterprise AI For Dummies. Zachary Jarvinen

Association: This determines the probability that two contemporaneous events are related. For example, in sales transactions, the association function can uncover purchase patterns, such as when a customer who buys milk also buys cereal.

       Classification: This reveals patterns that can be used to categorize an item. For example, weather prediction depends on identifying patterns in weather conditions (such as rising or dropping air pressure) to predict whether it will be sunny or cloudy.

       Clustering: This organizes data by identifying similarities and grouping elements into clusters to reveal new information. One example is segmenting customers by gender, marital status, or neighborhood.

       Regression: This predicts a numeric value based on the variables in a given dataset. For example, the price of a used car can be estimated by analyzing its age, mileage, condition, option packages, and other variables.
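
The association function described above can be sketched in a few lines. This is a minimal illustration, not a production algorithm: the baskets are invented, and the function name `association_rules` and the `min_support` threshold are assumptions made for the example. Support is the share of all transactions that contain both items; confidence is the share of transactions containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.4):
    """Return {(a, b): (support, confidence)} for item pairs, read as a -> b."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to report
        # Confidence of a -> b: share of baskets with a that also contain b.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical sales transactions, invented for illustration.
transactions = [
    ["milk", "cereal", "bread"],
    ["milk", "cereal"],
    ["milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "cereal", "eggs"],
]

rules = association_rules(transactions)
support, confidence = rules[("milk", "cereal")]
print(f"milk -> cereal: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, three of the four baskets that contain milk also contain cereal, which is the kind of purchase pattern the association function is meant to surface.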

Because data mining works on the structured data within the organization, it is particularly suited to deliver a wide range of operational and business benefits. For example, data mining can crunch data from IoT systems to enable the predictive maintenance of factory equipment or combine historical sales data with customer behaviors to predict future sales and patterns of demand.

      Text mining

      Text mining deals with unstructured data, which must be organized and structured before applying data modeling and analytics. Using natural-language processing (NLP), text-mining software can extract data elements to populate the structured metadata fields such as author, date, and content summary that enable analysis.

      Text mining can go beyond data mining to synthesize vast amounts of content to identify people, places, things, events, and time frames mentioned in written text, assign an emotional tone to each mention of them (negative, positive, or neutral), and even determine whether the document states fact or opinion.
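
To make the idea of turning free text into structured records concrete, here is a deliberately simplified sketch. It assumes a hand-written word list for tone and a known list of entity names; both are invented for illustration, and real text-mining software relies on trained NLP models rather than keyword lookups. The function names `tone` and `mine` are hypothetical.

```python
import re

# Toy tone lexicons, invented for illustration only.
POSITIVE = {"great", "excellent", "love", "reliable"}
NEGATIVE = {"poor", "broken", "late", "terrible"}


def tone(sentence):
    """Label a sentence positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def mine(text, entities):
    """Return one structured record per mention of a known entity."""
    records = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for entity in entities:
            if entity.lower() in sentence.lower():
                records.append(
                    {"entity": entity, "tone": tone(sentence), "sentence": sentence}
                )
    return records


# Hypothetical unstructured text and entity list.
text = (
    "The Acme router is excellent and reliable. "
    "Delivery from Acme was late. "
    "Acme ships from Denver."
)
records = mine(text, ["Acme", "Denver"])
for record in records:
    print(record["entity"], "->", record["tone"])
```

Each record is a row of structured data (entity, tone, source sentence) that downstream analytics can count, filter, or join with other datasets.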

      

Text mining is important for its ability to digest unstructured textual data, which contains more context and valuable insights than structured, transactional data, because it reflects the author’s opinion, intention, emotion, and conclusions.

      In 2018, Google introduced a technique for NLP pre-training called Bidirectional Encoder Representations from Transformers (BERT). This technique replaces ontologies with statistics-based mining to ratchet up the relevance of search results.

      With AI and machine learning comes an assumption that the more clean data you have, the more accurate your predictions become. But this also assumes you have the horsepower to process and analyze that data quickly, at scale, without dimming the city’s lights. To be effective at customer analysis, AI solutions must process immense amounts of data efficiently and scale to meet increasing volumes of data over time as it is collected and persisted.

| | Data Mining | Text Mining |
| --- | --- | --- |
| Overview | Data mining searches for patterns and relationships in structured data. | Text mining transforms unstructured textual data into structured information to enable data analysis. |
| Data Type | Structured data from large datasets is found in systems such as databases, spreadsheets, ERP, and accounting applications. | Unstructured textual data is found in emails, documents, presentations, videos, file shares, social media, and the Internet. |
| Data Retrieval | Structured data is homogeneous and organized, making it easy to retrieve. | Unstructured textual data comes in many different formats and content types located in a more diverse range of applications and systems. |
| Data Preparation | Structured data is formal and formatted, facilitating the process of ingesting data into analytical models. | Linguistic and statistical techniques, including NLP keywording and meta-tagging, must be applied to turn unstructured data into usable structured data. |
| Taxonomy | There is no need to create an overriding taxonomy. | |

      Machine learning

      Machine learning (ML), a subset of artificial intelligence, enables systems to learn from historical data to achieve a desired outcome. It powers targeted ads, personalized content, song recommendations, predictive maintenance activities, and virtual assistants.

      ML mimics human learning by absorbing information. Humans learn by reading, watching, listening, and doing. ML learns by processing historical data. For example, a human’s knowledge of elephants is based on historical experience, such as going to the zoo, riding an elephant, watching a documentary, and reading a book. ML gains knowledge of elephants by processing text and images.

      

The learning phase consists of these steps:

      1 Sample historical data (machine activity, customer attributes, and transactions).

      2 Apply algorithm to historical data to learn key patterns and trends.

      3 Generate a model or set of rules or instructions.

The prediction phase consists of these steps:

      1 Load the existing model.

      2 Apply the model to new data.

      3 Predict the likelihood of an outcome (for example, customer churn).

      The output of the prediction phase feeds back into the input of the learning phase to refine the model.
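
The loop between the learning phase and the prediction phase can be sketched as follows. This is a toy illustration under stated assumptions: the "model" is nothing more than an observed churn rate per customer segment, the history is invented, and the names `learn` and `predict` are hypothetical. Real ML systems use far richer algorithms, but the shape of the loop is the same.

```python
from collections import defaultdict


def learn(history):
    """Learning phase: derive a churn rate per customer segment from history."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, did_churn in history:
        totals[segment] += 1
        churned[segment] += did_churn  # True counts as 1, False as 0
    # The generated "model" is a lookup table of segment -> churn rate.
    return {segment: churned[segment] / totals[segment] for segment in totals}


def predict(model, segment, default=0.5):
    """Prediction phase: apply the model to a new customer's segment."""
    return model.get(segment, default)  # unseen segments fall back to the default


# Hypothetical historical data: (contract type, whether the customer churned).
history = [
    ("monthly", True), ("monthly", True), ("monthly", False), ("monthly", True),
    ("annual", False), ("annual", False), ("annual", True), ("annual", False),
]

model = learn(history)
print("monthly churn likelihood:", predict(model, "monthly"))
print("annual churn likelihood:", predict(model, "annual"))

# Feedback: an observed outcome joins the history, and the model is relearned.
history.append(("annual", True))
model = learn(history)
print("annual churn likelihood after feedback:", predict(model, "annual"))
```

Appending the newly observed outcome and calling `learn` again is the simplest possible form of feeding prediction-phase output back into the learning phase.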

      Learning

      For the purposes of ML, historical data is called training data. In the case of text mining, the system uses OCR and NLP to process text. For images, the system uses computer vision techniques for detection, recognition, and identification to process the image.

      The algorithm processes the data to detect key patterns and trends and correlate them to labels. For example, if you’re doing text mining, the algorithm might notice certain words being associated with elephants, such as large, gray, tusk, and trunk, and associate those with the label “elephant.” Later, in the prediction phase, when the algorithm sees a significant number of these terms, it calculates the probability that the passage is talking about an elephant.
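
One common way to turn word-label associations into a probability is a naive Bayes classifier. The text does not say which algorithm is used, so treat the following as one possible sketch rather than the method described here. The training passages are invented, and the function names `tokenize`, `train`, and `predict_proba` are assumptions for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_passages):
    """Learning phase: count how often each word appears under each label."""
    word_counts = {}        # label -> Counter of words
    doc_counts = Counter()  # label -> number of training passages
    for text, label in labeled_passages:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict_proba(model, text):
    """Prediction phase: return {label: probability} for a new passage."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # prior for the label
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                # Add-one smoothing avoids zero probabilities.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Convert log scores into probabilities that sum to 1.
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# Hypothetical training data, invented for illustration.
training_data = [
    ("The large gray animal raised its trunk", "elephant"),
    ("A tusk and a long trunk on a large gray body", "elephant"),
    ("The small brown mouse ate cheese", "other"),
    ("A striped cat chased the mouse", "other"),
]

model = train(training_data)
probs = predict_proba(model, "A large gray animal with a tusk and trunk")
print({label: round(p, 3) for label, p in probs.items()})
```

Because the words large, gray, tusk, and trunk appear only in the passages labeled "elephant," a new passage that contains several of them receives a high probability for that label.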

      In the learning phase, the system applies statistical
