Data Science For Dummies. Lillian Pierson
Чтение книги онлайн.
Читать онлайн книгу Data Science For Dummies - Lillian Pierson страница 20
![Data Science For Dummies - Lillian Pierson Data Science For Dummies - Lillian Pierson](/cover_pre996274.jpg)
If you’ve been watching any news for the past decade, you’ve no doubt heard of a concept called machine learning — often referenced when reporters are covering stories on the newest amazing invention from artificial intelligence. In this chapter, you dip your toes into the area called machine learning, and in Part 3 you see how machine learning and data science are used to increase business profits.
Defining Machine Learning and Its Processes
Machine learning is the practice of applying algorithmic models to data over and over again so that your computer discovers hidden patterns or trends that you can use to make predictions. It’s also called algorithmic learning. Machine learning has a vast and ever-expanding assortment of use cases, including
Real-time Internet advertising
Internet marketing personalization
Internet search
Spam filtering
Recommendation engines
Natural language processing and sentiment analysis
Automatic facial recognition
Customer churn prediction
Credit score modeling
Survival analysis for mechanical equipment
Walking through the steps of the machine learning process
Three main steps are involved in machine learning: setup, learning, and application. Setup involves acquiring data, preprocessing it, selecting the most appropriate variables for the task at hand (called feature selection), and breaking the data into training and test datasets. You use the training data to train the model, and the test data to test the accuracy of the model’s predictions. The learning step involves model experimentation, training, building, and testing. The application step involves model deployment and prediction.
FIGURE 3-1: A example of a simple random sample
Becoming familiar with machine learning terms
Before diving too deeply into a discussion of machine learning methods, you need to know about the (sometimes confusing) vocabulary associated with the field. Because machine learning is an offshoot of both traditional statistics and computer science, it has adopted terms from both fields and added a few of its own. Here is what you need to know:
Instance: The same as a row (in a data table), an observation (in statistics), and a data point. Machine learning practitioners are also known to call an instance a case.
Feature: The same as a column or field (in a data table) and a variable (in statistics). In regression methods, a feature is also called an independent variable (IV).
Target variable: The same as a predictant or dependent variable (DV) in statistics.
Considering Learning Styles
Machine learning can be applied in three main styles: supervised, unsupervised, and semisupervised. Supervised and unsupervised methods are behind most modern machine learning applications, and semisupervised learning is an up-and-coming star.
Learning with supervised algorithms
Supervised learning algorithms require that input data has labeled features. These algorithms learn from known features of that data to produce an output model that successfully predicts labels for new incoming, unlabeled data points. You use supervised learning when you have a labeled dataset composed of historical values that are good predictors of future events. Use cases include survival analysis and fraud detection, among others. Logistic regression is a type of supervised learning algorithm, and you can read more on that topic in the next section.
Learning with unsupervised algorithms
Unsupervised learning algorithms accept unlabeled data and attempt to group observations into categories based on underlying similarities in input features, as shown in Figure 3-2. Principal component analysis, k-means clustering, and singular value