Machine Learning Approach for Cloud Data Analytics in IoT. Group of authors
Figure 3.1 Classification of big data analytics.
Section 3.1 of the chapter briefly introduces the concept of predictive data analytics and its requirements in the retail industry, and mentions various approaches to predictive data analytics. Background and related work are elaborated in Section 3.2. Predictive data analytics in the retail industry is discussed in Section 3.3, which also presents various models for predictive data analytics using ML, together with the associated challenges and use cases. The authors propose a framework for predictive data analytics in Section 3.4. Finally, the conclusion and future directions for research are presented in Section 3.5.
3.2 Related Work
This section presents the background and related work on ML in the context of the retail industry. ML has been employed in the retail industry since its inception [9]. However, advances in ML have further boosted its adoption in this domain during the past decade. ML approaches are mainly employed to predict sales, revenue, and stock requirements in the retail industry. The authors in [4] established that a predictive model is generally suitable for estimating and predicting future observations and for assessing their predictability levels.
The authors in [10] examined the possibilities of integrating big data in the retail industry. They discussed the use of the Bayesian approach and predictive analytics in the retail context and also raised the concern of data privacy. This research was taken one step further by the authors in [11], who used different ML techniques to predict sales. In [11], it is observed that ordinary regression techniques integrated with boosting techniques give better results than regression algorithms alone. Using the same principle, the authors in [12] also used ML approaches to predict future sales from historical sales data. They discussed various approaches to sales prediction and concluded that the gradient boosting algorithm is the best-fit model for this scenario, as it achieved the highest accuracy and efficiency. The authors in [13] implemented a stacking approach for a regression ensemble to further improve sales prediction. The authors in [14] proposed a model using ML techniques to optimize pricing on a daily basis. All these predictive models can be employed to predict future demand and sales of products. The authors in [15] presented a regression model using regression trees for each department to predict future demand. The proposed model is validated in terms of its efficiency against least squares regression, principal components regression, and other similar regressions. Similarly, the authors in [15] used historical data and Rue La La’s expertise to build size curves for each product p, which represent the percentage of product demand for each size of p. Here, the authors also addressed a price optimization problem with the objective of setting a price for each product so as to maximize revenue.
The authors in [16] proposed a framework to perform requirement analysis in the retail industry. The proposed framework consists of three modeling views: the business view, the analytics design view, and the data preparation view. These views collectively perform data preparation activities. The authors in [17] employed descriptive analytics in relation to data mining for decision-making. Here, it is worth mentioning that predictive data analytics employs deterministic optimization techniques such as the decision tree method.
3.3 Predictive Data Analytics in Retail
Every retailer aims to devise attractive and efficient business strategies to attract the largest share of customers. For the past few years, retail industries have been using historical data to frame business strategies [18]. Focusing on historical transaction data alone fails to give promising results in this rapidly evolving and competitive business world with its huge ocean of data [19]. This shortcoming is addressed by predictive data analytics, an efficient approach that uses big data to predict the activity, behavior, and future trends of any enterprise. Further, predictive data analytics is required owing to the exponential rise in data and cut-throat competition. It also helps to obtain a thorough understanding of customers, budget, and stock. As a result, predictive data analytics has gained wide acceptance and attracted several researchers and academicians. Predictive data analytics fails to achieve the desired results with simple regression-type methods, as they are not suitable in this multidimensional environment. Hence, it employs ML to enhance its capability [20]. The following are the most prevalent models for predictive data analytics [14]:
Classification Model
Clustering Model
Outliers Model
Time Series Model
Readers may refer to [14] for an explanation of these models. All these models use common predictive algorithms, which can be broadly categorized into two groups, viz., ML and deep learning. ML primarily works on tabular data, which may be linear or nonlinear. Deep learning is itself a subset of ML, but it is better suited to audio, text, and images. ML-based predictive modeling uses various algorithms. Some common algorithms are discussed briefly below [21].
Random Forest: It is one of the most popular classification and regression algorithms of ML and is capable of handling huge volumes of data. Random forest implements bagging, where each decision tree is trained on a random subset of the training data. Because the subsets are independent, the trees can be trained in parallel, and their predictions are combined to obtain a strong learner.
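The bagging idea can be illustrated with a minimal sketch. It is not a full random forest: it assumes a single numeric feature, uses one-split regression trees (stumps) as the weak learners, and omits the random feature selection that a real random forest adds. The function names and the discount/sales figures are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature.

    Returns (threshold, left_mean, right_mean), chosen to minimize
    the sum of squared errors on the given sample.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must leave data on both sides
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x values identical: predict a constant
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Aggregate the ensemble by averaging the individual predictions."""
    return mean(predict_stump(s, x) for s in stumps)

# Hypothetical data: discount in percent vs. units sold per week.
discounts = [0, 5, 10, 15, 20, 25, 30, 35]
units_sold = [100, 102, 98, 101, 150, 155, 149, 152]

stumps = fit_bagged_stumps(discounts, units_sold)
low_forecast = predict_bagged(stumps, 0)
high_forecast = predict_bagged(stumps, 35)
```

Each stump sees a different bootstrap sample, so the loop in `fit_bagged_stumps` has no dependency between iterations and could be run in parallel.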
Generalized Linear Model (GLM): This model narrows down the list of variables and thus performs better than the general linear model. Because the variables are narrowed down, it trains quickly. Its limitation is that it requires relatively large training data sets.
Gradient Boosted Model (GBM): It generates a model that uses an ensemble of decision trees for classification. In this approach, each tree corrects the errors of the previously trained trees. As it builds one tree at a time, it takes longer to train but gives better generalization. Hence, it is used in ML-based ranking at Yahoo, among others.
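The sequential error-correcting behavior of boosting can be sketched as follows. This is a simplified illustration, not the method of any cited work: it assumes a regression setting with squared-error loss (rather than classification), a single numeric feature, and one-split trees as weak learners. The learning rate, the number of rounds, and the weekly sales figures are hypothetical.

```python
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must leave data on both sides
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x values identical: predict a constant
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.3):
    """Build trees one at a time, each fitted to the current residuals."""
    base = mean(ys)                 # start from a constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors made so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict_gbm(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(ys, preds):
    return mean((y - p) ** 2 for y, p in zip(ys, preds))

# Hypothetical data: week number vs. weekly sales.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [20, 22, 25, 30, 38, 45, 60, 80]

model = fit_gbm(weeks, sales)
mse_baseline = mse(sales, [mean(sales)] * len(sales))
mse_boosted = mse(sales, [predict_gbm(model, w) for w in weeks])
```

Unlike bagging, each round here depends on the predictions of all earlier rounds, which is why the trees cannot be trained in parallel and training takes longer.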
K-Means: It is a popular and fast algorithm that groups data points into clusters so that all points in the same group are highly similar. The aim of this grouping is to maximize intragroup similarity and minimize intergroup similarity.
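A minimal sketch of the K-Means procedure follows, assuming squared Euclidean distance as the (dis)similarity measure and random selection of the initial centroids from the data. The customer figures and function names are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(points, k, n_iters=100, seed=0):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # k distinct data points as a start
    labels = None
    for _ in range(n_iters):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                 # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (monthly spend in hundreds, store visits per month).
customers = [(1, 1), (1.5, 2), (1, 2), (8, 8), (9, 9), (8, 9.5)]

labels, centroids = k_means(customers, k=2)
```

On this toy data the first three customers end up in one group and the last three in the other. The numeric label given to each group depends on the random initialization, so only group membership should be interpreted.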
Owing to the abovementioned algorithms, ML has been widely accepted and recognized as an efficient choice for handling huge volumes of data in the retail industry. It enables sophisticated algorithms for understanding customers and thus provides a customer-oriented shopping experience. The subsequent subsection discusses the employment of ML for predictive data analytics in the retail industry.
3.3.1 ML for Predictive Data Analytics
As mentioned earlier, ML has been accepted as an efficient and effective choice for predictive