Predicting Heart Failure. Various authors
1.6.2.1.3 Support Vector Machines
SVMs were first introduced by Vapnik [31]. The technique uses so-called support vectors, the training points closest to the decision boundary, to separate data points belonging to different classes. The method seeks the hyperplane that separates the classes with the largest possible margin (margin maximization). In its simplest, two-class form, the margin is bounded by the two hyperplanes $w^T x + b = +1$ and $w^T x + b = -1$, and the decision boundary $w^T x + b = 0$ lies midway between them. SVMs were originally developed for linear classification; kernel functions were later introduced to handle nonlinear problems. A kernel function implicitly maps the data into a higher-dimensional feature space in which the classes may become linearly separable. Common kernel types include linear, polynomial, radial basis function, and sigmoid kernels, and which kernel performs best depends on the nature of the data.
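The margin idea can be illustrated with a minimal sketch of a linear SVM. It assumes the soft-margin primal objective (lam / 2)·‖w‖² plus the mean hinge loss, minimized by plain sub-gradient descent rather than the quadratic-programming formulation usually used to train SVMs, and it uses no kernel. The names `train_linear_svm`, `lam`, `lr`, `epochs`, and the toy data are hypothetical choices for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the soft-margin primal objective
    (lam / 2) * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)),
    with labels y_i in {-1, +1}. A sketch, not a production solver."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Gradient of the regularizer (lam / 2) * ||w||^2 is lam * w.
        grad_w = [lam * wj for wj in w]
        # Points with y_i * (w . x_i + b) < 1 lie inside the margin or are
        # misclassified, so their hinge term is active.
        active = [(xi, yi) for xi, yi in zip(X, y) if yi * (dot(w, xi) + b) < 1]
        for xi, yi in active:
            for j in range(d):
                grad_w[j] -= yi * xi[j] / n
        grad_b = -sum(yi for _, yi in active) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    # The decision boundary is the hyperplane w . x + b = 0.
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

Replacing the inner product with a kernel function and solving the dual problem is what lets an SVM draw nonlinear boundaries; that step is deliberately left out of this sketch.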
1.6.2.1.4 K-Nearest Neighbor
The k-nearest neighbor (k-NN) algorithm is a distance-based classifier: to classify a data object whose class is unknown, it looks at the object's nearest neighbors in the training data, and the class is decided by a majority vote among them. The algorithm's two main parameters are the number of neighbors k and the distance function. There is no exact method for choosing k, so a suitable value is usually found by trial, for example with cross-validation. Cosine similarity and the Manhattan, Euclidean, or Chebyshev distance are commonly used as the distance function. One problem with k-NN is scale: because the method operates in a geometric space, features with large numeric ranges dominate the distance. This problem is addressed through feature engineering, such as rescaling the features to a common range.
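A minimal k-NN sketch with majority voting, three of the distance functions named above (cosine similarity is omitted), and min-max scaling as one possible fix for the scale problem. The function names and the two-feature toy data are hypothetical; the data are constructed so that the second feature is uninformative but numerically large.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def chebyshev(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    # Sort training points by distance to the query and keep the k nearest.
    nearest = sorted(zip(train_X, train_y), key=lambda p: distance(p[0], query))[:k]
    # Majority vote among the k neighbors' labels.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def min_max_fit(rows):
    # Per-feature minimum and maximum, learned from the training data only.
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_apply(row, lo, hi):
    # Rescale each feature to [0, 1]; constant features map to 0.
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]


# Hypothetical toy data: feature 1 separates the classes (range 0-1),
# feature 2 carries no class information but spans 100-900.
train_X = [(0.1, 500), (0.2, 100), (0.15, 900), (0.9, 520), (0.8, 120), (0.85, 880)]
train_y = ["A", "A", "A", "B", "B", "B"]
query = (0.12, 510)

raw_label = knn_predict(train_X, train_y, query)
lo, hi = min_max_fit(train_X)
scaled_X = [min_max_apply(r, lo, hi) for r in train_X]
scaled_label = knn_predict(scaled_X, train_y, min_max_apply(query, lo, hi))
print(raw_label, scaled_label)
```

With k = 3, the unscaled vote is dominated by the large-valued second feature and returns "B"; after min-max scaling, the informative first feature decides the vote and the query is labeled "A".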
1.6.2.1.5 Neural Nets
An artificial neural network (ANN) is a machine learning method inspired by the way the human brain learns. ANNs are frequently used in classification problems and are also used in clustering and optimization. Although the simplest neural network model is the perceptron, the multilayer perceptron (MLP) is more often used for classification. Deep learning methods, which have recently been applied to many important tasks, are based on ANNs. Their adaptability and parallel processing capability make ANNs a powerful option for many problems.
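The perceptron can be sketched with the classic perceptron learning rule, assuming a step activation and a learning rate of 1; the function names and the logical-AND training set are illustrative choices. A single perceptron can only learn linearly separable problems (it cannot learn XOR, for example), which is one reason multilayer perceptrons are preferred for classification.

```python
def step(z):
    return 1 if z > 0 else 0


def train_perceptron(X, y, lr=1.0, max_epochs=100):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(X, y):
            output = step(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            delta = target - output  # 0 if correct, +1 or -1 if wrong
            if delta != 0:
                errors += 1
                # Move the weights toward the misclassified example.
                w = [wj + lr * delta * xj for wj, xj in zip(w, xi)]
                b += lr * delta
        if errors == 0:  # a full pass without mistakes: training has converged
            break
    return w, b


def perceptron_predict(w, b, x):
    return step(sum(wj * xj for wj, xj in zip(w, x)) + b)


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [0, 0, 0, 1]  # logical AND is linearly separable
w, b = train_perceptron(X, y_and)
print([perceptron_predict(w, b, x) for x in X])
```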
1.6.2.2 Unsupervised Learning
Unsupervised learning works with unlabeled data; its purpose is to form groups based on the characteristics of the data themselves. After the data are divided into groups according to their similarity or distance, the groups can be labeled with the help of an expert. Two prominent applications of unsupervised learning are clustering and association rule mining. Clustering assigns data points to groups called clusters, and clustering methods fall into two types: partitional and hierarchical. In partitional clustering, each data point belongs to exactly one cluster. In hierarchical clustering, clusters are nested, so a point belongs to a chain of increasingly larger clusters at successive levels of the hierarchy. Association rule mining searches for rules that describe relationships between events and is used to uncover relationships between attributes.
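The nested structure of hierarchical clustering can be illustrated with a small agglomerative sketch. It assumes single linkage (the distance between two clusters is the smallest Euclidean distance between their members) as the merge criterion; other linkage criteria exist, and the function name and points are hypothetical.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two closest clusters. Returns the merge history
    as (merged_cluster, linkage_distance) pairs. Naive, for small inputs only."""

    def linkage(a, b):
        # Single linkage: smallest distance between any two members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merged = a | b
        merges.append((merged, linkage(a, b)))
        clusters = [c for c in clusters if c != a and c != b] + [merged]
    return merges


# Hypothetical 2-D points, indexed 0..4.
points = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]
for cluster, distance in single_linkage(points):
    print(sorted(cluster), round(distance, 2))
```

Each merge keeps the earlier clusters intact inside the new one. The large jump between the third merge (about 6.40) and the last (about 19.85) suggests cutting the hierarchy into two clusters, {0, 1, 2, 3} and {4}.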
1.6.2.2.1 K-Means
The K-means partitional clustering algorithm was first developed by MacQueen in 1967 [32]. Its purpose is to divide the data into K clusters, each represented by its center of gravity, called the centroid. The value of K is chosen by the user. The clusters are found iteratively: each data point is assigned to the nearest centroid according to a distance function, each centroid is then recomputed as the mean of the points assigned to it, and the two steps are repeated until the assignments no longer change.
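The assign-and-update iteration can be sketched as follows, assuming Euclidean distance and, as a simplification, the first K points as the initial centroids (practical implementations usually use random or k-means++ initialization). Function and variable names are hypothetical.

```python
import math


def k_means(points, k, max_iter=100):
    # Simplification: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

Here both initial centroids come from the same group, yet the iteration still separates the two groups; with other data, a poor start can leave K-means in a worse local optimum, which is why initialization matters in practice.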
1.6.2.2.2 Apriori Algorithm
The Apriori algorithm is one of the most prominent algorithms for association rule mining. It finds frequently occurring patterns (frequent itemsets) in a transaction database and generates rules from them [33]. The resulting rules can be used to predict that one event will occur after another has occurred. The algorithm is widely preferred because it makes such relationships between events explicit.
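A compact sketch of Apriori's frequent-itemset search, assuming support is measured as the fraction of transactions that contain an itemset. Candidates of each size are built by joining frequent itemsets one item smaller and pruned with the Apriori property (every subset of a frequent itemset is itself frequent); rule confidence is then computed from the supports. The names `apriori`, `confidence`, `min_support`, and the event labels are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support (fraction of
    transactions that contain it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, size = {}, 1
    while candidates:
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Join step: combine frequent itemsets of the previous size.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step (Apriori property): all subsets one item smaller must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, size - 1))}
    return frequent


def confidence(frequent, antecedent, consequent):
    # conf(A -> C) = support(A and C together) / support(A);
    # assumes the combined itemset was found frequent.
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical event log: each transaction is the set of events observed together.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]
freq = apriori(transactions, min_support=0.4)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
print(confidence(freq, {"a"}, {"b"}))
```

Here the rule {a} → {b} has confidence 0.4 / 0.8 = 0.5: event b occurred in half of the transactions that contained event a.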
1.6.3 Machine Learning Supported HF Studies
Machine learning, the most common application of artificial intelligence, reveals patterns in data by continuously improving its ability to learn from data, and it supports the prediction and diagnosis of cardiovascular disease [34]. A machine learning-based HF diagnosis system can be viewed as three modules: input, processing, and output. The input module contains the data used by the decision support system, such as physical examination data, laboratory results, clinical data, ECG monitoring data, and electrocardiography data. The processing module contains the machine learning algorithms, mainly supervised and unsupervised learning algorithms. Machine learning algorithms currently used in diagnosing HF include nearest neighbor, self-organizing maps, multilayer perceptron, classification and regression trees, random forests, SVMs, neural networks, logistic regression, decision trees, clustering, and fuzzy-genetic and neuro-fuzzy expert systems. The output module attempts to determine information such as the presence of HF, the risk of HF events, the evaluation of left ventricular deterioration, the response to advanced therapies, and the risk of death.
A review of the literature on machine learning methods for diagnosing HF (Table 1.2) shows that many studies rely on heart rate variability (HRV). In one case study, Yang et al. [35] used a scoring method to diagnose HF based on two SVM models. The first model checked whether the person had HF; if the result was normal, the second model classified the person as healthy or prone to HF. The scores were matched with the SVM model outputs, and the diagnosis was derived from the score ranges.
Son et al. [36] aimed to distinguish CHF from other causes of shortness of breath. The study started with 72 features, and rough sets and logistic regression were used to reduce the number of variables. Classification accuracy was 97.5% with the features selected by rough sets and 88.7% with the features selected by logistic regression.
In 2016, Masetic et al. [37] applied the random forest algorithm to ECG time series to detect CHF. Features were extracted from the ECG signals with Burg's autoregressive (AR) method. Besides random forest, the study evaluated C4.5, SVM, ANN, and k-NN classifiers; random forest gave the best performance.
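Burg's method estimates autoregressive coefficients by choosing, at each model order, the reflection coefficient that minimizes the combined energy of the forward and backward prediction errors; the coefficients computed for an ECG segment can then serve as a feature vector for a classifier such as random forest. The sketch below assumes the convention e[n] = Σ a[i]·x[n - i] with a[0] = 1. The model order, segment length, and preprocessing used in [37] are not stated here, so the function name, order, and synthetic test signal are illustrative only.

```python
import math


def burg_ar(x, order):
    """Estimate AR coefficients a[0..order] (a[0] = 1) with Burg's method,
    under the convention sum_i a[i] * x[n - i] = prediction error e[n]."""
    n_samples = len(x)
    f = [float(v) for v in x]  # forward prediction errors
    b = [float(v) for v in x]  # backward prediction errors
    a = [1.0]
    for m in range(order):
        # Reflection coefficient minimizing forward + backward error energy.
        num = sum(f[n] * b[n - 1] for n in range(m + 1, n_samples))
        den = sum(f[n] ** 2 + b[n - 1] ** 2 for n in range(m + 1, n_samples))
        k = -2.0 * num / den
        # Levinson-style update of the coefficient vector.
        padded = a + [0.0]
        a = [padded[i] + k * padded[m + 1 - i] for i in range(m + 2)]
        # Update errors from high to low index so the old b[n - 1] is still available.
        for n in range(n_samples - 1, m, -1):
            f_old = f[n]
            f[n] = f_old + k * b[n - 1]
            b[n] = b[n - 1] + k * f_old
    return a


# A pure sinusoid obeys x[n] = 2*cos(w)*x[n-1] - x[n-2], i.e. an AR(2) model
# with coefficients [1, -2*cos(w), 1] under the convention above.
omega = 0.5
signal = [math.cos(omega * n) for n in range(1000)]
coeffs = burg_ar(signal, order=2)
print(coeffs)
```

For this synthetic signal the estimate comes out close to [1, -2·cos(0.5), 1], which is a convenient sanity check before applying the function to real ECG segments.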
Wu et al. [38] studied detecting HF before clinical diagnosis. Data such as electronic health records, health behavior, demographics, clinical diagnoses, and clinical precautions were used to detect the disease early. SVM, boosting, and logistic regression were used for early detection, and the study also examined how feature selection contributed to performance.
Aljaaf et al. [39] proposed a multilevel assessment of the risk of developing HF. A C4.5 classifier assigned each case to one of five risk levels (1: no risk; 2: low risk; 3: moderate risk; 4: high risk; 5: extremely high risk). The study used the Cleveland heart disease data set, and the C4.5 classifier was evaluated with 10-fold cross-validation.
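The 10-fold cross-validation procedure can be sketched as follows: the data are shuffled once, split into 10 folds of nearly equal size, and each fold serves exactly once as the test set while the other nine are used for training; the reported score is the mean accuracy over folds. The `fit`/`predict` callables, the seed, and the majority-class stand-in model are hypothetical placeholders, not the C4.5 setup of [39].

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.
    Assumes n_samples >= k so that no fold is empty."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Fold sizes differ by at most one sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Mean test accuracy over k folds; fit and predict are user-supplied callables."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X, y):
    # Stand-in model: always predict the most frequent training label.
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = [[v] for v in range(20)]
y = [1] * 20
accuracy = cross_validate(X, y, fit_majority, predict_majority, k=10)
print(accuracy)
```

Any classifier, including a decision tree learner, can be plugged in through `fit` and `predict` without changing the fold logic.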
Zheng et al. [40] proposed a computer-aided diagnostic system for HF based on a least-squares SVM (LS-SVM) classifier. The LS-SVM classifier gave better results than neural networks and hidden Markov models.
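A least-squares SVM replaces the inequality constraints of the standard SVM with equality constraints and a squared error term, so training reduces to solving one linear system instead of a quadratic program. The sketch below follows the common LS-SVM classifier formulation, [[0, yᵀ], [y, Ω + I/γ]]·[b; α] = [0; 1] with Ω_kl = y_k·y_l·K(x_k, x_l), assuming an RBF kernel. The kernel width, γ, helper names, and data are hypothetical, and nothing here reproduces the system of [40].

```python
import math


def rbf(x, z, gamma_k=0.5):
    # Radial basis function kernel exp(-gamma_k * ||x - z||^2).
    return math.exp(-gamma_k * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=10.0, kernel=rbf):
    # Build [[0, y^T], [y, Omega + I / gamma]] [b; alpha] = [0; 1].
    n = len(X)
    A = [[0.0] + [float(yk) for yk in y]]
    for k in range(n):
        row = [float(y[k])]
        for l in range(n):
            omega = y[k] * y[l] * kernel(X[k], X[l])
            row.append(omega + (1.0 / gamma if k == l else 0.0))
        A.append(row)
    solution = solve(A, [0.0] + [1.0] * n)
    return solution[0], solution[1:]  # bias b, multipliers alpha


def lssvm_predict(X, y, b, alpha, x, kernel=rbf):
    # sign(sum_k alpha_k * y_k * K(x, x_k) + b)
    score = sum(a * yk * kernel(x, xk) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1


X = [(0, 0), (0, 1), (5, 5), (5, 6)]
y = [-1, -1, 1, 1]
b, alpha = lssvm_fit(X, y)
print([lssvm_predict(X, y, b, alpha, x) for x in X])
```

Because every training point gets a nonzero multiplier α_k, LS-SVM solutions are generally not sparse, which is the usual trade-off for the simpler training step.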
Pattekari et al. [41] designed a Naive Bayes-based smart system and developed a decision support system for HF prediction. With the web-based