Gharehchopogh et al. [26] presented a decision support model to assist physicians. The model, based on an artificial neural network (ANN), detected the presence or absence of HF with 85% accuracy. The study falls within the scope of medical data mining.
Candelieri et al. [27] developed a decision tree to determine patient stabilization, achieving 88% accuracy. Pecchia et al. [28] used decision tree techniques to classify patients into three severity groups on the basis of heart rate variability (HRV) measurements (HF vs. healthy: 96% accuracy; severe vs. moderate: 79.3% accuracy).
1.6 Machine Learning Supported Diagnosis
Invasive and non-invasive methods offer a wealth of diagnostic information, yet interpreting that information is only possible with the help of a physician. The growing number of heart patients, and the parallel growth in patient data, make it increasingly difficult to evaluate the data and extract useful information from them. The overlap between the symptoms of heart disease and those of other diseases also makes diagnosis a difficult problem. For this reason, the data obtained with invasive and non-invasive techniques need to be evaluated with intelligent analysis tools in order to increase diagnostic accuracy. Artificial intelligence and machine learning can assist physicians in this intelligent analysis. Machine learning models trained on past patient data can be used to diagnose future cases, and a trained model can sometimes diagnose more precisely and more sensitively than a physician. Supporting decision support systems built on expert-system logic with machine learning models will enable them to give better results.
1.6.1 Introduction to Machine Learning
Disease prediction and diagnosis can be performed with the help of machine learning models. Diagnostic applications have been developed and used extensively, especially with supervised machine learning techniques, which build models from historical data that can then support diagnosis and treatment. Developing a machine learning system is not just a matter of implementing learning algorithms; rather, it means working on the data step by step from start to finish, in a way similar to the data mining process. For example, determining which variables are important for solving a problem, and which are not, directly affects the quality of the solution. This process, called feature selection, determines which parameters will be used in the system to be built. Feature selection is often achieved by identifying the relationships between the target variable and the candidate predictor variables.
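To make the idea concrete, the short sketch below scores a few candidate predictors against a target and keeps only the strongest ones. It is a minimal illustration using scikit-learn's SelectKBest; the column names and the toy label are invented for the example and are not taken from an actual heart failure data set:

# Minimal feature-selection sketch: score each predictor against the target
# with a univariate test and keep the most informative ones.
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.normal(60, 10, 200),                 # hypothetical clinical variables
    "ejection_fraction": rng.normal(40, 8, 200),
    "serum_sodium": rng.normal(137, 4, 200),
    "random_noise": rng.normal(0, 1, 200),          # an uninformative column
})
y = (X["ejection_fraction"] < 38).astype(int)       # toy HF label, for illustration only

selector = SelectKBest(score_func=f_classif, k=2)   # keep the 2 highest-scoring predictors
selector.fit(X, y)
print(dict(zip(X.columns, selector.scores_.round(1))))
print("selected:", list(X.columns[selector.get_support()]))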
The feature selection phase is followed by feature transformation. Data transformation, a method that improves data quality, has recently broadened into feature engineering, which covers the work done on features to increase prediction success. Like feature selection, feature engineering affects the success of the result, and both help solve the problem of high dimensionality in the data. Missing data, and the methods used to deal with it, are also important: missing values are sometimes handled by estimating them and sometimes by replacing them with other values.
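A hedged sketch of these two steps, again on invented values: missing entries are filled with the column median, and the features are then rescaled, a simple example of a transformation that improves data quality:

# Toy matrix: rows = patients, columns = [age, ejection_fraction, serum_sodium];
# np.nan marks values that were lost during data collection.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

X = np.array([
    [63.0, 38.0, 136.0],
    [70.0, np.nan, 134.0],
    [np.nan, 45.0, 139.0],
    [58.0, 30.0, np.nan],
])

preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),   # replace each missing value with the column median
    ("scale", StandardScaler()),                    # simple feature transformation: zero mean, unit variance
])
print(preprocess.fit_transform(X).round(2))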
It is also important to choose the learning methods to be used when creating machine learning systems. There are four basic learning paradigms under the heading of machine learning: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Of the latter two, semi-supervised learning combines supervised and unsupervised techniques, while reinforcement learning is an agent-based technique in which decisions are driven by a reward mechanism; it is used to find the best possible solution to a problem about which we have no prior knowledge.
The most prominent techniques – supervised and unsupervised learning – will be explained in the following paragraphs.
1.6.2 Machine Learning Algorithms
Many machine learning algorithms have been developed to date. To understand them, the two main learning methods, supervised learning and unsupervised learning (see Figure 1.2), must first be understood.
Figure 1.2 Supervised and unsupervised machine learning.
Supervised learning algorithms are often used in machine learning and perform two basic tasks: classification and regression. The output of a classification task is a nominal value, while the output of a regression task is usually a continuous value. Although supervised learning algorithms are most often used to diagnose diseases, the field of machine learning is, as we have seen, wider and includes the additional methods mentioned above, which are used where supervised learning is insufficient.
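The contrast between the two task types can be illustrated with a small, purely hypothetical example: the same input records are paired once with a nominal target and once with a continuous one, and the two models return a class label and a number, respectively:

# Classification returns a nominal label; regression returns a continuous value.
# The records and variable names below are invented for illustration.
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[63, 1], [70, 0], [45, 0], [58, 1], [52, 0], [75, 1]]   # e.g. [age, hypertension flag]
y_class = ["HF", "HF", "healthy", "HF", "healthy", "HF"]     # nominal target -> classification
y_reg = [35.0, 30.0, 60.0, 38.0, 55.0, 28.0]                 # continuous target -> regression

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)

print(clf.predict([[66, 1]]))   # a nominal class label, e.g. ['HF']
print(reg.predict([[66, 1]]))   # a continuous value (mean of the nearest neighbours' targets)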
1.6.2.1 Supervised Learning
Supervised learning consists of two basic steps: building a model from labeled data and testing it on unlabeled data. Classification, one of the two prominent techniques in this category, is a supervised learning technique in which the target variable is categorical, while in regression, the other prominent technique, the target variable is numerical. In both, operations are performed on a model that estimates the target variable from the predictor variables. The purpose of classification is to assign records seen for the first time to one of the predefined categories. The categories are identified and modeled with the help of training data: training data and a machine learning algorithm come together to form a machine learning model, and the model then matches each record to the class that suits it best. The most important feature that distinguishes supervised from unsupervised learning is the label information in the data; it is the class label that provides the supervision. Although the outputs of classification and regression differ, the goal of both is to estimate the value of the output variable from the input variables.
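As a minimal sketch of this workflow (on synthetic data, not a real patient data set), a model is first fitted to labeled training records and then asked to assign classes to records it has not seen, after which its accuracy is measured against the withheld labels:

# Build a model from labeled training data, then test it on unseen records.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # learn from labeled data
y_pred = model.predict(X_test)                                    # assign labels to unseen records
print("accuracy:", round(accuracy_score(y_test, y_pred), 3))      # compare against the withheld labels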
1.6.2.1.1 Decision Trees
One of the prominent algorithms for the classification task is the decision tree. The model, represented with a tree data structure, is learned directly from the data: through the tree induction process, the characteristics of the training data are encoded in the tree. The most commonly used and most familiar decision tree algorithm is C4.5 [29]. Since C4.5 can work with both numerical and categorical input variables, it can be applied to many data sets.
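The sketch below illustrates tree induction on synthetic data. Note that scikit-learn's DecisionTreeClassifier implements a CART-style tree rather than C4.5, so it is used here only to show the general idea of a tree learned directly from training data:

# Induce a small decision tree from synthetic training data and print its rules.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, random_state=1)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

# Each printed split is a rule learned from the characteristics of the training data.
print(export_text(tree, feature_names=[f"feature_{i}" for i in range(4)]))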
1.6.2.1.2 Naive Bayes
The Naive Bayes classifier is a probability-based classifier. It tries to find the posterior probabilities P(Cj|A) for the test data with the help of the conditional probabilities P(A|Cj) learned from the training data. The algorithm is based on Bayes' theorem [30]. According to Bayes' theorem, events are interrelated, and there is a relationship between the probabilities P(A|C) and P(C|A), P(A), and P(C). Therefore, while calculating the value of P(A|C) with the help of Bayes' theorem,
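The relationship between these probabilities can be illustrated with a small numeric sketch; the prior and likelihood values below are invented purely for the example and are not taken from the chapter:

# Posterior P(C|A) from likelihood P(A|C), prior P(C), and evidence P(A).
p_c = {"HF": 0.3, "healthy": 0.7}              # prior class probabilities P(C), assumed values
p_a_given_c = {"HF": 0.8, "healthy": 0.2}      # likelihood of the observed symptoms A given each class

p_a = sum(p_a_given_c[c] * p_c[c] for c in p_c)              # evidence P(A)
posterior = {c: p_a_given_c[c] * p_c[c] / p_a for c in p_c}  # P(C|A) via Bayes' theorem
print(posterior)   # HF is roughly 0.63, healthy roughly 0.37 for these assumed numbers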