Читать онлайн книгу - Data Analytics in Bioinformatics. Группа авторов. Программы. LiveLib

Новинки Лучшее Рекомендации

Информация о книге:

Название:

Автор:

Жанр:

Серия:

Издательство:

Data Analytics in Bioinformatics - Группа авторов

Скачать книгу

in Refs. [51–53]. In the process of regression analysis on the heart disease dataset, the following numerical interpretation is obtained and presented in Table 1.1.

Where,

Multiple R (Co-relation Coefficient): It depicts the strength of a linear relationship between two variables i.e. age and cholesterol of a human. This value always lies between −1 and +1. The obtained value i.e. 0.972834634 indicated that there is a good relationship between age and cholesterol level.

R2: It is the coefficient of determination i.e. the goodness of fit. The obtained value is 0.946407225 which indicates that 95% of the values of the heart disease dataset fit the regression model.

Adjusted R2: This variable is an upgraded version of R2. This value tries to adjust the predictor number in the model. This value increases when any new term improves the performance of the model more than the expectation and viceversa. The obtained value i.e. 0.945430663 indicates that the model is not performing well so there is a need for modification in predictor number.

Standard Error: It measures the precision of the regression model, the smaller the number, the more accurate the results are. The value obtained is 12.7814549 which indicates that the results are near to accurate value. The Standard Error depicts the measure of how well the data has been approximated.

Table 1.1 Regression statistics.

Regression Statistics
Multiple R	0.972834634
R Square	0.946407225
Adjusted R Square	0.945430663
Standard Error	12.7814549
Observations	1,025

1.4.1 Logistic Regression

Logistic Regression is a statistical model used for identifying the probability of a class with the help of binary dependent variables i.e. Yes or No. It indicates whether a class belongs to the Yes category or the No category. For example, after executing an event on an object the results maybe Win or Loss, Pass or Fail, Accept or Not-Accept, etc. The mathematical representation of the Logistic Regression model is done by two indicator variables i.e. 0 and 1. It is different from the Linear Regression technique as depicted in Ref. [54]. As logistic regression has its importance in the real-life classification problems as depicted in Refs. [55, 56], different fields like Medical Sciences, Social Sciences, ML are using this model in their various field of operations.

The Logistic Regression is performed on the heart disease dataset [41]. The Receiver Operating Characteristics (ROC) is calculated that is based on the true positive rate that is plotted on the y-axis and the false positive rate that is plotted on the x-axis. After performing the logistic regression in python (Google Colab), the outcome is represented in Figure 1.11 and Table 1.2. Figure 1.11 represents the ROC curve and Table 1.2 represents the Area under the ROC Curve (AUC).

At the time of processing, the AUC value obtained (Table 1.2) on training data is 0.8374022, but when the data is processed for testing then the obtained result is outstanding (i.e. 0.9409523). This indicates that the model is more than 90% efficient for classification. In the next section, the difference between Linear and Logistic Regression is discussed.

Figure 1.11 ROC curve for logistic regression.

Table 1.2 AUC: Logistic regression.

Parameter	Data	Value	Result
The area under	Training Data	0.8374022	Excellent
the ROC Curve (AUC)	Test Data	0.9409523	Outstanding
	Index: 0.5: No Discriminant, 0.6–0.8: Can be considered accepted, 0.8–0.9: Excellent, >0.9: Outstanding

1.4.2 Difference between Linear & Logistic Regression

Linear and Logistics regression are two common types of regression used for prediction. The result of the prediction is represented with the help of numeric variables. The difference between linear and logistic regression is depicted in Table 1.3 for easy understanding.

Linear regression is used to model the data by using a straight line whereas the logistic regression deals with the modeling of probability of events in a bi-variate manner that is occurring as a linear function of dependent variables. Few other types of regression analysis are depicted by different scientists and listed below.

Table 1.3 Difference between linear & logistic regression.

S. No.	Parameter	Linear regression	Logistic regression
1	Purpose	Used for solving regression problems.	Used for solving classification problems.
2	Variables Involved	Continuous Variables	Categorical Variables
3	Objective	Finding of best-fit-line and predicting the output.	Finding of s-curve and classifying the samples.
4	Output	Continuous Variables such as age, price, etc.	Categorical Values such as 0 & 1, Yes & No.
5	Collinearity	There may be collinearity between independent attributes.	There should not be collinearity between independent attributes.
6	Relationship	The relationship between a dependent variable and the independent variable must be linear.	The relationship between a dependent variable and an independent variable Скачать книгу В начало < 10 11 12 13 14 15 16 17 18 19 > В конец e-mail: [email protected]

Data Analytics in Bioinformatics. Группа авторов

Чтение книги онлайн.

Читать онлайн книгу Data Analytics in Bioinformatics - Группа авторов страница 15

Информация о книге:

1.4.1 Logistic Regression

1.4.2 Difference between Linear & Logistic Regression