If a term ti is mapped to more than one feature, say fa, fb ∈ F, then its nearest terms, ti-1 and ti+1, are considered to determine the feature category of ti in the document. In this case, the term is associated with a feature by comparing the fScore of each candidate feature. The cumulative fScore is calculated for each feature f, and the feature(s) with the strongest score is returned as the result. The cumulative fScore of a feature f is the sum of the LDA scores of the terms corresponding to f.
For example, “the rooms are maintained neatly but the room rent is costly” is considered a Type 3 query. Here, the terms rooms, room, and rent are extracted. The term room comes under both the topics “Infrastructure” and “Cost,” while the term rent comes under the topic “Cost” only. The term room needs to be fixed under a single topic (either Infrastructure or Cost). This is done by calculating the cumulative scores of the features (or topics) under which the term occurs. Suppose LDAscore(room, Infrastructure) = 0.17, LDAscore(room, Cost) = 0.38, and LDAscore(rent, Cost) = 0.26. Then cum_fScore(f = “Infrastructure”) is 0.17, and cum_fScore(f = “Cost”) is 0.38 + 0.26 = 0.64. Since 0.64 > 0.17, the term room is assigned the feature “Cost” in this context.
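This disambiguation step can be sketched in a few lines of Python. The sketch below assumes a simple (term, feature) → LDA score lookup; the data structure and function names are illustrative stand-ins, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical LDA scores from the worked example: (term, feature) -> score
lda_scores = {
    ("room", "Infrastructure"): 0.17,
    ("room", "Cost"): 0.38,
    ("rent", "Cost"): 0.26,
}

def assign_feature(term, neighbour_terms, lda_scores):
    """Resolve an ambiguous term to one feature by comparing
    cumulative fScores of its candidate features."""
    # Candidate features are those the ambiguous term maps to.
    candidates = {f for (t, f) in lda_scores if t == term}
    cum_fscore = defaultdict(float)
    # Cumulative fScore: sum of LDA scores of the term and its
    # nearest terms under each candidate feature.
    for t in [term] + neighbour_terms:
        for f in candidates:
            cum_fscore[f] += lda_scores.get((t, f), 0.0)
    # The feature with the strongest cumulative score wins.
    return max(cum_fscore, key=cum_fscore.get)

print(assign_feature("room", ["rent"], lda_scores))  # -> "Cost"
```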
Type 4: Term not present in the Ontology
If a given term is not present in the Ontology, a new LDA score is computed for it and updateOntology() is called to add the new term. That is, if ontoMap(t) is null, where t ∈ T, the Ontology is updated with the new term, and CFSLDA modeling is performed again as part of the update. The querying process is then repeated as one of the other three types described earlier.
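A sketch of this Type 4 control flow is shown below, assuming a minimal Ontology wrapper. Here onto_map and run_cfslda are stand-ins for the ontoMap(t) and CFSLDA re-modeling operations named in the text; the re-modeling itself is only stubbed.

```python
class Ontology:
    """Minimal stand-in for the Ontology described in the text."""
    def __init__(self):
        self.term_map = {}                      # term -> mapped feature

    def onto_map(self, term):
        # Returns None when the term is absent (ontoMap(t) is null).
        return self.term_map.get(term)

def run_cfslda(ontology, new_term):
    # Stub: in the model described above, CFSLDA modeling is re-run
    # so the new term receives an LDA score and a feature mapping.
    ontology.term_map[new_term] = "feature-from-re-modeling"

def process_term(term, onto):
    if onto.onto_map(term) is None:             # Type 4: unknown term
        run_cfslda(onto, term)                  # update Ontology, re-model
    return onto.onto_map(term)                  # then re-query as Types 1-3

onto = Ontology()
print(process_term("telemedicine", onto))
```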
1.5.4 Metrics Analysis
Accuracy is used as a performance measure, capturing the proportion of terms correctly identified under a topic (or feature). The model is further evaluated with the metrics precision, recall, and f-measure: precision indicates how precise the model is, and recall indicates how complete it is. The results are represented as a confusion matrix, which contains information about actual and predicted classifications, as shown in Table 1.1.
Table 1.1 Confusion matrix.
                | Predicted positive | Predicted negative
Actual positive | TP                 | FN
Actual negative | FP                 | TN
where
TP: the number of correct classifications of the positive examples (true positive)
FN: the number of incorrect classifications of positive examples (false negative)
FP: the number of incorrect classifications of negative examples (false positive)
TN: the number of correct classifications of negative examples (true negative)
Precision is defined as the percentage of correctly identified documents among the documents returned, whereas recall is defined as the percentage of relevant documents that are correctly identified. In practice, high recall is achieved at the expense of precision, and vice versa [61]. The f-measure is suitable when a single metric is needed to compare different models; it is defined as the harmonic mean of precision and recall. Based on the confusion matrix, the precision, recall, and f-measure of the positive class are defined as

Precision = TP / (TP + FP) (1.3)

Recall = TP / (TP + FN) (1.4)

F-measure = (2 × Precision × Recall) / (Precision + Recall) (1.5)
The Ontology-based Semantic Indexing (OnSI) model is evaluated using the metrics precision, recall, f-measure, and accuracy, as shown in Equations 1.3 to 1.6, where accuracy is the fraction of all classifications, positive and negative, that are correct:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (1.6)
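The following sketch computes the four metrics of Equations 1.3 to 1.6 directly from the confusion-matrix counts; the counts in the usage example are made-up values for illustration only.

```python
def metrics(tp, fn, fp, tn):
    """Compute precision, recall, f-measure, and accuracy from
    confusion-matrix counts (Equations 1.3 to 1.6)."""
    precision = tp / (tp + fp)                                  # Eq. 1.3
    recall = tp / (tp + fn)                                     # Eq. 1.4
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. 1.5
    accuracy = (tp + tn) / (tp + fn + fp + tn)                  # Eq. 1.6
    return precision, recall, f_measure, accuracy

# Illustrative counts, not results from the OnSI experiments.
p, r, f1, acc = metrics(tp=80, fn=20, fp=10, tn=90)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f1:.2f} accuracy={acc:.2f}")
```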
1.6 Dataset Description
A document collection of healthcare service reviews (dataset DS1) was gathered from different social media websites covering 10 different hospitals, as detailed in Tables 1.2 and 1.3.
Table 1.2 Social media reviews for healthcare service (DS1).
Data source            | Positive reviews | Negative reviews
                       | 1200             | 525
Mouthshut.com          | 425              | 110
BestHospitalAdvisor.com | 200             | 85
Google Reviews         | 580              | 320
Total reviews          | 2405             | 1040
Table 1.3 Number of reviews of features and hospitals (DS1).
Features | Reviews |
Cost | 663 |
Medicare |