Semantic Web for Effective Healthcare Systems. Group of authors
Table 1.6 shows the list of feature terms selected by the CFSLDA model. Among the pre-processed and PoS-tagged nouns, 68 terms are selected for the topic “cost,” 110 for “medicare,” 112 for “staff,” 101 for “infrastructure,” and 73 for “time.”
Table 1.6 List of correlated feature terms selected by CFSLDA model.
Features of DS1 | Number of terms selected by CFSLDA | Correlated feature terms by CFSLDA model (DS1) |
Cost | 68 | cost, test, money, charge, day, case, department, patient, room, pay, bill, ... |
Medicare | 110 | doctor, discharge, medicine, treatment, appointment, admission, disease, option, pain, reply, duty, test, meeting, … |
Staff | 112 | staff, patient, medicine, problem, report, manner, management, treatment, complaints, … |
Infrastructure | 101 | hospital, room, facility, meals, rate, … |
Time | 73 | time, service, hour, operation, day, bill, ... |
Thus, the CFSLDA feature extraction model selects not only the terms with a high term-topic probability value but also the terms that are highly correlated with the topmost term under each topic and are therefore contextually equivalent. Only the terms positively correlated with the top probable term are added to the list. The topic name, the set of terms, and their LDA scores are given to the Ontology builder tool to populate the repository. Figure 1.11 shows the spring view of the domain Ontology built for the healthcare service reviews (DS1).
Figure 1.11 Spring view of domain ontology (DS1).
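The selection step described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the toy document-term counts, the term-topic probabilities, and the five sample terms are all invented for demonstration.

```python
import numpy as np

# Toy document-term count matrix (rows = documents, columns = terms);
# values are illustrative only.
terms = ["cost", "test", "money", "room", "meals"]
doc_term = np.array([
    [3, 1, 2, 0, 0],
    [2, 2, 1, 0, 1],
    [4, 0, 3, 1, 0],
    [0, 1, 0, 3, 2],
])

# Hypothetical LDA term-topic probabilities for the topic "cost".
topic_prob = np.array([0.30, 0.12, 0.18, 0.05, 0.02])

top_idx = int(np.argmax(topic_prob))  # topmost term under the topic: "cost"

# Keep the topmost term plus every term whose document-occurrence vector
# is positively correlated with the topmost term's vector.
selected = []
for i, term in enumerate(terms):
    if i == top_idx:
        selected.append(term)
        continue
    r = np.corrcoef(doc_term[:, top_idx], doc_term[:, i])[0, 1]
    if r > 0:  # positive correlation only
        selected.append(term)

print(selected)  # → ['cost', 'money']
```

Here "test", "room", and "meals" co-occur mostly in documents where "cost" is absent, so their correlation with "cost" is negative and they are dropped from this topic's list.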
Figure 1.12 shows the precision-recall curve and the F-measure obtained when the OnSI model is applied to different numbers of query documents. Recall improves steadily as the number of query documents grows. For a document size of 500, the OnSI model yields a precision of 0.61, a recall of 0.53, and an F-measure of 0.57.
Figure 1.12 Precision vs recall curve for the Dataset DS1.
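The reported F-measure can be checked directly from the precision and recall values using the standard harmonic-mean formula:

```python
# OnSI values reported for 500 query documents (from the text above).
precision, recall = 0.61, 0.53

# F-measure is the harmonic mean of precision and recall.
f_measure = 2 * precision * recall / (precision + recall)
print(round(f_measure, 2))  # → 0.57
```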
1.7.1 Discussion 1
Ontology querying extracts features directly from the repository instead of computing similarity measures, as techniques such as the Naive Bayes algorithm do. The similarity between terms is incorporated into the model during the CFSLDA modeling phase itself rather than at the querying phase. This is essential because lexicon-based indexing merely maps keywords one-to-one and does not look for synonymous or contextually related terms. The OnSI model retrieves such terms from the document collection for the features or topics, which in turn improves the recall value. Accuracy of 100% may not always be attained, since some terms present in the query documents may be absent from the Ontology, which then needs to be updated; in the next iteration, the value improves.
1.7.2 Discussion 2
The time taken for modeling and querying (training and testing) text documents is measured. The OnSI model takes 2.1 s for modeling (70% of DS1) and 0.35 s for querying (30% of DS1). The use of the SPARQL language in the OnSI model greatly reduces the query-processing time. The time complexity of querying depends on the number of terms present in the documents and the number of times each term conflicts with other topics. The OnSI feature extraction model was compared with the Naive Bayes classifier and the k-Means clustering technique; the results are shown in Table 1.7.
Table 1.7 Performance evaluation.
Technique | Recall | Accuracy | Time |
Naive Bayes Classifier | 30% | 69% | 3.98 s |
k-Means Clustering | 37% | 79% | 4.25 s |
OnSI (Ontology-based CFSLDA) | 57% | 88% | 2.45 s |
1.7.3 Discussion 3
Generally, the term-document (TD) matrix is stored in .csv format, which takes megabytes of storage, whereas the .owl (Ontology) format takes only kilobytes. For example, the .csv file was approximately 3.5 MB when the review documents of dataset DS1 were converted into a TD matrix. Each review document consumes approximately 1 kB of storage, depending on the number of terms present in the dataset. In contrast, DS1 takes only approximately 360 kB in .owl format.
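The storage gap can be illustrated with a small stdlib sketch: a dense TD matrix pays for every document-term cell, while an ontology-style serialization stores each term once per topic. The term names, counts, and the simplified XML stand-in for OWL are assumptions for demonstration, not the chapter's actual files.

```python
import csv
import io
import xml.etree.ElementTree as ET

terms = [f"term{i}" for i in range(200)]  # hypothetical vocabulary
n_docs = 50                               # hypothetical document count

# Dense term-document matrix as CSV: one row per document, one column per
# term -- mostly zeros, yet every cell still costs bytes.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(terms)
for _ in range(n_docs):
    writer.writerow([0] * len(terms))
csv_size = len(buf.getvalue().encode())

# Ontology-style serialization: each term appears once under its topic,
# independent of how many documents mention it (simplified OWL/XML stand-in).
root = ET.Element("Topic", name="cost")
for t in terms:
    ET.SubElement(root, "hasTerm").text = t
owl_size = len(ET.tostring(root))

print(csv_size, owl_size)
```

Even in this toy setting the per-topic term listing is a fraction of the matrix's size, and the gap widens as the number of documents grows.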
This chapter focused on building an Ontology for the contextual representation of user-generated content, i.e., the review documents. The contextually aligned documents are represented in the domain Ontology along with their semantics using the Ontology-based Semantic Indexing (OnSI) model.
It has been identified that the modeling of documents greatly impacts the query-processing time and the recall value. The OnSI model improves the recall value by 27% and reduces the time by 1.53 s when compared with the Naïve Bayes technique. Similarly, it improves the recall value by 20% and reduces the time by 1.8 s when compared with the k-Means algorithm. The LDA parameters and the