Semantic Web for Effective Healthcare Systems. Group of authors
Table 1.6 shows the list of feature terms selected by the CFSLDA model. Among the pre-processed and PoS-tagged nouns, 68 terms are selected for the topic “cost,” 110 for “medicare,” 112 for “staff,” 101 for “infrastructure,” and 73 for “time.”
Table 1.6 List of correlated feature terms selected by CFSLDA model.
Features of DS1 | Number of terms selected by CFSLDA | Correlated feature terms by CFSLDA model (DS1) |
Cost | 68 | cost, test, money, charge, day, case, department, patient, room, pay, bill, ... |
Medicare | 110 | doctor, discharge, medicine, treatment, appointment, admission, disease, option, pain, reply, duty, test, meeting, … |
Staff | 112 | staff, patient, medicine, problem, report, manner, management, treatment, complaints, … |
Infrastructure | 101 | hospital, room, facility, meals, rate, … |
Time | 73 | time, service, hour, operation, day, bill, ... |
Thus, the CFSLDA feature extraction model selects not only the terms with a high term-topic probability value but also the terms that are highly correlated with the topmost term under each topic and are therefore contextually equivalent. Only the terms positively correlated with the top probable term are added to the list. The topic name, the set of terms, and their LDA scores are given to the Ontology builder tool to populate the repository. Figure 1.11 shows the spring view of the domain Ontology built for the healthcare service reviews (DS1).
Figure 1.11 Spring view of domain ontology (DS1).
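The selection step described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the toy document-term counts, the term-topic probabilities, and the five sample terms are all invented for demonstration.

```python
import numpy as np

# Toy document-term count matrix (rows = documents, columns = terms);
# values are illustrative only.
terms = ["cost", "test", "money", "room", "meals"]
doc_term = np.array([
    [3, 1, 2, 0, 0],
    [2, 2, 1, 0, 1],
    [4, 0, 3, 1, 0],
    [0, 1, 0, 3, 2],
])

# Hypothetical LDA term-topic probabilities for the topic "cost".
topic_prob = np.array([0.30, 0.12, 0.18, 0.05, 0.02])

top_idx = int(np.argmax(topic_prob))  # topmost term under the topic: "cost"

# Keep the topmost term plus every term whose document-occurrence vector
# is positively correlated with the topmost term's vector.
selected = []
for i, term in enumerate(terms):
    if i == top_idx:
        selected.append(term)
        continue
    r = np.corrcoef(doc_term[:, top_idx], doc_term[:, i])[0, 1]
    if r > 0:  # positive correlation only
        selected.append(term)

print(selected)  # → ['cost', 'money']
```

Here "test", "room", and "meals" co-occur mostly in documents where "cost" is absent, so their correlation with "cost" is negative and they are dropped from this topic's list.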
Figure 1.12 shows the precision-recall curve and the F-measure obtained when the OnSI model is applied to different numbers of query documents. Recall improves steadily as the number of query documents grows. For a document size of 500, the OnSI model yields a precision of 0.61, a recall of 0.53, and an F-measure of 0.57.
Figure 1.12 Precision vs recall curve for the Dataset DS1.
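The reported F-measure can be checked directly from the precision and recall values using the standard harmonic-mean formula:

```python
# OnSI values reported for 500 query documents (from the text above).
precision, recall = 0.61, 0.53

# F-measure is the harmonic mean of precision and recall.
f_measure = 2 * precision * recall / (precision + recall)
print(round(f_measure, 2))  # → 0.57
```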
1.7.1 Discussion 1
Ontology querying extracts features directly from the repository instead of computing similarity measures, as techniques such as the Naive Bayes algorithm do. The similarity between terms is incorporated into the model during the CFSLDA modeling phase itself rather than at the querying phase. This is essential because lexicon-based indexing merely maps keywords one-to-one and does not look for synonymous or contextually related terms. The OnSI model retrieves such terms from the document collection for the features or topics, which in turn improves the recall value. Accuracy of 100% may not always be attained, since some terms present in the query documents may be absent from the Ontology, which then needs to be updated; in the next iteration, the value improves.
1.7.2 Discussion 2
The time taken for modeling and querying (training and testing) text documents is measured. The OnSI model takes 2.1 s for modeling (70% of DS1) and 0.35 s for querying (30% of DS1). The use of the SPARQL language in the OnSI model greatly reduces the query-processing time. The time complexity of querying depends on the number of terms present in the documents and the number of times each term conflicts with other topics. The OnSI feature extraction model was compared with the Naive Bayes classifier and the k-Means clustering technique; the results are shown in Table 1.7.
Table 1.7 Performance evaluation.
Technique | Recall | Accuracy | Time |
Naive Bayes Classifier | 30% | 69% | 3.98 s |
k-Means Clustering | 37% | 79% | 4.25 s |
OnSI (Ontology-based CFSLDA) | 57% | 88% | 2.45 s |
1.7.3 Discussion 3
Generally, the term-document (TD) matrix is stored in .csv format, which takes megabytes of storage, whereas the .owl (Ontology) format takes only kilobytes. For example, the .csv file was approximately 3.5 MB when the review documents of dataset DS1 were converted into a TD matrix. Each review document consumes approximately 1 kB of storage, depending on the number of terms present in the dataset. In contrast, DS1 takes only approximately 360 kB in .owl format.
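The storage gap can be illustrated with a small stdlib sketch: a dense TD matrix pays for every document-term cell, while an ontology-style serialization stores each term once per topic. The term names, counts, and the simplified XML stand-in for OWL are assumptions for demonstration, not the chapter's actual files.

```python
import csv
import io
import xml.etree.ElementTree as ET

terms = [f"term{i}" for i in range(200)]  # hypothetical vocabulary
n_docs = 50                               # hypothetical document count

# Dense term-document matrix as CSV: one row per document, one column per
# term -- mostly zeros, yet every cell still costs bytes.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(terms)
for _ in range(n_docs):
    writer.writerow([0] * len(terms))
csv_size = len(buf.getvalue().encode())

# Ontology-style serialization: each term appears once under its topic,
# independent of how many documents mention it (simplified OWL/XML stand-in).
root = ET.Element("Topic", name="cost")
for t in terms:
    ET.SubElement(root, "hasTerm").text = t
owl_size = len(ET.tostring(root))

print(csv_size, owl_size)
```

Even in this toy setting the per-topic term listing is a fraction of the matrix's size, and the gap widens as the number of documents grows.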
This chapter focused on building an Ontology for the contextual representation of user-generated content, i.e., the review documents. The contextually aligned documents are represented in the domain Ontology along with their semantics using the Ontology-based Semantic Indexing (OnSI) model.
It has been identified that the modeling of documents greatly impacts the query-processing time and the recall value. The OnSI model improves the recall value by 27% and reduces the time by 1.53 s when compared with the Naïve Bayes technique. Similarly, it improves the recall value by 20% and reduces the time by 1.8 s when compared with the k-Means algorithm. The LDA parameters and the