Big Data. Seifedine Kadry
Data storage
Data privacy
1.9.2 Heterogeneity and Incompleteness
The data types of big data are heterogeneous because the data is integrated from multiple sources; it therefore has to be carefully structured and presented as homogeneous data before big data analysis. The data gathered may also be incomplete, which makes the analysis much more complicated. Consider an online patient health record containing the patient's name, occupation, birth date, medical ailment, laboratory test results, and previous medical history. If one or more of these details are missing in many records, the analysis may not be worth performing, because its results are unlikely to be valuable. In some scenarios a NULL value may be inserted in place of a missing value, and the analysis may still be performed if that particular value does not have a great impact on the analysis and the remaining values are sufficient to produce a valuable outcome.
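The handling of missing values described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (such as `lab_glucose`), and the choice of which fields the analysis depends on are all hypothetical, and `None` stands in for the NULL value.

```python
# Hypothetical patient records; None plays the role of the NULL value in the text.
records = [
    {"name": "Patient A", "occupation": "teacher", "birth_date": "1980-02-11",
     "ailment": "diabetes", "lab_glucose": 148.0},
    {"name": "Patient B", "occupation": None, "birth_date": "1975-07-30",
     "ailment": "diabetes", "lab_glucose": 162.0},
    {"name": "Patient C", "occupation": "driver", "birth_date": None,
     "ailment": "diabetes", "lab_glucose": None},
]

# Assumption: this particular analysis depends only on these two fields.
REQUIRED_FIELDS = ("ailment", "lab_glucose")


def count_missing(rows):
    """Count how many records lack a value for each field."""
    fields = rows[0].keys()
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}


def is_usable(row, required=REQUIRED_FIELDS):
    """A record is usable if every field the analysis depends on is present."""
    return all(row.get(f) is not None for f in required)


missing = count_missing(records)
usable = [r for r in records if is_usable(r)]
mean_glucose = sum(r["lab_glucose"] for r in usable) / len(usable)

print(missing)
print(len(usable), mean_glucose)
```

Patient B has no occupation on file, but because that field has no bearing on this analysis the record is still used; Patient C is excluded because the laboratory result itself is missing.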
1.9.3 Volume and Velocity of the Data
Managing the massive and ever-increasing volume of big data is the biggest concern in the big data era. In the past, an increase in data volume was handled by adding memory units and computing resources. But the data volume is now increasing exponentially, which traditional database storage models cannot handle. The larger the volume of data, the longer the time consumed for processing and analysis.
The challenge of velocity concerns not only the rate at which data arrives from multiple sources but also the rate at which data has to be processed and analyzed in the case of real‐time analysis. For example, in the case of credit card transactions, if fraudulent activity is suspected, the transaction has to be declined in real time.
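To make the credit card example concrete, the following is a minimal sketch of a per-transaction decision that is taken as each event arrives rather than in a later batch. The rule itself (an amount ceiling plus a limit on transactions per card within a sliding time window), the thresholds, and the `FraudScreen` class are assumptions made for illustration; real fraud detection systems are far more elaborate.

```python
from collections import defaultdict, deque

# Assumptions: illustrative thresholds, not taken from any real system.
WINDOW_SECONDS = 60       # length of the sliding window
MAX_TXNS_IN_WINDOW = 3    # approved transactions allowed per card per window
MAX_AMOUNT = 5000.0       # single-transaction ceiling


class FraudScreen:
    """Decides on each transaction as it arrives, keeping only recent history."""

    def __init__(self):
        # card_id -> timestamps of recently approved transactions
        self._recent = defaultdict(deque)

    def decide(self, card_id, amount, timestamp):
        recent = self._recent[card_id]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        if amount > MAX_AMOUNT or len(recent) >= MAX_TXNS_IN_WINDOW:
            return "DECLINE"
        recent.append(timestamp)
        return "APPROVE"


screen = FraudScreen()
decisions = [screen.decide("card-1", 20.0, t) for t in (0, 10, 20, 30, 100)]
print(decisions)
```

Because only the timestamps inside the window are kept, the work done per transaction stays small no matter how long the stream runs, which is what allows the decision to be made in real time.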
1.9.4 Data Storage
The volume of data contributed by social media, mobile Internet, online retailers, and so forth is massive and beyond the handling capacity of traditional databases. This requires a storage mechanism that is highly scalable to meet the increasing demand. The storage mechanism should be capable of accommodating the growing data, which is complex in nature. When the data volume is known in advance, the required storage capacity can be predetermined. But in the case of streaming data, the required storage capacity cannot be predetermined. Hence, a storage mechanism capable of accommodating this streaming data is required. Data storage should be reliable and fault tolerant as well.
Stored data has to be retrieved at a later point in time. This data may be the purchase history of a customer, back issues of a magazine, employee details of a company, Twitter feeds, images captured by a satellite, patient records in a hospital, financial transactions of a bank customer, and so forth. When a business analyst has to evaluate the improvement in a company's sales, she has to compare the sales of the current year with those of the previous year. Hence, data has to be stored and retrieved to perform the analysis.
1.9.5 Data Privacy
Privacy of the data is yet another concern that grows with the increase in data volume. Inappropriate access to personal data, electronic health records (EHRs), and financial transactions is a social problem that affects the privacy of users to a great extent. Data has to be shared in a way that limits the extent of disclosure while ensuring that the shared data is sufficient to extract business knowledge from it. Who is granted access to the data, how far that access extends, and when the data can be accessed should be predetermined to ensure that the data is protected. Hence, there should be deliberate access control over the data at the various stages of the big data life cycle, namely data collection, storage, and management and analysis. Research on big data cannot be performed without the actual data, and consequently the issue of data openness and sharing is crucial. Data sharing is tightly coupled with data privacy and security. Big data service providers hand over huge volumes of data to professionals for analysis, which may affect data privacy. Financial transactions contain details of business processes and credit card details. Such sensitive information should be well protected before the data is delivered for analysis.
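The three questions raised above (who may access the data, how much of it, and when) can be expressed as a small predetermined policy. The sketch below is one possible shape for such a check, not a prescribed design: the `AccessPolicy` class, the `disclose` function, the role and stage names, and the sample transaction are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    role: str                  # who is granted access
    allowed_fields: frozenset  # limit of access: which fields may be disclosed
    allowed_stages: frozenset  # life cycle stages in which access is permitted
    start_hour: int            # when access is permitted (inclusive)
    end_hour: int              # when access ends (exclusive)


def disclose(record, policy, role, stage, hour):
    """Return only the fields the policy permits, or raise PermissionError."""
    if role != policy.role:
        raise PermissionError(f"role {role!r} is not covered by this policy")
    if stage not in policy.allowed_stages:
        raise PermissionError(f"no access during the {stage!r} stage")
    if not (policy.start_hour <= hour < policy.end_hour):
        raise PermissionError(f"no access at hour {hour}")
    return {k: v for k, v in record.items() if k in policy.allowed_fields}


# Hypothetical financial transaction containing sensitive fields.
transaction = {
    "card_number": "4111-XXXX-XXXX-1111",
    "holder": "J. Doe",
    "amount": 42.5,
    "merchant": "bookshop",
}

analyst_policy = AccessPolicy(
    role="analyst",
    allowed_fields=frozenset({"amount", "merchant"}),
    allowed_stages=frozenset({"analysis"}),
    start_hour=9,
    end_hour=18,
)

view = disclose(transaction, analyst_policy, "analyst", "analysis", 10)
print(view)
```

The analyst receives enough to study spending patterns, while the card number and the holder's name never leave the provider.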
1.10 Big Data Applications
Banking and Securities – Credit/debit card fraud detection, warning for securities fraud, credit risk reporting, customer data analytics.
Healthcare sector – Storing the patient data and analyzing the data to detect various medical ailments at an early stage.
Marketing – Analyzing customer purchase history to reach the right customers in order to market newly launched products.
Web analysis – Social media data, data from search engines, and so forth, are analyzed to broadcast advertisements based on users' interests.
Call center analytics – Big data technology is used to identify the recurring problems and staff behavior patterns by capturing and processing the call content.
Agriculture – Sensors are used by biotechnology firms to optimize crop efficiency. Big data technology is used in analyzing the sensor data.
Smartphones – The facial recognition feature of smartphones is used to unlock the phone and to retrieve information about a person from the information previously stored in the smartphone.
1.11 Big Data Use Cases
1.11.1 Health Care
To cope with the massive flood of information generated at a high velocity, medical institutions are looking for a breakthrough that will help them handle this digital flood, enhance their health care services, and create a successful business model. Health care executives believe that adopting innovative business technologies will reduce the cost incurred by patients for health care and help them provide higher-quality medical services. But the challenge of integrating patient data that is large, complex, and growing rapidly hampers their efforts to improve clinical performance and convert these assets into business value.
Hadoop, a big data framework, plays a major role in health care by making big data storage and processing less expensive and highly available, giving doctors more insight. With the advent of big data technologies, doctors can monitor the health of patients who live far from the hospital by having the patients wear watch‐like devices. The devices send reports on the patients’ health, and when an issue arises or a patient’s health deteriorates, they automatically alert the doctor.
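The alerting behavior of such a wearable device can be pictured as a simple range check on each incoming report. The sketch below is a toy illustration only: the metric names, the limits, and the `check_reading` and `notify_doctor` functions are hypothetical, and the limits are placeholders rather than clinical guidance.

```python
# Assumption: illustrative limits only, not clinical guidance.
LIMITS = {"heart_rate": (50, 120), "spo2": (92, 100)}


def check_reading(reading, limits=LIMITS):
    """Return a list of alert messages for one report sent by the device."""
    alerts = []
    for metric, (low, high) in limits.items():
        value = reading.get(metric)
        if value is None:
            alerts.append(f"{metric}: no data received")
        elif not (low <= value <= high):
            alerts.append(f"{metric}: {value} outside {low}-{high}")
    return alerts


def notify_doctor(patient_id, alerts, outbox):
    """Queue a notification only when there is something to report."""
    if alerts:
        outbox.append((patient_id, alerts))


outbox = []
notify_doctor("patient-7", check_reading({"heart_rate": 72, "spo2": 98}), outbox)
notify_doctor("patient-7", check_reading({"heart_rate": 135, "spo2": 95}), outbox)
print(outbox)
```

Only the second report produces a notification, so the doctor is contacted when a value leaves its expected range rather than on every reading.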
With the development of health care information technology, patient data can be electronically captured, stored, and moved across the world, and health care can be provided with increased efficiency in diagnosing and treating the patient and with tremendously improved quality of service. Health care today is increasingly evidence based, which means analyzing the patient’s health care records from heterogeneous sources such as EHRs, clinical text, biomedical signals, sensing data, biomedical images, and genomic data and inferring the patient’s health from the analysis. The biggest challenge in health care is to store, access, organize, validate, and analyze this massive and complex data; the challenge is even bigger when the data is generated at an ever-increasing speed. The need for real‐time and computationally intensive analysis of patient data generated in the ICU is also increasing. Big data technologies have evolved as a solution to these critical issues in health care, providing real‐time solutions and making it possible to deploy advanced health care facilities. The major benefits of big data in health care are preventing disease, identifying modifiable risk factors, and preventing