Computational Statistics in Data Science. Group of authors
IBM InfoSphere Streams can handle millions of messages or events per second at high throughput, making it one of the leading proprietary solutions for real‐time applications [61]. Apama Stream Analytics is suited to real‐time, high‐volume business operations [62]. Azure Stream is another proprietary solution aimed at streaming analytics and IoT goals [62]. Other proprietary options include Kinesis, PieSync, TIBCO Spotfire, Google Cloud Pub/Sub, Azure Event Hubs, Kibana, and Amazon Elastic Search Service.
Ideally, one would choose a single streaming data technology that supports all of the system's requirements, such as the state of the data, the use case, and the kind of results needed, because this avoids interoperability constraints.
9 Conclusion and the Way Forward
In this chapter, we have considered current issues concerning data streams, or streaming data. Interest in stream processing is growing, because data must be handled quickly for decisions to be made in real time. The key presumption of stream computing is that the value of data lies in its newness. Data are therefore analyzed the moment they arrive in a stream, in contrast to batch processing, where data are first stored and then explored. Challenges for data stream analysis include concept drift, scalability, integration, fault tolerance, timeliness, consistency, heterogeneity and incompleteness, load balancing, privacy issues, and accuracy [27, 28, 30–32, 34, 35], all of which emerge from the nature of data streams.
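The contrast between analyzing data on arrival and storing them first can be illustrated with a running statistic. The following is a minimal sketch, assuming a stream of numeric values; it uses Welford's online algorithm to maintain the mean and variance in constant memory, and the class name `RunningStats` is hypothetical rather than taken from any framework discussed in this chapter.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one newly arrived value into the statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # The second factor uses the updated mean; this is what keeps
        # the recurrence numerically stable.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; undefined (NaN) for fewer than two values."""
        if self.count < 2:
            return float("nan")
        return self.m2 / (self.count - 1)


# Usage: each value is processed once, on arrival, and then discarded.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

A batch computation would need the whole list in storage before producing the same two numbers; here the state is three scalars regardless of how long the stream runs.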
Streaming is an active research area, but some aspects of it have received little attention. One is transactional guarantees. Current stream processing systems can provide basic guarantees, such as processing each data point in the stream exactly once or at least once, but cannot provide guarantees that span multiple operations or stream elements. Another area that merits more research effort is data stream pre‐processing. Data quality is a vital determinant in the knowledge discovery pipeline, as low‐quality data yield low‐quality models and decisions [69]. The data stream pre‐processing stage [67] needs to be reinforced in the face of the multi‐label [70], imbalance [71], and multi‐instance [72] problems associated with data streams [66]. Also, social media posts must be represented in a way that preserves the semantics of their content [74, 75]. Moreover, data stream pre‐processing techniques with low computational requirements [73] still need to be developed, as this remains an open research problem.
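The difference between at‐least‐once delivery and an exactly‐once result can be sketched for a single operator. The example below is an illustration only, assuming that every message carries a unique identifier and that redelivered messages repeat that identifier; the class `DedupSum` and its methods are hypothetical names, not the API of any stream processing system.

```python
from typing import Hashable, Set


class DedupSum:
    """Sums message values, ignoring redelivered messages.

    Under at-least-once delivery the same message may arrive more than
    once. Remembering the IDs already applied makes the update
    idempotent, so the total matches an exactly-once result.
    """

    def __init__(self) -> None:
        self.seen: Set[Hashable] = set()
        self.total: float = 0.0

    def process(self, msg_id: Hashable, value: float) -> bool:
        """Apply one message; return False if it was a duplicate."""
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        self.total += value
        return True


# Usage: messages 1 and 2 are each delivered twice.
op = DedupSum()
deliveries = [(1, 10.0), (2, 5.0), (1, 10.0), (3, 1.0), (2, 5.0)]
applied = [op.process(msg_id, value) for msg_id, value in deliveries]
```

This covers one operator and one piece of state. It does not make updates to several operators or several stream elements atomic, which is the kind of multi‐operation guarantee described above as lacking. The set of seen identifiers also grows without bound in this sketch; a real system would have to expire old entries.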
Data stream processing demands both storage capacity and computational power, because data are generated without bound, at high velocity, and with a brief life span. To cope with these requirements, approximate computing, which achieves low latency at the cost of an acceptable loss in quality, has been a practical solution [110]. Although approximate computing has been used extensively for data stream processing, combining it with distributed processing models opens new research directions. These include approximation with heterogeneous resources, pricing models with approximation, intelligent data processing, and energy‐aware approximation.
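One familiar way to trade accuracy for bounded storage is to keep a fixed‐size random sample of the stream. The sketch below uses reservoir sampling (Algorithm R) as an example of the general idea; it is not necessarily the technique meant in [110], and the function name and parameters are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, seed: Optional[int] = None
) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    Memory use is O(k) no matter how many items the stream yields.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with
            # probability k / (i + 1); randint is inclusive.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: summarize 100,000 readings with a sample of 100.
sample = reservoir_sample(range(100_000), k=100, seed=0)
estimated_mean = sum(sample) / len(sample)
```

Statistics computed on the sample, such as `estimated_mean`, approximate those of the full stream, with an error that shrinks as `k` grows. Choosing `k` is the quality‐versus‐cost trade‐off that approximate computing makes explicit.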
References
1 World Economic Forum (2019) How Much Data is Generated Each Day? Visual Capitalist, https://www.visualcapitalist.com/how‐much‐data‐is‐generated‐each‐day.
2 Huynh, V. and Phung, D. (2017) Streaming clustering with Bayesian nonparametric models. Neurocomputing, 258, 52–62. doi: 10.1016/j.neucom.2017.02.078.
3 Ray, I., Adaikkalavan, R., Xie, X., and Gamble, R. (2015) Stream Processing with Secure Information Flow Constraints. 29th IFIP Annual Conference on Data and Applications Security and Privacy. Fairfax, USA, pp. 311–329. doi: 10.1007/978‐3‐319‐20810‐7_22.
4 Sibai, R.E., Chabchoub, Y., Demerjian, J. et al. (2016) Sampling Algorithms in Data Stream Environment. 2016 International Conference on Digital Economy Carthage. IEEE, Tunisia, pp. 29–36. doi: 10.1109/ICDEC.2016.7563142.
5 Youn, J., Shim, J., and Lee, S.G. (2018) Efficient data stream clustering with sliding windows based on locality sensitive hashing. IEEE Access, 6, 63757–63776. doi: 10.1109/ACCESS.2018.2877138.
6 Das, S., Behera, R.K., Kumar, M., and Rath, S.K. (2018) Real‐time sentiment analysis of twitter streaming data for stock prediction. Procedia Comput. Sci., 132, 956–964.
7 Wang, J., Zhu, R., and Liu, S. (2018) A differentially private unscented Kalman filter for streaming data in IoT. IEEE Access, 6 (1), 6487–6495. doi: 10.1109/ACCESS.2018.2797159.
8 Kolchinsky, I. and Schuster, A. (2019) Real‐Time Multi‐Pattern Detection Over Event Streams. Proceedings of the 2019 International Conference on Management of Data, Amsterdam, Netherlands. New York, NY, USA: ACM, pp. 589–606. doi: 10.1145/3299869.3319869.
9 Tozi, C. (2017) Dummy's Guide to Batch vs Streaming. Retrieved from Trillium Software, https://www.precisely.com/blog/big‐data/big‐data‐101‐batch‐stream‐processing.
10 Kolajo, T., Daramola, O., and Adebiyi, A. (2019) Big data stream analysis: a systematic literature review. J. Big Data, 6, 47.
11 Kusumakumari, V., Sherigar, D., Chandran, R., and Patil, N. (2017) Frequent pattern mining on stream data using Hadoop CanTree‐GTree. Procedia Comput. Sci., 115, 266–273.
12 Giustozzi, F., Saunier, J., and Zanni‐Merk, C. (2019) Abnormal situations interpretation in industry 4.0 using stream reasoning. Procedia Comput. Sci., 159, 620–629.
13 Liu, R., Li, Q., Li, F. et al. (2014) Big Data Architecture for IT Incident Management. Proceedings of IEEE international conference on service operations and logistics, and informatics. Qingdao, China, pp. 424–429.
14 Sakr, S. (2013) An Introduction to Infosphere Streams: A Platform for Analyzing Big Data in Motion, IBM, https://www.ibm.com/developerworks/library/bd‐streamsintro/index.html.
15 Inoubli, W., Aridhi, S., Mezni, H. et al. (2018) An experimental survey on big data frameworks. Future Gener. Comp. System, 86,