Big Data Analytics Framework
- ali@fuzzywireless.com
- Mar 4, 2022
Sowmya and Sravanthi (2017) define big data as data so large that it cannot be efficiently processed using traditional data processing approaches. Hadoop, an open-source framework, is one of the most widely used tools for big data, among several others. Some of the key attributes of big data are volume, value, variety, velocity and veracity (Marr, 2014). Biomedical data is regarded as one of the leading examples of big data because of the sheer volume of medical health records, diagnoses, patient monitoring data, etc. (Jatmiko, Arsa, Wisesa, Jati and Ma’sum, 2016). Large volumes of data are generated by CT scans, MRI, EEG, ECG, etc., along with patients' health records. Finally, genome data is an enormous dataset, on the order of hundreds of gigabytes for just a few human genomes (2016).
Raghupathi and Raghupathi (2014) conceptualized a big data analytics architecture for a health care system broken into four major components: big data sources, big data transformation, big data platforms and tools, and big data analytics applications. For the health care industry, big data sources can be structured, unstructured or semi-structured, in the form of medical imaging, patient health records, biomedical signals, etc. In the big data transformation phase, data is extracted, transformed and loaded (ETL) into the data warehouse. In the next phase, big data platforms and tools, Hadoop is used as an open-source distributed data processing platform (2014). Several tools can be used to process big data on the Hadoop platform, including:
Hadoop Distributed File System (HDFS) – offers distributed storage of data across several nodes (Hadoop, 2018)
MapReduce – consists of two steps: a map step that breaks data into tuples (key/value pairs), followed by a reduce step that combines those tuples into a smaller set of tuples (IBM, 2018)
Hive – offers SQL support for Hadoop architecture (Raghupathi & Raghupathi, 2014)
Jaql – functional, declarative query language that facilitates parallel processing of large data by converting high-level queries into low-level MapReduce tasks
ZooKeeper – centralized service to synchronize a cluster of servers and coordinate parallel processing
HBase – NoSQL, column-oriented database management system built on top of HDFS
Cassandra – NoSQL distributed database system to handle big data processing
Oozie – open-source system to streamline workflow and coordination of tasks
Mahout – library for building distributed, scalable machine learning applications to support big data analytics on Hadoop
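To make the map and reduce steps in the MapReduce entry concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop's API; it only imitates the map, shuffle (group by key) and reduce phases on a few hypothetical "patient_id,diagnosis_code" records. The record format and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input: one line per patient visit, "patient_id,diagnosis_code".
RECORDS = [
    "p1,E11",
    "p2,I10",
    "p3,E11",
    "p1,I10",
    "p4,E11",
]


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map step: break each record into a (key, value) tuple."""
    for line in lines:
        _patient_id, code = line.split(",")
        yield code, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: combine each key's tuples into a single (key, total) tuple."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases, the way one MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(run_job(RECORDS))  # {'E11': 3, 'I10': 2}
```

In an actual Hadoop job the map and reduce functions run in parallel on many nodes and the framework performs the shuffle across the network; the sketch only shows how five input tuples are reduced to two.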
Finally, big data analytics applications in healthcare include reports, queries, online analytical processing (OLAP) and data mining, with visualizations built on aggregation, manipulation and analysis of the data (2014).
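The OLAP-style aggregation mentioned above can be illustrated with a small roll-up over a hypothetical admissions table. The dimensions (hospital, department, month), the length-of-stay measure and every value are invented for illustration; in a Hadoop deployment such aggregations would more typically be written as SQL GROUP BY queries, for example in Hive, rather than in plain Python.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Admission(NamedTuple):
    """Hypothetical fact-table row: three dimensions plus one numeric measure."""
    hospital: str
    department: str
    month: str
    length_of_stay_days: int


ADMISSIONS = [
    Admission("North", "Cardiology", "2022-01", 4),
    Admission("North", "Cardiology", "2022-02", 6),
    Admission("North", "Oncology", "2022-01", 10),
    Admission("South", "Cardiology", "2022-01", 3),
    Admission("South", "Oncology", "2022-02", 8),
]


def roll_up(rows: Iterable[Admission], *dims: str) -> dict[tuple, int]:
    """Sum the measure for every combination of the chosen dimensions.

    Leaving a dimension out of `dims` aggregates (rolls up) over it,
    which is the basic operation behind an OLAP cube view.
    """
    totals: Counter = Counter()
    for row in rows:
        key = tuple(getattr(row, d) for d in dims)
        totals[key] += row.length_of_stay_days
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(ADMISSIONS, "hospital", "department"))
    print(roll_up(ADMISSIONS, "hospital"))
```

Calling `roll_up` with fewer dimensions moves up the hierarchy (department totals per hospital, then hospital totals, then a single grand total with no dimensions), which is the kind of summary a report or dashboard would visualize.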
MapR (2018) outlines a big data analytics architecture for healthcare that offers batch, interactive and stream processing on the Hadoop-based MapR platform. Batch processing is offered through MapReduce, Spark, Hive and Pig. Interactive processing is performed using Impala, while stream processing is done through Spark Streaming and Storm. All three services run atop the MapR database and file system. The solution promises fraud detection, patient monitoring, personalized treatment plans and diagnostic assistance (2018).
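The difference between batch and stream processing can be sketched for the patient-monitoring use case: a batch job computes its result once all the data is available, while a stream job updates its result as each reading arrives. The sketch below is plain Python, not the Spark Streaming or Storm API; the three-reading window, the 120 bpm threshold and the heart-rate values are hypothetical and are not clinical guidance.

```python
from collections import deque
from typing import Iterable, Iterator

WINDOW_SIZE = 3          # hypothetical number of readings per sliding window
ALERT_THRESHOLD = 120.0  # hypothetical heart-rate threshold in beats per minute


def batch_mean(readings: list[float]) -> float:
    """Batch style: compute one result after the whole dataset is collected."""
    return sum(readings) / len(readings)


def stream_alerts(readings: Iterable[float]) -> Iterator[tuple[int, float]]:
    """Stream style: keep a sliding window of recent readings and yield
    (reading index, window average) whenever the average exceeds the threshold."""
    window: deque[float] = deque(maxlen=WINDOW_SIZE)  # oldest reading drops out automatically
    for index, bpm in enumerate(readings):
        window.append(bpm)
        if len(window) == WINDOW_SIZE:
            avg = sum(window) / WINDOW_SIZE
            if avg > ALERT_THRESHOLD:
                yield index, avg


if __name__ == "__main__":
    heart_rate = [88.0, 92.0, 110.0, 125.0, 131.0, 128.0, 95.0]
    print(f"batch average: {batch_mean(heart_rate):.1f} bpm")
    for index, avg in stream_alerts(heart_rate):
        print(f"reading {index}: window average {avg:.1f} bpm exceeds threshold")
```

With these sample readings the stream version raises alerts at readings 4 and 5 (window averages of 122 and 128 bpm) as soon as they arrive, whereas the batch average of about 110 bpm hides the spike entirely, which is why monitoring use cases favor stream processing.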
References:
Hadoop (2018). HDFS Architecture Guide. Retrieved from https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
IBM (2018). IBM Apache MapReduce. Retrieved from https://www.ibm.com/analytics/hadoop/mapreduce
Jatmiko, W., Arsa, D., Wisesa, H., Jati, G. & Ma’sum, M. (2016). A review of big data analytics in the biomedical field. 2016 International Workshop on Big Data and Information Security.
MapR (2018). Big Data and Apache Hadoop for the Healthcare Industry. Retrieved from https://mapr.com/resources/big-data-and-apache-hadoop-healthcare-industry/
Marr, B. (2014). Big data: The 5 Vs everyone must know. Retrieved from https://www.linkedin.com/pulse/20140306073407-64875646-big-data-the-5-vs-everyone-must-know
Raghupathi, W. & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(3).
Sowmya, M. & Sravanthi, N. (2017). Big data: an overview of features, tools, techniques and applications. International Journal of Engineering Science and Computing, 7(6), 13644–13647.