NoSQL & MapReduce

  • ali@fuzzywireless.com
  • Mar 3, 2022
  • 2 min read

Many NoSQL databases are built on an aggregate-oriented data model. In a shopping cart example, all of a cart's items are stored and accessed together as a single aggregate (Sadalage & Fowler, 2012). If an analysis needs one particular item across carts, for example how often a product was ordered, the aggregate model requires reading every shopping cart. An index helps somewhat, but the process remains slow. Relational databases do not have this problem, because the same data can be queried in many different ways. Relational databases also offer views: relations computed from one or more underlying tables that can present data differently from how it is stored. Because the end user cannot tell a view from a stored table, views provide a useful form of encapsulation, but they can be computationally expensive. Materialized views were introduced so that such views can be computed ahead of time and cached on disk. NoSQL databases do not have views, but they may support precomputed and cached queries, which are also called materialized views. A materialized view can be kept current either by updating it whenever the underlying data changes or by rebuilding it in periodic batches. Materialized views can be generated using MapReduce.

MapReduce is split into two functions, map and reduce (Sadalage & Fowler, 2012). The map function takes a single aggregate as input and outputs key-value pairs. Because each map call operates on only one record, map calls can run in parallel across several nodes to speed up processing. The reduce function takes the map output as input and combines all the values emitted for the same key into a single result, so each reduce call works on all the values of a given key.
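
The two functions can be illustrated with a small single-process sketch. The cart layout, the field names (`cart_id`, `items`, `product`, `quantity`), and the helper names are assumptions made for illustration and are not taken from any particular database. A real framework would spread the map calls across nodes and perform the grouping step itself.

```python
from collections import defaultdict

# Hypothetical shopping-cart aggregates; the field names are assumptions.
carts = [
    {"cart_id": 1, "items": [{"product": "tea", "quantity": 2},
                             {"product": "mug", "quantity": 1}]},
    {"cart_id": 2, "items": [{"product": "tea", "quantity": 1},
                             {"product": "kettle", "quantity": 1}]},
    {"cart_id": 3, "items": [{"product": "mug", "quantity": 4}]},
]


def map_cart(cart):
    """Map: read ONE aggregate and emit (product, quantity) pairs."""
    return [(item["product"], item["quantity"]) for item in cart["items"]]


def group_by_key(pairs):
    """Collect the values of each key so reduce sees all values for that key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_product(product, quantities):
    """Reduce: combine every value emitted for one key into a single result."""
    return product, sum(quantities)


def run_mapreduce(all_carts):
    # Each map_cart call depends on one cart only, so the calls could run in parallel.
    pairs = [pair for cart in all_carts for pair in map_cart(cart)]
    grouped = group_by_key(pairs)
    return dict(reduce_product(key, values) for key, values in grouped.items())


totals = run_mapreduce(carts)
print(totals)
```

Here `map_cart` never looks beyond the cart it is given, while `reduce_product` only ever sees the values of one product, which mirrors the division of work described above.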


Because map and reduce both work with key-value pairs, a NoSQL key-value database is a natural place to store both the output of the map function, which is a set of key-value pairs, and the output of the reduce function, which combines the values for each key. The benefit of keeping the MapReduce result as a materialized view in an aggregate-oriented NoSQL database, updated either whenever the data changes or periodically in batches, is that the desired information is already stored and readily available. It is also possible to run the map and reduce functions outside the main database and write the resulting materialized view back for quick retrieval, which improves efficiency and reduces response time.
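
As a rough sketch of the two refresh strategies, the class below keeps a materialized view of total quantity per product in a plain dictionary, which stands in for a key-value database. The class name `ProductQuantityView`, its methods, and the cart fields are hypothetical and chosen only for this example.

```python
from collections import Counter


class ProductQuantityView:
    """Materialized view of total quantity per product.

    The dict below is an in-memory stand-in for a key-value store (an assumption).
    """

    def __init__(self):
        self._store = {}

    def rebuild(self, carts):
        """Batch approach: recompute the whole view from every cart."""
        totals = Counter()
        for cart in carts:
            for item in cart["items"]:
                totals[item["product"]] += item["quantity"]
        self._store = dict(totals)

    def apply_new_cart(self, cart):
        """Per-change approach: fold one new cart into the stored totals."""
        for item in cart["items"]:
            product = item["product"]
            self._store[product] = self._store.get(product, 0) + item["quantity"]

    def get(self, product):
        """Answer a query with one key lookup instead of scanning all carts."""
        return self._store.get(product, 0)


carts = [
    {"cart_id": 1, "items": [{"product": "tea", "quantity": 2}]},
    {"cart_id": 2, "items": [{"product": "mug", "quantity": 1}]},
]

view = ProductQuantityView()
view.rebuild(carts)                      # periodic batch refresh

new_cart = {"cart_id": 3, "items": [{"product": "tea", "quantity": 5}]}
carts.append(new_cart)
view.apply_new_cart(new_cart)            # refresh on each change

print(view.get("tea"), view.get("mug"))
```

Both strategies should converge on the same view. The batch rebuild is simpler but leaves the view stale between runs, while the per-change update keeps it current at the cost of extra work on every write.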


Reference:

Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.
