top of page

Data Migration in NoSQL Databases

  • ali@fuzzywireless.com
  • Mar 3, 2022
  • 5 min read

In traditional schema based relational database management systems (RDBMS), the database has to be changed before any changes can be done on the application (Connolly & Begg, 2015). NoSQL database are more flexible in adjusting to changes occurred due to market changes and product innovation (Connolly & Begg, 2015). Although the data is stored in NoSQL database without regard to the schema, the schema has to be defined by application because application is required to parse the data fetched from database (Connolly & Begg, 2015). In case of RDBMS, database throws error when schema change was not taken care of whereas in NoSQL, application will be throwing error if code not adjusted per schema change (Connolly & Begg, 2015).


Schema Change in NoSQL Databases

Sadalage (2014) outlined two options to perform data migration in NoSQL, migrate all data in one go or migrate data during read (incremental migration). Connolly & Begg (2015) highlighted that in the event of NoSQL schema change, all existing data need to change completely to new schema which can be time or processing or cost prohibitive thus referred as migrate all data in one go. Other option is to perform modification in application, which perform incremental migration of accessed data by transforming into new schema and storing back to database (Connolly & Begg, 2015). Incremental redundancy will migrate data over long period of time but some data might never migrate because it was not accessed in the first place (Connolly & Begg, 2015).

Saur, Dumitras & Hicks (2016) presented their framework performing incremental changes whenever interaction with data happens without any downtime. The solution presented logical view to applications that data is available at the newest format and does not require any application code changes instead application provide the version needed (Saur, Dumitras & Hicks, 2016). Version identifier is added which tracks the key-value pair already updated per new schema (Saur, Dumitras & Hicks, 2016).


Data Migration Example 1

Alam (2010) presented the data migration of customer order system from relational data to NoSQL. Relational database was MySQL whereas NoSQL was MongoDB (Alam, 2010). The schema of relational database is shown in Figure 1 (Alam, 2010).



Figure 1. MySQL Relational database of customer order system. Taken from: Alam, F. (2010). Data migration: Relational RDBMS to non-relational NoSQL. Retrieved from http://digital.library.ryerson.ca/islandora/object/RULA%3A4149. Copyright 2010


The customer order system is based on one-to-many relationship where a customer can have multiple orders while each order can also have multiple products as shown in Figure 2 (Alam, 2010).



Figure 2. MySQL Relational database of customer order system – one to many relationshop. Taken from: Alam, F. (2010). Data migration: Relational RDBMS to non-relational NoSQL. Retrieved from http://digital.library.ryerson.ca/islandora/object/RULA%3A4149. Copyright 2010


From Figure 2, it is clear that customerID is common between customer and order tables thus can be joined (Alam, 2010). Similarly, order_id is common between order and order details tables, which can be joined. Other joins that can be performed between tables are order details and product tables using productID and product and supplier tables using supplierID. The joins are shown in Figure 3.



Figure 3. MySQL Relational database of customer order system – joins between tables. Taken from: Alam, F. (2010). Data migration: Relational RDBMS to non-relational NoSQL. Retrieved from http://digital.library.ryerson.ca/islandora/object/RULA%3A4149. Copyright 2010


MongoDB stores data in the form of document collection, which is why for customer order system every order is stored as a single document which consists of other nested objects including customer, products and suppliers (Alam, 2010). Figure 4 shows the possible schema for MongoDB.


Figure 4. MongoDB NoSQL database of customer order system. Taken from: Alam, F. (2010). Data migration: Relational RDBMS to non-relational NoSQL. Retrieved from http://digital.library.ryerson.ca/islandora/object/RULA%3A4149. Copyright 2010


The schema of MongoDB in Figure 4 represent every order as an aggregation with other objects like customers, products and suppliers (Alam, 2010).


Data Migration Example 2

Slater (2015) shared some best practices for migrating data from relational RDBMS to Amazon’s NoSQL DynamoDB . For instance, an RDBMS table relating media files with the codec used for transcoding is shown in Figure 5 (Slater, 2015).



Figure 5. Relational database of media files with codec. Taken from: Slater, N. (2015). Best practices for migrating from RDBMS to Amazon DynamoDB. Retrieved from https://d0.awsstatic.com/whitepapers/migration-best-practices-rdbms-to-dynamodb.pdf. Copyright 2015


Amazon’s NoSQL DynamoDB organizes data into tables consisting of items which are uniquely identified by primary key (Slater, 2015). The unique key consists of hash key and optionally a range key, Figure 6 shows the hash and range keys in DynamoDB (Slater, 2015).



Figure 6. Amazon’s DynamoDB – Hash & Range Keys. Taken from: Slater, N. (2015). Best practices for migrating from RDBMS to Amazon DynamoDB. Retrieved from https://d0.awsstatic.com/whitepapers/migration-best-practices-rdbms-to-dynamodb.pdf. Copyright 2015


If item can be identified uniquely by hash key than there is no need of range key however in some cases, composite of hash and range key is used to uniquely identify an item (Slater, 2015). Since DynamoDB does not support the concept of table join which is why de-normalization of the data is often required (Slater, 2015). DynamoDB table of RDBMS example previously shown in Figure 5 is presented in Figure 7 after de-normalization using hash and range key (Slater, 2015).



Figure 7. Amazon’s DynamoDB NoSQL database of media files with codec. Taken from: Slater, N. (2015). Best practices for migrating from RDBMS to Amazon DynamoDB. Retrieved from https://d0.awsstatic.com/whitepapers/migration-best-practices-rdbms-to-dynamodb.pdf. Copyright 2015


Performance Evaluation after Data Migration

Alam (2010) highlighted some of the important performance metrics to evaluate the migration process from relational database to NoSQL database. Some of the important metrics are:

1. Data storage related performance – comparison of time taken to perform tasks like, insert, update and delete in RDBMS versus NoSQL database;

2. Data loading relation performance – comparison of time taken to load number of records with and without ‘order by’ and ‘where’ clauses in RDBMS versus NoSQL database and

3. Data loading performance for data aggregation – comparison of time taken to generate summary for statistical analysis in RDBMS versus NoSQL database (Alam, 2010).

Key Take-away

Although usage of NoSQL databases is on the rise however there are some non-relational use-cases where they tend to perform better than traditional RDBMS like, mobile applications requiring storing of application and session data, gaming applications which store user preference and player’s game, consumer voting application like reality contests, web content like user personalization, application monitoring using log and event data, internet of things using sensor and log data etc. (Slater, 2015).

On the contrary, there are also workloads which are not suitable for NoSQL databases like transactional systems with well-defined relationships, ad-hoc queries and storage of binary large objects (Slater, 2015).



Reference

Connolly, T. & Begg, C. (2015). Database Systems: a practical approach to design, implementation, and management (6th ed.). Upper Saddle River, NJ: Pearson.


Slater, N. (2015). Best practices for migrating from RDBMS to Amazon DynamoDB. Retrieved from https://d0.awsstatic.com/whitepapers/migration-best-practices-rdbms-to-dynamodb.pdf


Alam, F. (2010). Data migration: Relational RDBMS to non-relational NoSQL. Retrieved from http://digital.library.ryerson.ca/islandora/object/RULA%3A4149


Saur, K., Dumtras, T. & Hicks, M. (2016). Evolving NoSQL databases without downtime. 2016 IEEE International Conference on software maintenance and evolution


Sadalage, P. (2014). Migrations in NoSQL database. Retrieved from http://sadalage.com/blog/2014/10/14/migrations-in-nosql-databases/


Recent Posts

See All
Native XML database and DBMS

XML is a popular data format which can be stored, processed and analyzed in traditional relational DBMS as well as new breed of database...

 
 
 
Database

Database is an important entity for any organization to store, manage and analyze data efficiently using desired software application...

 
 
 

Commentaires


Post: Blog2_Post
bottom of page