What is a good pattern to sync data and schema ?

So, to answer your question, I think I will segrigate it into two different part.

  1. For OLTP systems
  2. For OLAP systems

For OLTP systems:

This can be pretty straight forward as you sync your front end code.

This can be pretty straight forward As you sync your front end code. You can use any popular repository that say you can use GitHub. You can create your branches as per your need. and keep pushing your code of changes accordingly from your Dev to Production.

But in case you have a change or you have some alter command for your production database then there is something you need to keep in mind. If it is just not an alter command of adding columns and you also need to change an existing column or alter an existing column, there is a scope of data migration. you need to plan accordingly. you need to make sure, in your development schema your migration script is ready. so once you are done with the alter you can go live as soon as possible.

For OLAP systems :

For OLAP systems things are a bit different. There are couple of things you need to keep in your mind before you start to process with the changes. Let’s say you have a schema level change for a dimension table. in that case you don’t have much worry as the volume of data is significantly low then the facts.

For these tables you can use the same methodology of having the schema and the migration script ready before you start that changes. But, you need to remember one thing : lot of common base data bases now a days assign the primary ID depending on their particular algorithm. Example : Amazon redshift is a popular data warehouse system. If you assign your primary key column waste on the auto increment ID populate by redshift it will be always different as redshift internally maintain those keys. So, in this case do you are capable to maintain your schema but you will be in a trouble to maintain your data.

So, I would say for OLAP system it should be the other way around. You take a backup of your production specifically for the dimension tables as the data warehouse IDs are most important for you and then you put those changes in your development and test test and test and then put the final script into production.

This is from one of my quora post I think about I should post and keep in my blog. Below is the link for actual Quora post.
https://www.quora.com/What-is-a-good-pattern-to-sync-data-and-schema-for-your-dev-stage-prod-databases-We-need-to-start-in-dev-and-move-the-schema-upwards-to-stage-and-then-prod

Leave a Comment

Your email address will not be published. Required fields are marked *