Rails: No downtime data migrations
- Introduction
- Deploy 1: Add columns and tables
- Deploy 2: Make the backwards compatible changes
- Deploy 3: Rename the unused columns and tables
- Done: Delete the
*_delete_me
columns and tables
Introduction
It is 2017 and shockingly, many rails apps require downtime for a successful deploy. I am looking at you, Gitlab.
Data migrations are a huge (and unnecessary!) cause for bringing down a rails app. For a new feature, the code may depend on the database and date within it be configured in a certain way.
Apply this pattern when you:
- add a new column when denormalizing a table
- completely replacing one model (relationships and all) with another model.
- adding a new gem
To accomplish 0 down time, I will walk you through how to do this by example via implementing the ancestry
gem. Their suggested migration guide to move from a parent_id
to their ancestry
column offers some clues how to perform the migration, but I will emphasize key steps missing from the guide
Deploy 1: Add columns and tables
Create a rails migration to add the columns and tables
The first deployment needs to add columns to your production database. Supporting these extra columns will be simple. You may need to double check your queries for naming collisions if other tables you join on have the same column name.
add_column :users, :ancestry, :string
If the new column name exists on a table that was joined with your table, SQL will complain about not knowing which column to use.
Write a rake task to update the new column
With our example of ancestry, we also add the gem since that will not effect our existing code structure. This gives us the benefit of using their data migration method build_ancestry_from_parent_ids!
for building the ancestry tree.
Keep in mind, all of our āactiveā code should still be using the original parent_id
method. Nothing should change about that.
Sync the new column with the old with a Concern
If you are migrating data from one column to another column, you can use ruby overrides and rails concerns to write to both the new column and the old column without impacting your business logic.
module SyncParentId
extend ActiveSupport::Concern
included do
before_save :update_ancestry_id, if: 'parent_obj_id_changed?'
end
def update_ancestry_id
# call the ancestry method to update the `ancestry` column
self.parent = parent_obj
end
end
After this deploy, your code will use parent_id
as it was before, but now we have a 100% up-to-date ancestry
column that is fully populated and ready to go.
Deploy 2: Make the backwards compatible changes
Now it will be safe to migrate off of parent_id
to ancestry
. Our production database has a fully populated ancestry
column that is perfectly in sync with the old parent_id
. You can now safely migrate from parent_id
to ancestry
without any problems.
If you are extremely cautious and want to defend against a bad deployment and a rollback, you can add a similar syncing method to the new code to maintain the parent_id
code as the ancestry
column changes. If you need to revert the changes, all data will still be in sync and the old version of the app will still be able to use parent_id
with no data loss.
Deploy 3: Rename the unused columns and tables
Rename the columns to *_delete_me
or something to prove that your code is no longer using now unused columns and tables. We donāt want to loose any data and if we find that these tables or columns are actually still in use, we can easily rename them back to their previous names until we go back to step 2 and redeploy code that does not use them.
Done: Delete the *_delete_me
columns and tables
Once you are 100% confident that you no longer need the old columns and tables, you can now create rails migrations to permanently remove these from your system. Some companies may actually do these outside of rails migrations, because they do not allow the deploy
user to destructively modify the database to prevent any accidental table drops.