Murphy’s law says, “Whatever can go wrong, will go wrong.” So beyond chasing a 100% availability target, our next goal should always be to minimize MTTD (mean time to detection) and MTTR (mean time to repair). Active-active replication has been discussed for several years as a way to keep MTTD and MTTR under control. With an active-active database architecture, MTTD and MTTR become minimal to negligible, which helps meet critical business and application availability requirements.
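A simplified model makes this concrete. If we introduce MTBF (mean time between failures) and assume each outage lasts roughly MTTD + MTTR, steady-state availability is approximately:

```latex
A \approx \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTD} + \mathrm{MTTR}}
```

With an illustrative MTBF of 720 hours, cutting MTTD + MTTR from 1 hour to 0.1 hour raises availability from about 720/721 ≈ 99.86% to about 720/720.1 ≈ 99.986%, which is why shrinking detection and repair time pays off directly.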
Although active-active adds architectural complexity, it delivers a large payoff during outages. This blog is aimed at database and application architects and managers who are looking to build active-active database infrastructure on Aurora Postgres. Amazon Aurora Global Database with the latest version of Aurora Postgres does provide a disaster recovery feature that sets up read replicas across regions, but at the time of writing you can’t swap roles between the primary and a read replica during maintenance. Also, even with a role-swap capability, the application can’t run bi-directional active-active across regions with writes going to the database in each region, and the time required to swap roles would still be downtime, reducing uptime.
What are we trying to solve?
Do you need to run your database active-active for disaster recovery with no downtime, run your application active-active across regions, or migrate between database versions across AWS Regions or within a Region? Several options can meet these needs, including Oracle GoldenGate, SharePlex, AWS DMS (AWS Database Migration Service), and many other tools. This blog focuses on setting up an active-active database replication configuration using AWS DMS on Aurora Postgres, covering the implementation steps, current bottlenecks, limitations, and conclusions based on the tests we ran.
What is Amazon Aurora Postgres?
Amazon Aurora is a relational database service that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The PostgreSQL-compatible edition of Aurora delivers up to 3X the throughput of standard PostgreSQL running on the same hardware, enabling existing PostgreSQL applications and tools to run without requiring modification. The combination of PostgreSQL compatibility with Aurora enterprise database capabilities provides an ideal target for commercial database migrations.
We used the managed AWS DMS service for the bi-directional replication setup, assuming that a DMS replication instance is already in place and can communicate with both the source and target databases. AWS DMS bidirectional replication isn’t intended as a full multi-master solution with a primary node, conflict resolution, and so on. Use bidirectional replication when data on different nodes is operationally segregated. In other words, suppose an application operating on node A changes a data element, and node A performs bidirectional replication with node B. That data element must never be changed by any application operating on node B.
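Because DMS will not resolve conflicts, operational segregation has to be enforced by the application. The sketch below shows one way to do that, assuming a hypothetical `home_region` attribute on every row and a hypothetical region-to-node ownership map; it is an illustration of the rule, not part of DMS.

```python
from dataclasses import dataclass

# Hypothetical ownership map: each region's node is the only writer for its rows.
OWNER_BY_REGION = {"us-east-1": "node_a", "us-west-2": "node_b"}


@dataclass(frozen=True)
class Write:
    table: str
    primary_key: int
    home_region: str  # hypothetical column recording which region owns the row


def node_for_write(write: Write) -> str:
    """Return the only node allowed to modify this row."""
    try:
        return OWNER_BY_REGION[write.home_region]
    except KeyError:
        raise ValueError(f"no owning node for region {write.home_region!r}") from None


def check_write(write: Write, local_node: str) -> None:
    """Reject a write that would modify a row owned by the other node."""
    owner = node_for_write(write)
    if owner != local_node:
        raise PermissionError(
            f"{write.table}:{write.primary_key} is owned by {owner}, not {local_node}"
        )


if __name__ == "__main__":
    check_write(Write("orders", 1, "us-east-1"), "node_a")  # allowed
    print(node_for_write(Write("orders", 2, "us-west-2")))
```

Rejecting the write up front keeps each data element with a single writer, which is the condition under which bidirectional replication stays consistent without conflict resolution.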
Architecture & Setup
Below are the detailed implementation steps for the database and DMS. Some of the database changes may require a reboot. The setup was performed on two Aurora Postgres 11.6 databases with a DMS replication instance running version 3.3.3.
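One database change that does need a reboot is enabling logical replication, which DMS change data capture from Aurora Postgres relies on. Since each cluster is the source of one replication direction, both clusters need it. The sketch below only builds the request arguments for boto3's RDS `modify_db_cluster_parameter_group` call; the parameter group names are hypothetical.

```python
import json

# Hypothetical cluster parameter group names; substitute your own.
CLUSTER_PARAMETER_GROUPS = ["aurora-pg11-node-a", "aurora-pg11-node-b"]


def logical_replication_request(parameter_group: str) -> dict:
    """Build kwargs for boto3's RDS modify_db_cluster_parameter_group call.

    rds.logical_replication is a static parameter, so it is applied with
    "pending-reboot" and takes effect only after the instances are rebooted.
    """
    return {
        "DBClusterParameterGroupName": parameter_group,
        "Parameters": [
            {
                "ParameterName": "rds.logical_replication",
                "ParameterValue": "1",
                "ApplyMethod": "pending-reboot",
            }
        ],
    }


# In a bidirectional setup each cluster is the CDC source for one task,
# so logical replication has to be enabled on both clusters.
requests = [logical_replication_request(g) for g in CLUSTER_PARAMETER_GROUPS]

if __name__ == "__main__":
    print(json.dumps(requests, indent=2))
```

Each dictionary could then be passed as `boto3.client("rds").modify_db_cluster_parameter_group(**request)`, followed by rebooting the cluster's instances.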
AWS DMS Limitations on Aurora Postgres
- Captured tables must have a primary key.
- DMS bidirectional replication does not include conflict resolution; however, data validation can be used to detect data inconsistencies.
- DMS doesn’t support change processing of truncate operations as of Aug 2020. Check the limitations in the AWS documentation.
- Batch apply is not supported for tasks that use loopback prevention, so batch apply must be set to false.
- If you drop a table and then re-create a table with the same name, the DDL of the re-created table won’t be replicated. This is a bug that AWS is working on as of Aug 2020.
- The bi-directional setup is DDL-sensitive, so make sure DDL is replicated in one direction only.
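The limitations above translate into task settings for the two DMS tasks (A to B and B to A). The sketch below assembles request arguments for boto3's DMS `create_replication_task`, using the setting keys documented for DMS bidirectional replication and task settings; verify them against your DMS engine version. All ARNs, loopback schema names and task identifiers are hypothetical placeholders, and `MigrationType="cdc"` assumes both databases already hold the same data.

```python
import json

# Hypothetical placeholders; replace with your real DMS ARNs.
NODE_A_ENDPOINT_ARN = "<node-a-endpoint-arn>"
NODE_B_ENDPOINT_ARN = "<node-b-endpoint-arn>"
REPLICATION_INSTANCE_ARN = "<replication-instance-arn>"

# Hypothetical loopback schema names (lowercase, since both sides are PostgreSQL).
LOOP_SCHEMA_A = "dms_loop_a"
LOOP_SCHEMA_B = "dms_loop_b"

# Selection rule: capture every table in the public schema
# (each captured table must have a primary key).
TABLE_MAPPINGS = json.dumps({
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-public",
        "object-locator": {"schema-name": "public", "table-name": "%"},
        "rule-action": "include",
    }]
})


def task_settings(source_loop_schema: str, target_loop_schema: str, replicate_ddl: bool) -> str:
    """Return ReplicationTaskSettings JSON for one direction of the pair."""
    settings = {
        # Batch apply cannot be combined with loopback prevention.
        "TargetMetadata": {"BatchApplyEnabled": False},
        # Keeps a task from re-applying changes DMS itself wrote on the other node.
        "LoopbackPreventionSettings": {
            "EnableLoopbackPrevention": True,
            "SourceSchema": source_loop_schema,
            "TargetSchema": target_loop_schema,
        },
        # No conflict resolution: validation only reports mismatched rows.
        "ValidationSettings": {"EnableValidation": True},
        # Table drop/truncate/alter DDL is applied in one direction only.
        "ChangeProcessingDdlHandlingPolicy": {
            "HandleSourceTableDropped": replicate_ddl,
            "HandleSourceTableTruncated": replicate_ddl,
            "HandleSourceTableAltered": replicate_ddl,
        },
    }
    return json.dumps(settings)


def task_request(name: str, source_arn: str, target_arn: str, settings: str) -> dict:
    """Build kwargs for boto3's DMS create_replication_task call (CDC only)."""
    return {
        "ReplicationTaskIdentifier": name,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": REPLICATION_INSTANCE_ARN,
        "MigrationType": "cdc",
        "TableMappings": TABLE_MAPPINGS,
        "ReplicationTaskSettings": settings,
    }


# A -> B carries DDL; B -> A swaps the loopback schemas and ignores DDL.
forward = task_request("node-a-to-node-b", NODE_A_ENDPOINT_ARN, NODE_B_ENDPOINT_ARN,
                       task_settings(LOOP_SCHEMA_A, LOOP_SCHEMA_B, replicate_ddl=True))
reverse = task_request("node-b-to-node-a", NODE_B_ENDPOINT_ARN, NODE_A_ENDPOINT_ARN,
                       task_settings(LOOP_SCHEMA_B, LOOP_SCHEMA_A, replicate_ddl=False))
```

Each request could be submitted with `boto3.client("dms").create_replication_task(**forward)` (and likewise for `reverse`). Swapping the loopback schemas between the two tasks is what lets each direction recognize, and skip, the changes written by the other.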
Although DMS bidirectional replication for Aurora Postgres still needs to mature before it is ready for broad adoption, if it is the need of the hour you can set up active-active, keeping the limitations above in mind when you deploy DDL changes through your release cycle or perform ad hoc maintenance on the database. This setup does increase uptime and gives you the flexibility to keep serving traffic during planned maintenance, or during a migration where you want to run a controlled percentage of traffic on a specific database version before cutting over 100% to the newer version.
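Running a controlled percentage of traffic on the newer version works best if each key consistently lands on one cluster, which also respects the operational segregation rule. A minimal sketch, assuming a hypothetical routing key such as a customer ID and stable hash bucketing (a technique chosen here for illustration, not something DMS provides):

```python
import hashlib

# Hypothetical labels for the two Aurora clusters taking part in the migration.
OLD_CLUSTER = "old-version"
NEW_CLUSTER = "new-version"


def bucket(key: str, buckets: int = 100) -> int:
    """Map a routing key to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(key: str, percent_on_new: int) -> str:
    """Send roughly percent_on_new% of keys to the new cluster.

    The same key always lands in the same bucket, so its writes stay on one
    cluster at a given percentage; raising the percentage only moves keys
    from the old cluster to the new one, never back.
    """
    if not 0 <= percent_on_new <= 100:
        raise ValueError("percent_on_new must be between 0 and 100")
    return NEW_CLUSTER if bucket(key) < percent_on_new else OLD_CLUSTER


if __name__ == "__main__":
    for customer in ("cust-1", "cust-2", "cust-3"):
        print(customer, route(customer, 25))
```

When the percentage is raised, the keys that move start writing to the other cluster, so it is prudent to confirm replication has caught up for those keys before shifting them.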
Author : Rajesh Saluja
Contributor & Reviewer : Tushar Thakker
The blog content is a team effort, and I would like to thank Tushar Thakker for the feedback and recommendations.