Step-by-Step Guide to Disaster Recovery for the Ansible Automation Platform installed in AWS
Not to sound negative, but organisations should always try to prepare for the worst and hope for the best.
Disaster Recovery (DR) is critical for every organisation. Ensuring business remains uninterrupted is key, whether you need to prepare for unforeseen incidents like a data centre outage or reside somewhere susceptible to natural disasters. But how can you guarantee the changes don’t impact the end user?
There are several ways to provide Disaster Recovery for the Red Hat Ansible Automation Platform (AAP) installed in AWS (you might have seen my ‘How-to guide on Ansible Tower Backup and Restore on Azure’). AAP provides a built-in backup method, which is run using the installation script with the ‘-b’ switch: setup.sh -b. This approach backs up the entire AAP configuration, including the Postgres DB and all the controller, execution, and hub nodes. The resulting backup can be used to recover the entire environment.
This approach is often too difficult to implement in a cloud environment: it requires a prolonged outage of the entire AAP environment, and the recovery is preferably done in a freshly provisioned environment. This guide describes the steps required to restore the Ansible Automation Platform DB in AWS when RDS is used, without the need for the AAP backup file.
PREREQUISITES:
- AAP installed with AWS RDS Database
- RDS DB is being protected with AWS Snapshots
- RDS DB snapshot is available
- Access to relevant sections of AWS console
- Access to controller and hub nodes
HOW TO RESTORE DB FOR AAP:
- Log into AWS
- Navigate to RDS > Snapshots
- Select the Snapshot to restore:

- Select Actions > Restore Snapshot:

- In the Restore snapshot configuration, specify a new DB instance identifier, for example aapdb02

- Ensure all other settings are the same as for the original instance (especially VPC security groups)
- Click Restore DB Instance and wait patiently until it is restored and online
- Log into the controller node, preferably as root
- Verify the connection to the new DB using psql or a podman container. The syntax is as shown below; only the username (-U) and the database name (-d) will differ in your environment (the database name can be found in the inventory file):
[root@aapcontroller01 ~]# psql -h aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com -U awx -d aapdb
Password for user awx:
psql (13.7, server 12.10)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
aapdb=> \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 aapdb     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | awx=CTc/postgres
 aaphubdb  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | awx=CTc/postgres
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 rdsadmin  | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | rdsadmin=CTc/rdsadmin
 template0 | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/rdsadmin          +
           |          |          |             |             | rdsadmin=CTc/rdsadmin
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(6 rows)
aapdb=>
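If psql cannot connect, it helps to know whether the cause is reachability (DNS, or the VPC security groups chosen during the restore) or the credentials. The following is a minimal Python sketch, not part of the AAP tooling, that only checks whether a TCP connection to the PostgreSQL port can be opened. The function name can_reach and the 5-second default timeout are illustrative choices, not values from this guide.

```python
import socket


def can_reach(host: str, port: int = 5432, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    This says nothing about credentials or SSL; it only shows that the
    network path from this node to the database endpoint is open.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

Calling can_reach("aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com") from a controller node and getting False would point at DNS or networking rather than at the awx credentials.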
- Once connectivity has been confirmed, stop the services on all servers (controllers and hubs):
Run the following command on controllers:
automation-controller-service stop
Run the following command on the hubs:
systemctl stop pulp* nginx redis
- On controller nodes, change directory to: /etc/tower/conf.d/
cd /etc/tower/conf.d
- Edit the postgres.py file and update the DB Host name on all controller nodes
# Ansible Automation Platform controller database settings.
DATABASES = {
    'default': {
        'ATOMIC_REQUESTS': True,
        'ENGINE': 'awx.main.db.profiled_pg',
        'NAME': 'aapdb',
        'USER': 'awx',
        'PASSWORD': """Password""",
        'HOST': 'aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com',
        'PORT': '5432',
        'OPTIONS': {
            'sslmode': 'prefer',
            'sslrootcert': '/etc/pki/tls/certs/ca-bundle.crt',
        },
    }
}
- On the hub nodes, change directory to /etc/pulp:
cd /etc/pulp
- Edit the settings.py file and update the database host ('HOST') in the DATABASES section on all hub nodes:
DATABASES = {'default': {'HOST': 'aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com', 'ENGINE': 'django.db.backends.postgresql_psycopg2', 'NAME': 'aaphubdb', 'USER': 'awx', 'PASSWORD': 'redacted', 'PORT': 5432, 'OPTIONS': {'sslmode': 'prefer', 'sslrootcert': '/etc/pki/tls/certs/ca-bundle.crt'}}}
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
CACHE_ENABLED = True
GALAXY_COLLECTION_SIGNING_SERVICE = 'ansible-default'
PRIVATE_KEY_PATH = '/etc/pulp/certs/token_private_key.pem'
PUBLIC_KEY_PATH = '/etc/pulp/certs/token_public_key.pem'
TOKEN_SERVER = 'https://aaphub01.example.net/token'
TOKEN_SIGNATURE_ALGORITHM = 'ES256'
ALLOWED_CONTENT_CHECKSUMS = ['sha224', 'sha256', 'sha384', 'sha512']
SECRET_KEY = 'redacted'
CONTENT_ORIGIN = 'https://aaphub01.example.net'
X_PULP_API_PROTO = 'https'
X_PULP_API_HOST = 'aaphub01.example.net'
X_PULP_API_PORT = '443'
X_PULP_API_PREFIX = 'pulp_ansible/galaxy/automation-hub/api'
GALAXY_API_DEFAULT_DISTRIBUTION_BASE_PATH = 'published'
GALAXY_ENABLE_API_ACCESS_LOG = False
GALAXY_ENABLE_UNAUTHENTICATED_COLLECTION_ACCESS = False
GALAXY_ENABLE_UNAUTHENTICATED_COLLECTION_DOWNLOAD = False
GALAXY_REQUIRE_CONTENT_APPROVAL = True
GALAXY_AUTO_SIGN_COLLECTIONS = False
REDIS_URL = 'unix:///var/run/redis/redis.sock'
ANSIBLE_API_HOSTNAME = 'https://aaphub01.example.net'
ANSIBLE_CONTENT_HOSTNAME = 'https://aaphub01.example.net'
CONTENT_BIND = 'unix:/var/run/pulpcore-content/pulpcore-content.sock'
CONNECTED_ANSIBLE_CONTROLLERS = ['https://aapcontroller01.example.net', 'https://aapcontroller02.example.net']
DEPLOY_ROOT = "/var/lib/pulp"
MEDIA_ROOT = "/var/lib/pulp/media"
STATIC_ROOT = "/var/lib/pulp/assets"
WORKING_DIRECTORY = "/var/lib/pulp/tmp"
FILE_UPLOAD_TEMP_DIR = "/var/lib/pulp/tmp"
DB_ENCRYPTION_KEY = "/etc/pulp/certs/database_fields.symmetric.key"
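With several controller and hub nodes, editing the HOST value by hand on each one is easy to get wrong. As a hedged sketch only, the edit could be scripted as below. The helper names set_db_host and update_file are hypothetical, and the approach assumes each file contains exactly one quoted 'HOST' key, as in the postgres.py and settings.py examples in this guide; the script refuses to write anything if that assumption does not hold.

```python
import re
import shutil
from pathlib import Path

# Matches a quoted HOST key and its quoted value, e.g. 'HOST': 'db.example.net'.
# Unquoted names such as REDIS_HOST or X_PULP_API_HOST are not matched.
_HOST_ENTRY = re.compile(r"""(['"]HOST['"]\s*:\s*)(['"])(.*?)\2""")


def set_db_host(settings_text: str, new_host: str) -> str:
    """Return settings_text with the single 'HOST' value replaced by new_host."""
    updated, count = _HOST_ENTRY.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{new_host}{m.group(2)}",
        settings_text,
    )
    if count != 1:
        raise ValueError(f"expected exactly one HOST entry, found {count}")
    return updated


def update_file(path: str, new_host: str) -> None:
    """Rewrite the settings file in place, keeping a .bak copy of the original."""
    target = Path(path)
    new_text = set_db_host(target.read_text(), new_host)  # validate before writing
    shutil.copy2(target, target.with_name(target.name + ".bak"))
    target.write_text(new_text)
```

On a controller this would be called as update_file("/etc/tower/conf.d/postgres.py", "aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com"), and on a hub with "/etc/pulp/settings.py". The services should already be stopped, as in the earlier step.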
- Reboot the hub nodes
- Start the service on all controller nodes:
automation-controller-service start
- On the controller node, run the following command as root. It verifies the connection to the DB and the connection between the controller nodes. All nodes should be listed with a recent heartbeat timestamp:
[root@aapcontroller01 ~]# awx-manage list_instances
[controlplane capacity=270 policy=100%]
    aapcontroller01.example.net capacity=135 node_type=hybrid version=4.2.0 heartbeat="2022-06-23 05:44:45"
    aapcontroller02.example.net capacity=135 node_type=hybrid version=4.2.0 heartbeat="2022-06-23 05:44:44"
[default capacity=270 policy=100%]
    aapcontroller01.example.net capacity=135 node_type=hybrid version=4.2.0 heartbeat="2022-06-23 05:44:45"
    aapcontroller02.example.net capacity=135 node_type=hybrid version=4.2.0 heartbeat="2022-06-23 05:44:44"
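Checking "recent heartbeat" by eye works for two nodes but gets tedious with more. The sketch below parses output in the format shown above and returns the nodes whose heartbeat is older than a threshold. The function name stale_instances and the 5-minute threshold are assumptions made for illustration, and the caller must pass a "now" value in the same time zone the heartbeats are printed in, which this guide does not state.

```python
import re
from datetime import datetime, timedelta
from typing import List

# Matches instance lines such as:
#   aapcontroller01.example.net capacity=135 ... heartbeat="2022-06-23 05:44:45"
_INSTANCE = re.compile(r'^\s*(\S+)\s+capacity=\d+.*heartbeat="([^"]+)"')


def stale_instances(
    output: str,
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # illustrative threshold
) -> List[str]:
    """Return sorted host names whose heartbeat is older than max_age."""
    stale = set()
    for line in output.splitlines():
        match = _INSTANCE.match(line)
        if not match:
            continue  # group headers like "[controlplane ...]" and blank lines
        host, stamp = match.groups()
        heartbeat = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if now - heartbeat > max_age:
            stale.add(host)  # a set, because each node is listed once per group
    return sorted(stale)
```

An empty list means every listed node reported within the threshold; any host name returned is worth checking before moving on to the browser tests.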
- Connect to each controller node using a web browser and verify the communication and configuration
- Connect to each hub node using a web browser and verify the communication and configuration
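The console steps at the start of this procedure (RDS > Snapshots > Restore Snapshot) can also be scripted, which helps keep the restored instance's settings identical to the original. The following is a rough sketch using boto3, under two assumptions that may not hold in your case: the original instance still exists so its settings can be read, and copying the instance class, subnet group, VPC security groups, Multi-AZ and public-access settings is sufficient. The function names are hypothetical, and other settings (such as parameter groups) would need to be added if you use them.

```python
def build_restore_args(source: dict, snapshot_id: str, new_instance_id: str) -> dict:
    """Build restore arguments that mirror the original instance's settings.

    `source` is one entry of the "DBInstances" list returned by
    describe_db_instances for the original instance.
    """
    return {
        "DBInstanceIdentifier": new_instance_id,
        "DBSnapshotIdentifier": snapshot_id,
        "DBInstanceClass": source["DBInstanceClass"],
        "DBSubnetGroupName": source["DBSubnetGroup"]["DBSubnetGroupName"],
        "VpcSecurityGroupIds": [
            group["VpcSecurityGroupId"] for group in source["VpcSecurityGroups"]
        ],
        "MultiAZ": source["MultiAZ"],
        "PubliclyAccessible": source["PubliclyAccessible"],
    }


def restore_from_snapshot(original_id: str, snapshot_id: str,
                          new_instance_id: str, region: str) -> str:
    """Restore the snapshot, wait until it is available, return the endpoint."""
    import boto3  # third-party; imported here so the helper above needs no extras

    rds = boto3.client("rds", region_name=region)
    source = rds.describe_db_instances(
        DBInstanceIdentifier=original_id)["DBInstances"][0]
    rds.restore_db_instance_from_db_snapshot(
        **build_restore_args(source, snapshot_id, new_instance_id))
    # Blocks until the new instance reports "available".
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=new_instance_id)
    restored = rds.describe_db_instances(
        DBInstanceIdentifier=new_instance_id)["DBInstances"][0]
    return restored["Endpoint"]["Address"]
```

The returned address is the value to use for HOST in postgres.py and settings.py. If the original instance is already gone, the settings would have to come from your own records instead of describe_db_instances.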
Don’t let an outage catch you off guard! Reach out to Insentra if you would like to explore a joint Disaster Recovery solution tailored to your needs.