Itzik Gur - 05.08.2022


Ansible Disaster Recovery Guide AWS

Step-by-Step Guide to Disaster Recovery for the Ansible Automation Platform installed in AWS

Not to sound negative, but organisations should always try to prepare for the worst and hope for the best.

Disaster Recovery (DR) is critical for every organisation. Ensuring business remains uninterrupted is key, whether you need to prepare for unforeseen incidents like a data centre outage or reside somewhere susceptible to natural disasters. But how can you guarantee the changes don’t impact the end user?

There are several ways to provide Disaster Recovery for the Red Hat Ansible Automation Platform (AAP) installed in AWS (you might have seen my ‘How-to guide on Ansible Tower Backup and Restore on Azure’). AAP provides a built-in backup method, which is executed by running the installation script with the ‘-b’ switch: setup.sh -b. This approach backs up the entire AAP configuration, including the Postgres DB and all the controller, execution, and hub nodes. The resulting backup can be used to recover the entire environment.

This approach is often too difficult to implement in a cloud environment, as it requires a prolonged outage of the entire AAP environment, and the recovery is preferably done in a freshly provisioned environment. This step-by-step guide describes how to restore the Ansible Automation Platform DB in AWS when RDS is used, without the need for the AAP backup file.

PREREQUISITES:

AAP installed with AWS RDS Database
RDS DB is being protected with AWS Snapshots
RDS DB snapshot is available
Access to relevant sections of AWS console
Access to controller and hub nodes

HOW TO RESTORE DB FOR AAP:

Log into AWS
Navigate to RDS > Snapshots
Select the Snapshot to restore:


Select Actions –> Restore Snapshot:

In the Restore snapshot configuration, specify a new DB instance identifier, for example aapdb02
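
The console restore in this part of the procedure could also be scripted. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are configured; the function name restore_from_snapshot and every identifier passed to it (snapshot name, security group IDs, subnet group name) are placeholders for illustration, not values from this environment. The RDS client is passed in as a parameter, so the function itself has no imports.

```python
def restore_from_snapshot(rds, snapshot_id, new_instance_id,
                          security_group_ids, subnet_group_name):
    """Restore an RDS snapshot into a new instance and return its endpoint.

    `rds` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    All identifiers are supplied by the caller (hypothetical values here).
    """
    # Start the restore. Network settings are passed explicitly so the new
    # instance matches the original rather than relying on defaults.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id,
        DBSnapshotIdentifier=snapshot_id,
        DBSubnetGroupName=subnet_group_name,
        VpcSecurityGroupIds=list(security_group_ids),
    )
    # Block until the new instance reports the "available" state.
    waiter = rds.get_waiter("db_instance_available")
    waiter.wait(DBInstanceIdentifier=new_instance_id)
    # Look up the endpoint host name needed for the AAP configuration.
    response = rds.describe_db_instances(DBInstanceIdentifier=new_instance_id)
    return response["DBInstances"][0]["Endpoint"]["Address"]
```

It might be called as restore_from_snapshot(boto3.client("rds", region_name="us-west-1"), "my-aapdb-snapshot", "aapdb02", ["sg-0123456789abcdef0"], "my-subnet-group"), where everything except aapdb02 and the region is made up. Other settings (instance class, parameter group, and so on) should still be compared against the original instance.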

Ensure all other settings are the same as for the original instance (especially VPC security groups)
Click Restore DB Instance and wait patiently until it is restored and online
Log into the controller node, preferably as root
Verify the connection to the new DB using psql or a podman container. The command looks like the example below; substitute your own username (-U) and database name (-d), which can be found in the inventory file

[root@aapcontroller01 ~]# psql -h aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com -U awx -d aapdb
Password for user awx:
psql (13.7, server 12.10)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

aapdb=> \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 aapdb     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | awx=CTc/postgres
 aaphubdb  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | awx=CTc/postgres
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 rdsadmin  | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | rdsadmin=CTc/rdsadmin
 template0 | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/rdsadmin          +
           |          |          |             |             | rdsadmin=CTc/rdsadmin
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(6 rows)

aapdb=>
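
If psql is not available on a node, a quick reachability test can at least rule out security group or routing problems. This is a minimal sketch using only the Python standard library; the function name db_port_reachable is made up for illustration. It is a weaker check than the psql login above, because it only confirms that the TCP port answers, not that the credentials or the database name are valid.

```python
import socket


def db_port_reachable(host, port=5432, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens the socket;
        # the "with" block closes it again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, db_port_reachable("aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com") should return True from every controller and hub node before the services are stopped.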

Once connectivity has been confirmed, stop the services on all servers (controllers and hubs):

Run the following command on controllers:

automation-controller-service stop

Run the following command on the hubs:

systemctl stop 'pulp*' nginx redis

On controller nodes, change directory to: /etc/tower/conf.d/

cd /etc/tower/conf.d

Edit the postgres.py file and update the DB Host name on all controller nodes

# Ansible Automation Platform controller database settings.

DATABASES = {
    'default': {
        'ATOMIC_REQUESTS': True,
        'ENGINE': 'awx.main.db.profiled_pg',
        'NAME': 'aapdb',
        'USER': 'awx',
        'PASSWORD': """Password""",
        'HOST': 'aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com',
        'PORT': '5432',
        'OPTIONS': { 'sslmode': 'prefer',
                     'sslrootcert': '/etc/pki/tls/certs/ca-bundle.crt',
        },
    }
}
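
With several controller nodes, making this edit by hand on each one is error-prone. The sketch below shows one way the HOST change could be scripted with the Python standard library. The helper name update_db_host is hypothetical, and it assumes the file contains a quoted 'HOST': '...' entry like the one above. It keeps a .bak copy next to the original before writing.

```python
import re
import shutil
from pathlib import Path

# Matches a quoted HOST key followed by a quoted value, e.g. 'HOST': 'old'.
HOST_PATTERN = re.compile(r"""(['"]HOST['"]\s*:\s*)(['"])[^'"]*\2""")


def update_db_host(config_path, new_host):
    """Point the 'HOST' entry of a Django-style DATABASES dict at new_host.

    Returns the number of HOST entries found. The file is only rewritten
    (and a .bak copy made) when the content actually changes.
    """
    path = Path(config_path)
    original = path.read_text()
    updated, count = HOST_PATTERN.subn(
        # Keep the key and the original quote style, swap only the value.
        lambda m: f"{m.group(1)}{m.group(2)}{new_host}{m.group(2)}",
        original,
    )
    if updated != original:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(updated)
    return count
```

Run as root with the services stopped, for example update_db_host("/etc/tower/conf.d/postgres.py", "aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com"). A return value other than 1 suggests the file does not look as expected and should be checked by hand.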

On the hub nodes, change directory to /etc/pulp:

cd /etc/pulp

Edit the settings.py file and update the database host name (HOST) in the DATABASES section:

DATABASES = {'default': {'HOST': 'aapdb02.cbr8auhjg1vp.us-west-1.rds.amazonaws.com', 'ENGINE': 'django.db.backends.postgresql_psycopg2', 'NAME': 'aaphubdb', 'USER': 'awx', 'PASSWORD': 'redacted', 'PORT': 5432, 'OPTIONS': {'sslmode': 'prefer', 'sslrootcert': '/etc/pki/tls/certs/ca-bundle.crt'}}}
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
CACHE_ENABLED = True
GALAXY_COLLECTION_SIGNING_SERVICE = 'ansible-default'
PRIVATE_KEY_PATH = '/etc/pulp/certs/token_private_key.pem'
PUBLIC_KEY_PATH = '/etc/pulp/certs/token_public_key.pem'
TOKEN_SERVER = 'https://aaphub01.example.net/token'
TOKEN_SIGNATURE_ALGORITHM = 'ES256'
ALLOWED_CONTENT_CHECKSUMS = ['sha224', 'sha256', 'sha384', 'sha512']
SECRET_KEY = 'redacted'
CONTENT_ORIGIN = 'https://aaphub01.example.net'
X_PULP_API_PROTO = 'https'
X_PULP_API_HOST = 'aaphub01.example.net'
X_PULP_API_PORT = '443'
X_PULP_API_PREFIX = 'pulp_ansible/galaxy/automation-hub/api'
GALAXY_API_DEFAULT_DISTRIBUTION_BASE_PATH = 'published'
GALAXY_ENABLE_API_ACCESS_LOG = False
GALAXY_ENABLE_UNAUTHENTICATED_COLLECTION_ACCESS = False
GALAXY_ENABLE_UNAUTHENTICATED_COLLECTION_DOWNLOAD = False
GALAXY_REQUIRE_CONTENT_APPROVAL = True
GALAXY_AUTO_SIGN_COLLECTIONS = False
REDIS_URL = 'unix:///var/run/redis/redis.sock'
ANSIBLE_API_HOSTNAME = 'https://aaphub01.example.net'
ANSIBLE_CONTENT_HOSTNAME = 'https://aaphub01.example.net'
CONTENT_BIND = 'unix:/var/run/pulpcore-content/pulpcore-content.sock'
CONNECTED_ANSIBLE_CONTROLLERS = ['https://aapcontroller01.example.net', 'https://aapcontroller02.example.net']
DEPLOY_ROOT = "/var/lib/pulp"
MEDIA_ROOT = "/var/lib/pulp/media"
STATIC_ROOT = "/var/lib/pulp/assets"
WORKING_DIRECTORY = "/var/lib/pulp/tmp"
FILE_UPLOAD_TEMP_DIR = "/var/lib/pulp/tmp"
DB_ENCRYPTION_KEY = "/etc/pulp/certs/database_fields.symmetric.key"
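
Before restarting anything, it can be worth reading the setting back to confirm the edit took effect. The following is a minimal sketch, assuming the file holds a literal DATABASES = {...} assignment as shown above; the function name read_db_settings is made up. It parses the file with the standard ast module instead of executing it.

```python
import ast
from pathlib import Path


def read_db_settings(config_path):
    """Return the DATABASES['default'] dict from a Django-style settings file.

    The file is parsed, not executed; only a literal DATABASES = {...}
    assignment at module level is supported.
    """
    tree = ast.parse(Path(config_path).read_text())
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "DATABASES"
            for target in node.targets
        ):
            # literal_eval accepts an AST node and rejects anything
            # that is not a plain literal.
            return ast.literal_eval(node.value)["default"]
    raise ValueError(f"no DATABASES assignment found in {config_path}")
```

For example, read_db_settings("/etc/pulp/settings.py")["HOST"] should return the new RDS endpoint. The returned dict also contains the password, so print only the HOST and NAME keys.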

Reboot the hub nodes
Start the service on all controller nodes:

automation-controller-service start

On the controller node, run the following command as root. It verifies the connection to the DB and the connection between the controller nodes. All nodes should be listed with a recent heartbeat timestamp:

[root@aapcontroller01 ~]# awx-manage list_instances
[controlplane capacity=270 policy=100%]
    aapcontroller01.example.net capacity=135 node_type=hybrid version=4.2.0 heartbeat="2022-06-23 05:44:45"
    aapcontroller02.example.net capacity=135 node_type=hybrid version=4.2.0 heartbeat="2022-06-23 05:44:44"

[default capacity=270 policy=100%]
    aapcontroller01.example.net capacity=135 node_type=hybrid version=4.2.0 heartbeat="2022-06-23 05:44:45"
    aapcontroller02.example.net capacity=135 node_type=hybrid version=4.2.0 heartbeat="2022-06-23 05:44:44"

Connect to each controller node using a web browser and verify the communication and configuration
Connect to each hub node using a web browser and verify the communication and configuration

Don’t let an outage catch you off guard! Reach out to Insentra if you would like to explore a joint Disaster Recovery solution tailored to your needs.