SQS is unreachable: phase-1 has failed
Updated over 9 months ago

This article applies to:

  • Product edition: Phoenix

Problem description

DR failover fails with error code DR4098, and no SQS message is received in phase-1.


💡 Tip

SQS messages are logged in Phoenix-SQS-xxx.log.


Cause

One or more Phoenix failover settings, such as the subnet, VPC, or IP address, are incorrect.

Traceback

[2019-04-29 17:38:08,515] [ERROR] Max attempts reached while waiting for SQS events, conversion_id = 281_2029_553602. Exiting
[2019-04-29 17:38:08,516] [INFO] Updated orch_status_info = {'dr_failover_state': 3,
 'sqs_queue_name': 'phoenix_281_2029',
 'sqs_url': ' https://queue.amazonaws.com/48604719...oenix_281_2029 ',
 'status': 'started',
 'steps': [{553602: {'ami_progress': 0,
                     'error_code': 4295692290,
                     'error_msg': 'AWS SQS is unreachable from failover instance',
                     'instance_id': 'i-0b21073a51f69b5dc',
                     'phase1_progress': 100,
                     'phase2_progress': 0,
                     'post_phase2_progress': 0,
                     'pre_phase1_progress': 100,
                     'private_ip': '172.24.21.56',
                     'restore_job_id': 1117,
                     'rm_conversion_id': '281_2029_553602',
                     'rp_name': u'Sun Mar 24 06:39:52 2019',
                     'status': 'failed',
                     'version': 'ebs',
                     'vm_failover_state': 99,
                     'vm_name': u'XYZ',
                     'volumes_info': [{'device': '/dev/sda1',
                                       'volume_id': 'vol-0c17a30495a0dace1'},
                                      {'device': '/dev/sdf',
                                       'volume_id': 'vol-028e164ca85b6a115'},
                                      {'device': '/dev/sdg',
                                       'volume_id': 'vol-0e480d86ad9129909'}]}}]}
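
To locate this failure in a large log, a short script can scan for the timeout line and extract the conversion ID. The following is a minimal sketch in Python. It assumes the log line format matches the single ERROR line shown in the traceback above; the function name find_sqs_timeouts is hypothetical and not part of Phoenix.

```python
import re

# Pattern derived from the one ERROR line in the traceback above; other log
# versions may word this message differently (assumption).
TIMEOUT_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] \[ERROR\] Max attempts reached while waiting "
    r"for SQS events, conversion_id = (?P<conversion_id>\w+)"
)


def find_sqs_timeouts(lines):
    """Return a (timestamp, conversion_id) tuple for each SQS timeout line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("timestamp"), match.group("conversion_id")))
    return hits


# Two lines modeled on the traceback above.
sample = [
    "[2019-04-29 17:38:08,515] [ERROR] Max attempts reached while waiting "
    "for SQS events, conversion_id = 281_2029_553602. Exiting",
    "[2019-04-29 17:38:08,516] [INFO] Updated orch_status_info = {...}",
]
print(find_sqs_timeouts(sample))
```

To scan a real file, pass an open file object instead of the sample list, for example find_sqs_timeouts(open(path)), where path points to the log named in the tip above.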

Resolution

If a VPC endpoint for SQS is not configured:

  1. Note the subnet, security group, and public IP settings chosen for the failover instance.

  2. In the customer’s AWS account, go to the VPC service.

  3. In the Subnets section, search for the subnet ID noted in step 1.

  4. On the Route Table tab, check the Target value for the Destination value 0.0.0.0/0.

    [Image: SubnetView.png]

    If the target is igw-xxxx, the subnet is a public subnet. For public subnets, the Public-IP setting must be Auto-Assign or <an_elastic_ip>; without a public IP address, the failover instance cannot reach SQS through the internet gateway.
    If all of the above settings are correct, the issue might be related to RM conversion.
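
The route table check in the steps above can also be scripted. The sketch below is illustrative only: it assumes the boto3 library and AWS credentials for the customer's account are available, and the helper names (default_route_target, is_public_subnet, fetch_route_tables) and the sample data are hypothetical. Note that the association.subnet-id filter only returns a route table that is explicitly associated with the subnet; a subnet without an explicit association uses the VPC's main route table, which this sketch does not look up.

```python
# Keys under which an EC2 route can carry its target.
TARGET_KEYS = (
    "GatewayId",
    "NatGatewayId",
    "TransitGatewayId",
    "NetworkInterfaceId",
    "InstanceId",
    "VpcPeeringConnectionId",
)


def default_route_target(route_tables):
    """Return the target of the 0.0.0.0/0 route, or None if there is none.

    `route_tables` is the "RouteTables" list from a DescribeRouteTables
    response.
    """
    for table in route_tables:
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") != "0.0.0.0/0":
                continue
            for key in TARGET_KEYS:
                if route.get(key):
                    return route[key]
    return None


def is_public_subnet(route_tables):
    """True if the default route points at an internet gateway (igw-...)."""
    target = default_route_target(route_tables)
    return target is not None and target.startswith("igw-")


def fetch_route_tables(subnet_id):
    """Fetch the route table explicitly associated with `subnet_id`."""
    import boto3  # imported lazily so the helpers above run without it

    ec2 = boto3.client("ec2")
    response = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    return response["RouteTables"]


# Hypothetical sample shaped like a DescribeRouteTables response.
sample_tables = [
    {
        "Routes": [
            {"DestinationCidrBlock": "172.24.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"},
        ]
    }
]
print(is_public_subnet(sample_tables))
```

Against a live account, is_public_subnet(fetch_route_tables(subnet_id)) performs the same check as step 4, where subnet_id is the value noted in step 1.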

If a VPC endpoint for SQS is configured:

  1. Note the subnet and security group settings chosen for the failover instance.

  2. Check whether the chosen subnet is among the subnets selected for the SQS endpoint.
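
The subnet comparison in step 2 can be sketched in Python as follows. This is an assumption-laden example, not a Phoenix tool: it relies on boto3 and AWS credentials for the customer's account, the helper names and sample IDs are hypothetical, and it ignores response pagination. It identifies SQS endpoints by a service name ending in ".sqs" (for example com.amazonaws.<region>.sqs).

```python
def endpoint_covers_subnet(vpc_endpoints, subnet_id):
    """True if any SQS endpoint in `vpc_endpoints` lists `subnet_id`.

    `vpc_endpoints` is the "VpcEndpoints" list from a DescribeVpcEndpoints
    response.
    """
    for endpoint in vpc_endpoints:
        # Skip endpoints for other services (S3, EC2, and so on).
        if not endpoint.get("ServiceName", "").endswith(".sqs"):
            continue
        if subnet_id in endpoint.get("SubnetIds", []):
            return True
    return False


def fetch_vpc_endpoints(vpc_id):
    """Fetch the VPC endpoints defined in `vpc_id` (first page only)."""
    import boto3  # imported lazily so the helper above runs without it

    ec2 = boto3.client("ec2")
    response = ec2.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return response["VpcEndpoints"]


# Hypothetical sample shaped like a DescribeVpcEndpoints response.
sample_endpoints = [
    {"ServiceName": "com.amazonaws.us-east-1.s3", "SubnetIds": []},
    {
        "ServiceName": "com.amazonaws.us-east-1.sqs",
        "SubnetIds": ["subnet-0aaa", "subnet-0bbb"],
    },
]
print(endpoint_covers_subnet(sample_endpoints, "subnet-0bbb"))
print(endpoint_covers_subnet(sample_endpoints, "subnet-0ccc"))
```

If the check returns False for the failover subnet, either choose a subnet that the SQS endpoint covers or add the failover subnet to the endpoint.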
