This article applies to:
Product edition: Phoenix
Problem description
The Cloud Cache decommissioning process gets stuck.
Causes and Resolution
This article describes the different scenarios that can cause this issue. The stages of decommissioning are as follows (a simplified sketch of the sequence is shown after the list):
1. Backup sets are unmapped from the CloudCache.
2. Backups and restores to and from the Phoenix CloudCache stop.
3. Phoenix CloudCache waits for the next scheduled synchronization operation to flush the unsynced data from the CloudCache to Phoenix Cloud.
4. Phoenix removes the data blocks from the Cache Store.
5. Phoenix removes the cache entries from the Phoenix UI and database.
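The following is a minimal sketch, in Python-style pseudocode, of how these stages depend on one another. Every function name is hypothetical and does not correspond to Druva's actual implementation; it only illustrates that the flush in stage 3 must finish before the cache store and the UI entries can be cleaned up in stages 4 and 5.

# Hypothetical illustration of the decommissioning order; all function
# names are invented and are not part of the Phoenix code base.
def decommission_cloudcache(cache):
    unmap_backup_sets(cache)                    # stage 1: unmap backup sets
    stop_backups_and_restores(cache)            # stage 2: stop backups/restores via the cache
    wait_for_next_sync_schedule(cache)          # stage 3: wait for the scheduled sync window
    flush_unsynced_data_to_cloud(cache)         #          and flush pending data to Phoenix Cloud
    remove_data_blocks_from_cache_store(cache)  # stage 4: empty the Cache Store
    remove_cache_entries_from_ui_and_db(cache)  # stage 5: remove the cache from the UI and database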
Scenario-1: When the Cloud Cache is disconnected
Ensure that the Phoenix CloudCache remains connected to Phoenix Cloud until the entire decommissioning process is complete and the cache is removed from the UI. Do not disconnect the CloudCache while decommissioning is in progress.
Traceback
[2019-03-11 14:47:52,650] [ERROR] [wpid 91-4228-1548748451] Failed to connect, error:Failed to connect to client. (#100000011)
[2019-03-11 14:47:52,650] [INFO] [wpid 91-4228-1548748451] CacheFlush activity disconnected. wid 0
[2019-03-11 14:47:52,650] [INFO] Cache CacheFlush activity disconnected. Bytes read 0.00 B. Bytes written 0.00 B.
[2019-03-11 14:47:55,650] [ERROR] [wpid 226-940-1481039643] Error <class 'inSyncLib.inSyncError.SyncError'>:Failed to connect to client. (#100000011). Traceback -Traceback (most recent call last):
File "roboCacheWorker.py", line 284, in runserver
File "inSyncLib\inSyncRPCServer.pyc", line 557, in serve
SyncError: Failed to connect to client. (#100000011)
Snip
Resolution
Ensure that the Phoenix CloudCache is connected to Phoenix Cloud, and fix any connectivity issues to resume decommissioning.
The cache status remains stuck on Decommissioning in Progress as long as the CloudCache is disconnected.
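As a quick first check from the CloudCache server, a generic TCP reachability test such as the sketch below can help confirm basic connectivity. The hostname and port are placeholders, not official Druva values; substitute the Phoenix Cloud endpoint and port configured for your environment.

import socket

# Placeholder endpoint: replace with the Phoenix Cloud address and port
# used in your environment (these are not official Druva values).
HOST, PORT = "phoenix-cloud.example.com", 443

try:
    with socket.create_connection((HOST, PORT), timeout=10):
        print("TCP connection to %s:%d succeeded" % (HOST, PORT))
except OSError as err:
    print("TCP connection to %s:%d failed: %s" % (HOST, PORT, err))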
Scenario-2: Allocated network bandwidth and sync schedule are insufficient
According to the decommissioning workflow, Phoenix syncs the data pending for synchronization to Phoenix Cloud in the next cache schedule cycle. Because this takes time, the CloudCache can display Decommissioning in Progress for a duration that depends on:
The size of the data to be synchronized
The available network bandwidth
The duration specified in the sync schedule
💡 Tip
Configure the CloudCache synchronization schedule to run 24 hours a day, 7 days a week for uninterrupted decommissioning. Ensure that you select the maximum available bandwidth in your environment (bandwidth is measured in megabits per second).
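As a rough guide, the flush time can be estimated from the amount of unsynced data and the allocated bandwidth. The figures in the sketch below are examples only; note the conversion from gigabytes of pending data to megabits of transfer.

# Rough, illustrative estimate of flush time; the numbers are examples.
pending_data_gb = 500        # unsynced data still on the CloudCache, in GB
bandwidth_mbps = 100         # allocated bandwidth, in megabits per second

pending_bits = pending_data_gb * 8 * 1000**3       # GB -> bits (decimal units)
seconds = pending_bits / (bandwidth_mbps * 10**6)  # Mbps -> bits per second
print("Estimated flush time: %.1f hours" % (seconds / 3600))
# 500 GB at 100 Mbps comes to roughly 11 hours, ignoring protocol overhead,
# so the sync schedule window must be at least that long.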
Scenario-3: Phoenix CacheStore is unavailable
Decommissioning can get stuck if the decommission process has been initiated and all the data has been synced to the cloud, but the PhoenixCacheStore is unavailable. This can occur when:
The volume on which the CacheStore resides has been formatted.
The disk on which the CacheStore resides has crashed.
Traceback
[2019-03-11 15:47:56,732] [ERROR] Error <type 'exceptions.Exception'>:CRITICAL: Cache store folder E:\PhoenixCacheStore does not exist on file system. Exiting. Traceback -Traceback (most recent call last):
File "roboCacheServer.py", line 236, in server_main
File "roboCacheServer.py", line 389, in _server_main
Exception: CRITICAL: Cache store folder E:\PhoenixCacheStore does not exist on file system. Exiting.
Resolution
Do not format the volume on which the Cache Store resides until the decommissioning process is complete. Contact Druva Support to troubleshoot this scenario further.
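Before contacting support, you can confirm on the CloudCache server whether the cache store path reported in the traceback still exists. The sketch below uses the path from the error message above; adjust it to match your configuration.

import os

# Path taken from the traceback above; change it to your actual cache store path.
cache_store = r"E:\PhoenixCacheStore"

if os.path.isdir(cache_store):
    print("Cache store folder exists:", cache_store)
else:
    # Matches the failure reported in the log: the folder is missing,
    # typically because the volume was formatted or the disk crashed.
    print("Cache store folder does not exist:", cache_store)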
Scenario-4: Mapped storage does not exist
Cloud Cache decommissioning can get stuck when the process is initiated but the storage mapped to the Cloud Cache has been deleted. In this case, unflushed data accumulates in the database and is never synced.
Resolution
Contact Druva Support to troubleshoot this scenario.
Scenario-5: Phoenix CloudCache is deleted from the customer environment
CloudCache decommissioning can become blocked if the CloudCache has been completely deleted from the customer environment before the decommissioning process is initiated from the console.
Note: Deleting the CloudCache before initiating decommissioning is not the recommended approach. Any data that is still unsynced at that point can result in data loss.
Resolution
First, ensure that the decommission process is initiated only after the CloudCache console shows no pending data to upload.
Then wait for the decommissioning to complete before deleting the CloudCache from the environment.
If you encounter any issues, contact Druva Support for further investigation.