Problem description
Server backups are stuck in the queued state.
Cause
Stuck or stale Phoenix process.
File server parameters in the configuration files (Phoenix.cfg/Config.yaml) have missing values.
Traceback
For the scenario where parameters in the configuration files (Phoenix.cfg/Config.yaml) have missing values, the main_service.log/controlservice.log file shows the following information:
[2023-10-01 00:29:29,002] [ERROR] Error <type 'exceptions.NameError'>: global name 'SyncLog' is not defined. Traceback -
Traceback (most recent call last):
  File "client\service_util.pyc", line 101, in __create_config_for_service
  File "roboLib\roboConfigParser.pyc", line 1275, in create
NameError: global name 'SyncLog' is not defined
[2023-10-01 00:29:29,002] [ERROR] service_util: Could not create configuration file for SQL
[2023-10-01 00:29:29,509] [ERROR] Failed to run command with error :
[2023-10-01 00:29:29,509] [INFO] No open port found in Phoenix process:
Resolution
Scenario 1: Stuck or stale Phoenix process
Windows Server
Select the queued job and cancel it from the Druva Management Console.
Log in to the affected server for which jobs remain in the queued state.
Open the Services console (services.msc) and stop the Druva agent client service:
Agent Version 6.x.x: Hybrid Workloads Agent Client Service
Agent Version 7.x.x: Druva-EnterpriseWorkloads Service
Open Task Manager and end all Druva- or Phoenix-related processes. (A PowerShell equivalent of the stop, kill, and start steps is sketched after this procedure.)
Open the Services console (services.msc) and start the Druva agent client service:
Agent Version 6.x.x: Hybrid Workloads Agent Client Service
Agent Version 7.x.x: Druva-EnterpriseWorkloads Service
Trigger a manual backup for the affected server using the Backup Now option in the Druva Management Console.
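For reference, here is a minimal PowerShell sketch of the stop, kill, and start steps above. The service name Druva-EnterpriseWorkloads and the Phoenix|Druva process-name pattern are assumptions based on the names in this article; verify them on your server with Get-Service and Get-Process before running the commands.
# Stop the agent service (7.x.x service name shown; substitute the 6.x.x name if needed)
Stop-Service -Name "Druva-EnterpriseWorkloads" -Force
# Force-kill any remaining agent processes (name pattern is an assumption; review matches first)
Get-Process | Where-Object { $_.ProcessName -match 'Phoenix|Druva' } | Stop-Process -Force
# Start the agent service again
Start-Service -Name "Druva-EnterpriseWorkloads"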
Linux Server
Select the queued job and cancel it from the Druva Management Console.
Log in to the affected server for which jobs remain in the queued state.
Stop the Druva agent client service using the command below:
Agent Version 6.x.x: /etc/init.d/Phoenix stop
Agent Version 7.x.x: service Druva-EnterpriseWorkloads stop
Verify whether any Phoenix/Druva processes are still running:
Agent Version 6.x.x: ps -ef | grep -i Phoenix
Agent Version 7.x.x: ps -ef | grep -i Druva
If any Phoenix/Druva processes are still running, terminate them using the following command (a one-pass sketch follows this procedure):
kill -9 <process id>
Start the Druva agent client service using the command below:
Agent Version 6.x.x: /etc/init.d/Phoenix start
Agent Version 7.x.x: service Druva-EnterpriseWorkloads start
Trigger a manual backup for the affected server using the Backup Now option in the Druva Management Console.
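As a shortcut for the verify-and-terminate steps above, the snippet below finds and force-kills any leftover agent processes in one pass. This is a minimal sketch assuming the process names contain phoenix or druva, matching the grep commands above; always review the matched processes before killing them.
# Collect the PIDs of any remaining agent processes, then force-kill them if found
pids=$(ps -ef | grep -iE 'phoenix|druva' | grep -v grep | awk '{print $2}')
[ -n "$pids" ] && kill -9 $pids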
Scenario 2: Antivirus Exclusion
Check the antivirus or whitelisting software for any Druva processes that may have been quarantined or blocked, as mentioned in the article. Release these processes from quarantine and whitelist them in the antivirus/firewall.
Next, run the Agent installation file to reinstall the Agent; a process that was quarantined may have been corrupted.
After reinstalling the Agent, re-register the server on the console.
Finally, trigger a backup and verify the results.
Scenario 3: Missing values in the configuration files
If the server connection status in the Druva Management Console is Disconnected, follow the steps below:
There might be missing values in the configuration parameters; re-register the server from the console.
Verify the Agent status on the Druva Console to ensure it is connected.
Then, trigger a backup and check the results.
Scenario 4: The file server has degraded performance
Check the server's network performance:
Test data transfer: Attempt to copy data from another server to the server in question.
Observe network performance: Monitor how efficiently the server handles the network bandwidth during the data transfer.
Watch for connection issues: Be aware that the server may disconnect the remote session if it struggles to handle the data transfer.
Also check whether sufficient RAM is available at the time, as jobs may enter the queued state if services cannot access the necessary resources. A quick command-line spot-check is sketched below.
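This is a minimal sketch of such a spot-check on a Linux file server; the remote host and file path are placeholders, and on a Windows server the Performance tab of Task Manager provides the same memory and network view.
# Check available memory (jobs can queue when services cannot get RAM)
free -h
# Time a test copy from another server to gauge network throughput
# (user@source-host and /tmp/testfile are placeholder values)
time scp user@source-host:/tmp/testfile /tmp/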
Scenario 5: VMware backup is in the queued state
The backup proxy may be in a disconnected state (network issues), or
More than three jobs are running at the same time on the same backup proxy, or
The /var/log volume becomes 100% full and exhausts the allocated space (Link), or
The ESXi/vCenter credentials are changed/expired but are not updated on the backup proxy.
A quick check for the /var/log condition is sketched below.
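For the full /var/log condition, this minimal sketch, run on the backup proxy, shows whether the volume is exhausted and which logs are consuming it (paths assume a standard Linux backup proxy layout):
# Check whether the /var/log volume has run out of space
df -h /var/log
# List the largest items under /var/log to find what is consuming it
du -sh /var/log/* | sort -rh | head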