Problem Description
VMware backup jobs suddenly become stuck in a Queued status, causing subsequent scheduled jobs to be skipped or to exceed their allocated backup window.
This issue can occur even when the backup proxies appear healthy. For example, the proxies show as Connected in the management console, have valid credentials, and have ample disk space (e.g., /var/log at 99% free).
Cause
The issue is caused by a deep proxy-side agent hang on the Ubuntu-based backup proxy. In this state, the underlying Druva processes become unresponsive to routine task assignments. A standard restart of the Druva service is insufficient because orphaned or leftover processes remain active in the background, keeping the proxy in a locked state.
Traceback
While specific log lines may not always be captured prior to a reboot, typical indicators in the backup logs include:
[INFO] Proxy connection status: CONNECTED
[WARN] Job timeout reached. Target backup window exceeded.
[DEBUG] Worker thread blocked or unresponsive. Failing to pick up new tasks from the queue.
[ERROR] Unable to initialize new backup session: Proxy agent not responding to dispatch commands.
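As a rough aid for spotting these indicators, the sketch below counts matching lines in a log file. The function name count_hang_indicators is hypothetical, the log path must be supplied by the caller (no default location is assumed here), and the search strings are copied from the example lines above, so the wording in a real log may differ.

```shell
#!/usr/bin/env bash
# Count log lines that match the hang indicators listed in this article.
# The message strings are taken from the article's examples and are an
# assumption about what a real log contains.
count_hang_indicators() {
    local log_file="$1"
    # -c prints the number of matching lines, -i ignores case,
    # -E enables alternation between the three indicator phrases.
    grep -c -i -E \
        'job timeout reached|worker thread blocked|not responding to dispatch' \
        "$log_file"
}
```

Usage: count_hang_indicators /path/to/backup.log prints the number of matching lines. A non-zero count alongside a CONNECTED status is consistent with the hang described here, but it is not proof on its own.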
Resolution
If a proxy agent hang occurs, follow these sequential troubleshooting steps to safely clear the stuck state:
Step 1: Stop the Druva Service
Gracefully stop the Druva-EnterpriseWorkloads service on the affected Ubuntu proxy.
Step 2: Kill Leftover Processes
Identify and terminate any lingering or orphaned Druva processes that failed to close during the service stop.
ps -ef | grep -i '[d]ruva'   # the [d] keeps grep from listing itself
kill -9 <PID_of_leftover_processes>
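Where several PIDs need cleaning up, a small helper can save repetition. The sketch below is a gentler variant of the kill -9 shown above, not the documented procedure: it sends SIGTERM first, waits a short grace period, and only falls back to SIGKILL for processes that are still running. The function name terminate_pids and the grace-period argument are hypothetical.

```shell
#!/usr/bin/env bash
# terminate_pids GRACE_SECONDS PID...
# Ask each PID to exit with SIGTERM, wait up to GRACE_SECONDS,
# then send SIGKILL only to the ones that are still alive.
terminate_pids() {
    local grace="$1"
    shift
    local pid waited
    for pid in "$@"; do
        # Skip PIDs that are already gone or that we may not signal.
        kill -TERM "$pid" 2>/dev/null || continue
        waited=0
        # kill -0 sends no signal; it only tests whether the PID exists.
        while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
            sleep 1
            waited=$((waited + 1))
        done
        if kill -0 "$pid" 2>/dev/null; then
            kill -KILL "$pid" 2>/dev/null || true
        fi
    done
}
```

Usage: after identifying the leftover PIDs with the ps command above, run for example terminate_pids 5 1234 5678 as root. The PIDs here are placeholders.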
Step 3: Restart the Service
Start the Druva-EnterpriseWorkloads service again to re-initialize the agent cleanly.
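Steps 1 and 3 can be sketched as follows, assuming the proxy uses systemd and that the unit is named after the service in this article (Druva-EnterpriseWorkloads). Both are assumptions to confirm on the proxy itself. The helper names run, stop_service, and start_service, and the DRY_RUN switch, are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Unit name assumed from the service name used in this article;
# override it by exporting SERVICE before running.
SERVICE="${SERVICE:-Druva-EnterpriseWorkloads}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: stop the service.
stop_service() {
    run systemctl stop "$SERVICE"
}

# Step 3: start the service, then report whether systemd sees it as active.
start_service() {
    run systemctl start "$SERVICE" && run systemctl is-active "$SERVICE"
}
```

Usage: call stop_service, carry out the leftover-process cleanup from Step 2, then call start_service. Run as root or through sudo. Setting DRY_RUN=1 first prints the commands without executing them, which is a cheap way to check the unit name before touching a production proxy.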
Step 4: Perform a Proxy OS Reboot (If Required)
If the service restart and process cleanup do not immediately clear the queue, a full reboot of the Ubuntu proxy operating system is required to completely reset the subsystem and clear the hang.
Step 5: Clear and Retry
Cancel the stuck Queued jobs in the Druva Management Console and trigger a manual backup to validate that the proxy is actively processing data again.
Filters
Include: Applies to Hybrid Workloads (VMware) environments utilizing Ubuntu-based Druva backup proxies.
Exclude: This resolution does not apply to backup failures caused by network disconnection, invalid vCenter/ESXi credentials, or full disk capacity (/var/log or root partition at 100% utilization).
Verification
To verify that the resolution has successfully addressed the problem:
Check the Druva Management Console to ensure the proxy status is Connected.
Trigger a manual backup job for a test virtual machine.
Monitor the job status to ensure it transitions successfully from Queued to In Progress and ultimately completes without hitting a timeout.
Monitor ongoing job concurrency to ensure subsequent scheduled backups are no longer skipped.
