Troubleshooting Read/Write Failures on Druva CloudCache

Updated over 3 weeks ago

Problem description

The CloudCache service is unable to read or write data to its configured volume (/mnt/data/PhoenixCloudCacheStore). This disruption in data flow occurs during critical operations such as backups, restores, or synchronization schedules. When this happens, Druva triggers automated email alerts to administrators identifying the specific CloudCache affected.

The issue typically manifests in two ways:

  • Write Errors: Occur when CloudCache attempts to save new backup data or staged restore data to the local volume.

  • Read Errors: Occur when CloudCache attempts to retrieve existing data from the volume, most commonly during a synchronization job where data is moved to the Druva Cloud.

Cause

The root cause is generally a disruption in the local storage environment rather than the Druva service itself. Primary causes include:

  • Low Disk Space: The volume is at capacity, preventing the creation of new data blocks.

  • Filesystem Corruption: Errors in the filesystem structure often caused by unclean shutdowns or power failures.

  • Hardware Failure: Underlying physical disk degradation or RAID controller malfunctions.

  • Synchronization Backlog: If sync schedules fail (for example, because of network issues), unsynced data accumulates on the volume and exhausts storage.

  • Antivirus (AV) Interference: Security software locking Druva processes or scanning the data directory, preventing read/write access.

Traceback

While specific tracebacks vary by OS, logs in /var/log/PhoenixCloudCache typically show:

  Error: [Errno 28] No space left on device
  Error: [Errno 30] Read-only file system
  I/O error during write operation at offset <hex_address>
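To see which of these signatures dominates, the log directory can be scanned and the matches tallied. The following is a minimal sketch, assuming the logs are plain text; the function name scan_cloudcache_logs is hypothetical and not part of any Druva tooling.

```shell
# Tally known storage-error signatures found under a log directory.
# The default directory is the one named in this article; pass another as $1.
scan_cloudcache_logs() {
    log_dir="${1:-/var/log/PhoenixCloudCache}"
    # -r: recurse, -h: omit file names, -o: print only the matched text,
    # -E: extended regex so that | means alternation.
    grep -rhoE 'Errno 28|Errno 30|I/O error' "$log_dir" 2>/dev/null \
        | sort | uniq -c
}
```

Running scan_cloudcache_logs with no argument prints one line per signature with its count. A high count for Errno 28 points toward low disk space, while Errno 30 or I/O errors point toward filesystem or hardware problems.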

Resolution

1. Address Low Disk Space

Check the current utilization of the mount point.

  • Command: df -h

  • Action: Verify the used space for /mnt/data/PhoenixCloudCacheStore.

  • Note: Only increase the disk volume if the Unsynced Data for the CloudCache in the Druva Console shows 0 bytes. If there is unsynced data, focus on resolving synchronization first.
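The utilization check can be scripted for repeated use. This is a sketch only: the function names and the 90 percent default threshold are illustrative assumptions, not Druva-documented values.

```shell
# Print the use percentage (integer, without the % sign) of the
# filesystem that holds the given path.
usage_percent() {
    # -P forces the POSIX one-line-per-filesystem layout;
    # column 5 is the capacity, for example "83%".
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Exit 0 if usage is at or above the threshold, 1 if below,
# 2 if the path could not be inspected.
is_nearly_full() {
    path="$1"
    threshold="${2:-90}"   # assumed default, adjust to your environment
    pct=$(usage_percent "$path")
    [ -n "$pct" ] || return 2
    [ "$pct" -ge "$threshold" ]
}
```

For example, `is_nearly_full /mnt/data/PhoenixCloudCacheStore 85 && echo "volume is filling up"` prints a warning once the volume reaches 85 percent. The script only reports utilization; the decision to expand the volume still depends on the Unsynced Data value in the Druva Console.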

2. Repair Filesystem Corruption

If the filesystem is reporting "Read-only" or I/O errors, a repair may be necessary.

  • Command: sudo fsck -y /dev/sdX (Replace /dev/sdX with the actual device identifier).

  • Warning: Never run fsck on a mounted volume. Unmount it first using umount.
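The unmount, check, and remount order can be sketched as a small function. It is an illustration under assumptions: repair_plan is a hypothetical name, /dev/sdX remains a placeholder, and the function prints the commands by default so the plan can be reviewed before anything is executed.

```shell
# Print (default) or execute the repair sequence for one volume.
#   $1 = mount point, $2 = block device,
#   $3 = command prefix: "echo" for a dry run (default), "sudo" to execute.
repair_plan() {
    mount_point="$1"
    device="$2"
    run="${3:-echo}"

    # Stop whatever is using the volume first; umount refuses a busy target.
    $run umount "$mount_point" || return 1

    $run fsck -y "$device"
    rc=$?
    # fsck exit codes: 0 = no errors, 1 = errors corrected.
    # Anything higher means a reboot is advised or errors remain.
    [ "$rc" -le 1 ] || return "$rc"

    $run mount "$device" "$mount_point"
}
```

Calling `repair_plan /mnt/data/PhoenixCloudCacheStore /dev/sdX` prints the three commands in order. Passing `sudo` as the third argument runs them, and the remount is skipped if fsck reports anything other than a clean or fully corrected filesystem.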

3. Monitor Hardware Health

Check for kernel-level hardware alerts.

  • Command: dmesg | grep -iE "error|sd" or sudo smartctl -a /dev/sdX. (The -E flag is required so that grep treats | as "or" rather than as a literal character.)

  • Action: If dmesg shows "Buffer I/O error" or smartctl reports a failing health status, replace the underlying hardware.
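A narrower filter makes it easier to tell whether the kernel is actually reporting I/O errors. The sketch below counts matching lines from standard input; count_io_errors is a hypothetical helper and the pattern is illustrative rather than exhaustive.

```shell
# Count kernel log lines that mention an I/O error.
# Reads from standard input so it works with dmesg or a saved log file.
count_io_errors() {
    # -c: print the number of matching lines, -i: ignore case.
    # "Buffer I/O error" lines are covered by the same pattern.
    grep -ciE 'i/o error'
}
```

Typical use is `dmesg | count_io_errors`. A non-zero count is a reason to follow up with `sudo smartctl -H /dev/sdX`, which prints the drive's overall SMART health assessment.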

4. Configure Antivirus Exclusions

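Ensure the security suite is not interfering with Druva operations. Exclude the CloudCache data directory (/mnt/data/PhoenixCloudCacheStore) and the Druva processes from scanning so that the antivirus software cannot lock files while CloudCache reads or writes them.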

5. Review Include and Exclude Filters

Ensure that the backup policy associated with the CloudCache does not include temporary directories or massive system files that might cause unexpected spikes in local cache usage, leading to the "Disk Full" scenario.

Verification

  1. Restart Services: After applying fixes, restart the CloudCache service.

  2. Test Job: Manually trigger a small backup or restore job to confirm local R/W capability.

  3. Sync Check: Monitor the "Sync" status in the Druva Console to ensure data is flowing to the Cloud.

  4. Log Review: Check /var/log/PhoenixCloudCache/ for any recurring I/O error messages.
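Before triggering a test job, a quick local probe can confirm that the volume accepts writes and returns the same bytes on read. This is a sketch, assuming it is acceptable to create a short-lived file in the target directory; rw_probe is a hypothetical helper, not a Druva utility.

```shell
# Write a small temporary file into a directory, read it back, and remove it.
# Exit 0 if the data round-trips, non-zero otherwise.
rw_probe() {
    dir="$1"
    # mktemp fails on a read-only, full, or missing directory.
    probe=$(mktemp "$dir/rwprobe.XXXXXX") || return 1
    payload="cloudcache-rw-probe $$"
    printf '%s\n' "$payload" > "$probe" || { rm -f "$probe"; return 1; }
    readback=$(cat "$probe")
    rm -f "$probe"
    [ "$readback" = "$payload" ]
}
```

For example, `sudo sh -c '. ./probe.sh; rw_probe /mnt/data/PhoenixCloudCacheStore && echo OK'` (with the function saved in a hypothetical probe.sh). A passing probe only shows that the filesystem is writable right now; small writes can be served from cache, so it does not replace the test job or the log review.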
