Introduction
Suspicious data modification on a resource is called a Data Anomaly. A user or malicious software can make such changes. For example, if a resource in your organization is under attack, malicious software on the resource can start modifying and deleting the files present on it. A resource is a device, a server, or a SharePoint site where data is stored.
❗ Important
Data Anomalies displays insights about the data protected only for the following resources:
Endpoints
Microsoft 365 - OneDrive
Microsoft 365 - SharePoint
NAS
File Servers (Windows/Linux)
VMware (Windows and Linux, 64-bit)
Data is displayed for up to the last 30 days.
When such a potential threat manipulates the data on a resource, the activity is suspicious in nature and unlike how the resource owner normally works with data on that resource. Because anomalies of this type often indicate issues that require attention, Druva flags any such anomalous behavior on a resource and generates an alert.
Prerequisites for VMware Data Anomalies
If you are using Data Anomalies for virtual machines, ensure that the following prerequisites are met:
VMware tools are installed and enabled on the virtual machine. For more information, see Install and Upgrade VMware tools.
Keep the guest OS credentials handy, as you need to provide these details:
For Windows virtual machines: The user credentials provided must have administrator privileges or access rights
For Linux virtual machines: The user credentials provided must have either root privileges or sudo user access rights. For more information about configuring and managing sudo user credentials, see Manage credentials for VMware servers.
The Data Anomalies algorithm requires proxy version 7.0.0::r438902 or later to detect anomalous file actions such as bulk file creation, deletion, and modification.
The Data Anomalies algorithm requires proxy version 7.0.2::r518961 or later to detect anomalous file encryption actions. Contact Support for assistance related to anomalous encryption file actions and alerts.
For Windows virtual machines: Enable the USN journal for each drive, with enough storage allocated.
The default USN journal size for most Windows versions is 32 MB, which is insufficient for Data Anomalies on large virtual machines. Druva recommends the following USN journal sizes for different disk sizes:
| File Count | Disk Size | Maximum Size |
| --- | --- | --- |
| Files > 10 million | 500 GB | 2 GB |
| Files > 5 million | 200 GB | 1 GB |
| Files > 2 million | 50 GB | 512 MB |
| Files > 1 million | 10 GB | 256 MB |
To increase the USN journal size manually, see the Microsoft documentation.
For Linux virtual machines: The inotify maximum watches limit must be greater than the number of directories on the virtual machine.
For Linux virtual machines: One of the following file system types must be present on the virtual machine: xfs, ext4, or ext3.
For more information about the software requirements for VMware, see the Support matrix for VMware.
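The Linux inotify prerequisite above can be checked with a short script. This is an illustrative sketch, not a Druva tool: the function names are hypothetical, and it assumes a Linux guest where the watch limit is exposed at /proc/sys/fs/inotify/max_user_watches.

```python
# Sketch: check the inotify prerequisite for Linux virtual machines.
# Assumes a Linux guest OS; function names are illustrative only.
import os


def count_directories(root):
    """Count directories under root (each directory needs an inotify watch)."""
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        total += len(dirnames)
    return total + 1  # include the root directory itself


def inotify_limit(proc_file="/proc/sys/fs/inotify/max_user_watches"):
    """Read the current inotify watch limit from /proc (Linux only)."""
    with open(proc_file) as f:
        return int(f.read().strip())


def prerequisite_met(root="/"):
    """True when the watch limit exceeds the number of directories."""
    return inotify_limit() > count_directories(root)
```

If the limit is too low, an administrator can typically raise it with `sysctl fs.inotify.max_user_watches=<value>` and persist the value in /etc/sysctl.conf.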
Ensure that the following URLs are whitelisted and allowed for a successful VMware Data Anomalies scan:
*s3.amazonaws.com/*
s3-*.amazonaws.com
s3*.*.amazonaws.com
Support matrix for VMware Data Anomalies
The following are the supported Windows versions for VMware Data Anomalies:
Windows 10 (32 and 64-bit)
Windows Server 2012 (64-bit)
Windows Server 2016 (64-bit)
Windows Server 2019 (64-bit)
Windows Server 2022 (64-bit)
The following are the supported Linux (64-bit) versions for VMware Data Anomalies:
Red Hat Enterprise Linux (RHEL) 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
CentOS 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
Ubuntu 16.04, 18.04
Things to Consider
The following are a few limitations that you should know before using Data Anomalies for VMware:
Error: Data Anomalies scan fails with the following error: Invalid pid for Guest VM execution.
Description: This error is observed in the following scenarios:
- The glibc library version of the guest virtual machine is lower than 2.14.
- The default SELinux restriction enforced by Red Hat. This is specifically observed for the SELinux policy version selinux-policy-3.13.1-268.el7_9.2.noarch.
Workaround: To resolve this issue, do the following:
- Upgrade the glibc library version of the guest virtual machine to 2.14 or later.
- To bypass the SELinux restriction enforced by Red Hat, perform the following steps:
  1. Check the SELinux status:
     # sestatus
  2. Set the SELinux policy to permissive:
     # setenforce 0
  3. To persist the enforcement policy, update the SELinux configuration file:
     # sudo vi /etc/sysconfig/selinux
For more information, see the Red Hat documentation.
Data Anomalies for VMware - Linux: A Modified alert displays an event count when only file permissions change, without any modification to the file contents. You can safely ignore those events.
Data Anomalies for VMware - Windows: When you delete files, the Data Anomalies scan cannot find file metadata from the USN journal or Windows for the given file ID. The scan displays the timestamp for such files as the scan launch timestamp.
How does Druva detect Data Anomalies
Druva’s automated intelligence analyzes and monitors the data activity trend for a given resource, and after a sufficient sample size, it builds the anomaly baseline. An alert is automatically generated and reported in case of any anomalous activity.
What do we mean by baseline?
In the Data Anomalies feature context, a baseline refers to the expected pattern of data behavior over a specific period. It serves as a reference point or benchmark against which you can detect deviations or anomalies.
Step 1 Learning period: In this step, Druva performs a data backup pattern analysis. See Data backup pattern analysis period.
Step 2 Data Anomalies detection process: In this step, Druva checks the backed-up files to detect anomalous file actions such as creation, update, deletion, and encryption.
❗ Important
For VMware resources, backup and the Data Anomalies detection process run simultaneously.
Step 3: Generate and send a Data Anomalies alert: If any data anomalous activity is detected, a Data Anomalies alert is sent.
Following are the algorithm input parameters that Druva requires and uses to analyze the data activity trend and generate alerts in case of any suspicious data activity:
Data backup pattern analysis period for resources - Endpoints, File Server, NAS, VMware, Microsoft 365 (OneDrive and SharePoint): Displayed in Days or Snapshots
Number of files in a snapshot: A minimum number of files required within a snapshot to initiate Data Anomalies learning and scanning.
💡 Tip
If the total number of files in a snapshot is less than the minimum number of files, then that snapshot is not scanned for Data Anomalies detection.
Deviation in the files from the baseline and total files in a snapshot: Percentage deviation threshold compared to the baseline and total files in a snapshot required to qualify as anomalous data.
Recommended Data Anomalies settings
Following are the recommended Data Anomalies settings for the analysis period, for starting encryption checks on a resource, and for generating Data Anomalies alerts.
❗ Important
We recommend that you keep the default Recommended Data Anomalies Settings if you are not sure about the data backup pattern of your organization.
Data backup pattern analysis period for resources
30 days (For Endpoints and OneDrive): The default and recommended setting for Endpoints and OneDrive data backup pattern analysis. The Data Anomalies detection for Endpoints and OneDrive will start only if data has been successfully backed up for the past 30 days. The permissible settings for days or snapshots are between 2 and 45.
30 days (For File Server/NAS/VMware/SharePoint): The default and recommended setting for File Server/NAS/VMware/SharePoint data backup pattern analysis. The Data Anomalies detection for File Server/NAS/VMware/SharePoint will start only if data has been successfully backed up for the past 30 days. The permissible settings for days or snapshots are between 2 and 45.
100 or more files in a snapshot: The default and recommended setting for the minimum required files in a snapshot of a resource to initiate Data Anomalies detection for resources - Endpoints, OneDrive, File Server, NAS, VMware, and SharePoint. The permissible setting for a minimum count of files is between 20 and 500.
75% of baseline in the snapshot: The default and recommended maximum deviation setting for file actions (Create, Update, and Delete) in a snapshot of a resource. A Data Anomalies alert is generated if the observed deviation exceeds the set baseline value. The permissible setting for the baseline is between 50 and 99%.
% of the total files in a snapshot: The default and recommended setting for the minimum change in the count of files out of the total files in a snapshot to generate Data Anomalies alert.
Endpoints, File Server, NAS, and OneDrive: 70% of the total files in a snapshot
VMware and SharePoint: 20% of the total files in a snapshot
The permissible setting for a minimum change in the count of files is between 5 and 90% for all of these resources (Endpoints, File Server, NAS, OneDrive, VMware, and SharePoint).
Both the deviation-from-baseline condition and the minimum-percent-of-total-files condition must be met for a Data Anomalies alert to be generated.
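The alert conditions above can be sketched as a small check. This is an illustrative reading only: the function and parameter names are hypothetical, and the exact deviation formula (here, the action count exceeding baseline × maximum deviation, following the wording used in the worked example in this article) is an assumption rather than Druva's published algorithm.

```python
# Illustrative sketch of the Data Anomalies alert conditions.
# The names and the deviation formula are assumptions, not Druva's code.

def anomaly_alert(action_count, baseline, total_files,
                  min_files=100, max_deviation=0.75, min_percent_change=0.70):
    """Return True when one file action (create/update/delete) should alert.

    action_count  - files affected by one action type in this snapshot
    baseline      - learned baseline for that action type
    total_files   - total files in the snapshot
    The default thresholds mirror the recommended settings for Endpoints.
    """
    if total_files < min_files:
        return False  # snapshot too small; not scanned for anomalies
    exceeds_baseline = action_count > baseline * max_deviation
    exceeds_total_share = action_count > total_files * min_percent_change
    # Both conditions must hold for an alert to be generated.
    return exceeds_baseline and exceeds_total_share
```

For example, with a baseline of 10, a 50% maximum deviation, and a 20% minimum change, 100 created files out of 120 would raise an alert, while 3 created files would not.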
You can use the Data Anomalies Settings > Edit option to customize and update the Data Anomalies configuration settings as per your organizational requirements and if you are aware of the data backup patterns.
❗ Important
If you have selected snapshots as your data backup pattern learning period criteria, ensure that the learning duration is completed within 45 days.
The following table explains the Data Anomalies behavior for Endpoints, OneDrive, and File Server/NAS/VMware/SharePoint resources:
❗ Important
The first backup is not considered for Data Anomalies detection.
Example
Scenario: Data Anomalies is enabled for a resource that has a total of 500 files, with the following Data Anomalies settings.

| Backup pattern learning period | Minimum number of files required in a snapshot for Data Anomalies detection | Maximum deviation | Minimum percent of total file change |
| --- | --- | --- | --- |
| 05 snapshots | 125 | 50% | 20% |
The following example explains the Data Anomalies behavior using the Data Anomalies settings mentioned in the table above.
For the first backup, 500 files were backed up. Being the first backup, it is excluded by the Data Anomalies algorithm.
Let's consider subsequent backups in the following trend:
| Snapshot # | Created | Modified | Deleted |
| --- | --- | --- | --- |
| 2 | 20 | 5 | 8 |
| 3 | 12 | 7 | 1 |
| 4 | 0 | 0 | 10 |
| 5 | 0 | 0 | 0 |
| 6 | 5 | 0 | 8 |
We have a total of 520 files after the 6th backup. The learning duration (05 snapshots) is now complete, so Data Anomalies detection starts, and alerts can be generated if an anomaly is found.
Now, the baseline is as follows:
Baseline for creation = maximum number of new files created across the learning-duration snapshots, that is, max(20, 12, 0, 0, 5) = 20
Baseline for modification/update = maximum number of modified/updated files across the learning-duration snapshots, that is, max(5, 7, 0, 0, 0) = 7
Baseline for deletion = maximum number of deleted files across the learning-duration snapshots, that is, max(8, 1, 10, 0, 8) = 10
The baseline for creation, modification, and deletion is 20, 7, and 10, respectively.
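As a quick sketch, each baseline is simply the maximum count over the learning window. Using the example numbers:

```python
# Baselines from the example's five learning snapshots (snapshots 2-6).
created  = [20, 12, 0, 0, 5]
modified = [5, 7, 0, 0, 0]
deleted  = [8, 1, 10, 0, 8]

# Each baseline is the maximum observed count for that action type.
baseline = {action: max(counts) for action, counts in
            [("create", created), ("update", modified), ("delete", deleted)]}
print(baseline)  # {'create': 20, 'update': 7, 'delete': 10}
```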
Let's proceed with the next round of backups in the following trend:
| Snapshot # | Created (Baseline for Creation) | Modified (Baseline for Modification) | Deleted (Baseline for Deletion) | Total files in last backup |
| --- | --- | --- | --- | --- |
| 7 | 10 (20) | 5 (7) | 7 (10) | 520 |
| 8 | 2 (Max of 12, 0, 0, 5, 10 = 12) | 1 (Max of 7, 0, 0, 0, 5 = 7) | 2 (Max of 1, 10, 0, 8, 7 = 10) | 523 |
| 9 | 100 (Max of 0, 0, 5, 10, 2 = 10) | 0 (Max of 0, 0, 0, 5, 1 = 5) | 8 (Max of 10, 0, 8, 7, 2 = 10) | 523 |
| 10 | 80 (Max of 0, 5, 10, 2, 100 = 100) | 4 (Max of 0, 0, 5, 1, 0 = 5) | 10 (Max of 0, 8, 7, 2, 8 = 8) | 615 |
| 11 | 0 (Max of 5, 10, 2, 100, 80 = 100) | 12 (Max of 0, 5, 1, 0, 4 = 5) | 8 (Max of 8, 7, 2, 8, 10 = 10) | 685 |
| 12 | 5 (Max of 10, 2, 100, 80, 0 = 100) | 50 (Max of 5, 1, 0, 4, 12 = 12) | 70 (Max of 7, 2, 8, 10, 8 = 10) | 677 |
| 13 | 200 (Max of 2, 100, 80, 0, 5 = 100) | 0 (Max of 1, 0, 4, 12, 50 = 50) | 0 (Max of 2, 8, 10, 8, 70 = 70) | 612 |
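The parenthesized baselines in the table are a rolling maximum over the previous five snapshots. A minimal sketch for the Created column, using the example counts:

```python
# Created-file counts per snapshot from the example (snapshots 2-13).
created = {2: 20, 3: 12, 4: 0, 5: 0, 6: 5, 7: 10,
           8: 2, 9: 100, 10: 80, 11: 0, 12: 5, 13: 200}


def rolling_baseline(series, snapshot, window=5):
    """Baseline for a snapshot: max over the preceding `window` snapshots."""
    return max(series[s] for s in range(snapshot - window, snapshot))


for snap in range(7, 14):
    print(snap, rolling_baseline(created, snap))
# Creation baselines: 7->20, 8->12, 9->10, 10->100, 11->100, 12->100, 13->100
```

The same function applied to the Modified and Deleted columns reproduces the other baselines in the table.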
At the 9th snapshot, a creation alert is generated: 100 files are created, and all three required conditions are met:
Total number of files > minimum number of files required, that is, 125
Number of files created (100) > baseline for creation (10) × maximum deviation
New files created > minimum percent of total files change
Similarly, at the 12th snapshot, modification and deletion alerts are generated as all three required conditions are met for both.
Administrators can take action based on the security policies of the organization to identify and isolate a possible threat and prevent additional losses.
❗ Important
Anomaly detection kicks in only after the backup job is complete and a snapshot is created. For incomplete backup jobs or interrupted backup jobs, no anomalous behavior is tracked.
View Data Anomalies alerts
📝 Note
In the case of deleted resources (devices, sites, and backupsets), you cannot view the alerts for those resources. However, you can retrieve the deleted resources and view their alerts with the Rollback Action option.
Log in to the Management Console and go to Cyber Resiliency > Posture & Observability > Data Anomalies > Anomalies tab to view Data Anomalies details.
Take action on an alert
For any Data Anomalies alert, you can do any of the following:
Ignore the alert: If you deem an alert a false positive, click the resource name and select the false-positive alert. Click Ignore to resolve the alert.
Quarantine the resource: Select an alert and click Quarantine Resource to stop the ransomware from spreading further. Before you quarantine, see Know the impact of quarantining to learn more about the effects of quarantining the resource. To learn about the options to quarantine a resource, see Quarantine Response.
You can also download the logs for a particular alert and use them for further inspection.
📝 Note
For each backup of all workloads, you can download logs for up to 1.5 million files.
The downloaded logs provide information about the following:
File Name: Name of the file.
Full Path: Path of the file.
File Type: The type of the file. For example, .txt.
File Size (Bytes): Size of the file in bytes.
File Modified Timestamp: The date and time when the file was last modified.
Operation: The operation performed on the file. For example, file created, file modified, file deleted, or files encrypted.
SHA1 Checksum (Only for Endpoints, OneDrive, and SharePoint): The SHA1 checksum value of the file.
File Owner (Only for Endpoints, OneDrive, and SharePoint): The details of the file owner.
File Created Timestamp (Only for Endpoints, OneDrive, and SharePoint): The date and time when the file was created.
File Modified By (Only for OneDrive and SharePoint): The user who last modified the file.
Alert Reason: The reason for encryption alerts.
❗ Important
In case of encryption, the downloaded logs will contain details for a maximum of 100 encrypted files.
After you have taken an action, the status of the alert changes to Resolved.
Related Keywords:
Unusual Data Activity
UDA
unusualdataactivity
Data Anomalies
dataanomalies
data anomaly
Data Anomaly