Scanner CLI utility

Use the Scanner CLI utility

The Scanner CLI utility analyzes a file system or NAS shares and gives you insight into the file and directory structure. Before you run the utility against NAS shares, mount the shares manually. The utility is bundled with the Hybrid Workloads agent, so installing the latest version of the agent gives you access to it. It reports information such as the number of folders and files present, the directory and file levels, and the data change rate.

The first run of the Scanner CLI utility performs a full scan. Subsequent scans are incremental or full, depending on the parameters specified in the configuration file. You will notice a significant improvement in the incremental scans performed after the first full scan.

You can run the Scanner CLI utility using:

Command line interface procedure

Run one of the following commands:

scanner-cli.exe <Configuration file path>

Or

scanner-cli.exe <Directory path for analysis> <Directory in which to create the output files>

The second command uses the default parameters. To override them, create a configuration file and run the utility with it, as in the first command.
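The two invocations above can also be assembled from a script. The following is a minimal Python sketch, not part of the product: the helper names `build_scanner_command` and `run_scanner` and the example output directory are hypothetical, and the sketch assumes `scanner-cli.exe` is reachable on the PATH.

```python
import subprocess
from typing import List, Optional


def build_scanner_command(
    exe: str,
    config: Optional[str] = None,
    scan_dir: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Return the argument list for one of the two documented invocations."""
    if config is not None:
        if scan_dir or output_dir:
            raise ValueError("pass either a config file or a directory pair, not both")
        # Form 1: scanner-cli.exe <Configuration file path>
        return [exe, config]
    if scan_dir and output_dir:
        # Form 2: scanner-cli.exe <Directory path for analysis> <Output directory>
        return [exe, scan_dir, output_dir]
    raise ValueError("need a config file, or both scan_dir and output_dir")


def run_scanner(args: List[str]) -> int:
    """Run the utility and return its exit code (assumes the executable exists)."""
    return subprocess.run(args, check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call run_scanner(cmd) on a machine with the agent installed.
    cmd = build_scanner_command(
        "scanner-cli.exe", scan_dir="Z:\\", output_dir="C:\\scan-output"
    )
    print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, such as `C:\Program Files`.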

Configuration file procedure

Perform the following:

  1. Create a configuration file in the YAML format by copying the following snippet to a text file and saving the file with the .yml extension. Alternatively, download the sample config.yml.

    root_paths : [Z:\]
    fset_dir: Z:\
    scan_worker_count: 50
    sqlite_n_conns: 8
    results_threshold: 10000
    results_file: C:\results
    processed_data_file: C:\ProcessedDataFile
    db_file_path: C:\sqlite.db
    use_usn: false
    smart_scan: false
    force_scan: false
    ss_age_threshold: 0
    skip_acl: false
    statemap: false
    log_file: C:\ScannerLog.log

    filters:
      exclude_folders: ["/proc", "/sys", "/dev", "/tmp", "/lost+found", "/etc/Phoenix", "/var/Phoenix", "/selinux", "$Recycle.Bin", "ProgramData", "Recovery", "System Volume information", "RECYCLER",
      "C:\\Program Files (x86)", "C:\\Program Files", "C:\\Windows", ".snapshot"]
      exclude_extensions: ""
      include_extensions: ""

    • root_paths: Specify the absolute or full path of the directories that you want to scan. For NAS, the root path should be the path where the share is mounted.
      Default: [ ]

    • fset_dir: Specify the drive letter.

      • For Windows and CIFS share, the fset directory is '<Drive letter:\>', and

      • For Linux and NFS Share, the fset directory is '/'

      • Default: NA

    • scan_worker_count: The number of threads that are to be used for scanning.
      Default: 50

    • sqlite_n_conns: The number of connections to be established with SQLite.
      Default: 8 (recommended)

    • results_threshold: The minimum batch size in which results are displayed on the console while the utility runs.
      Default: 10000 (recommended)

    • results_file: Specify the location of the results file, which will contain information about the changed data. A timestamp is appended to the results file name after each Scanner CLI utility run.
      ResultsFile_<FsetDir>_<Timestamp>

    • use_usn: Windows USN
      Default: false

    • force_scan: Set to 'true' if you want to run a full scan forcefully instead of an incremental scan. The recommended value is 'false'. This is not applicable for the first full scan.
      Default: false

    • skip_acl: Set to 'true' to skip detecting changes to Access Control Lists (ACLs). This is not applicable for the first full scan.
      Default: false

    • log_file: Specify the location where you want to save the scanner log files.
      Default: ScannerLog_<FsetDir>.log

    • statemap: Set to false.
      Note: This improves scan performance. If you are planning to run an incremental scan after the first full scan, this parameter needs to be set to 'true' for all runs including the first full run.
      Default: false

    • db_file_path: Specify the location of the file which will be used to store the persistent state of the scanner.
      Default: DBFile_<FsetDir>.db

    • exclude_folders: List of folders to be excluded from the scan. For example, exclude_folders: [dev, /proc, /etc, Phoenix]
      Default: ["/proc",
      "/sys", "/dev", "/tmp", "/lost+found", "/etc/Phoenix",
      "/var/Phoenix", "/selinux", "$Recycle.Bin", "ProgramData",
      "Recovery", "System Volume information",
      "RECYCLER", "C:\\Program Files (x86)",
      "C:\\Program Files", "C:\\Windows",
      ".snapshot", ".Snapshot", ".SNAPSHOT"]

    • exclude_extensions: List of file extensions to be excluded from the scan. The extensions must be separated by a semicolon. For example,
      exclude_extensions: "*.log;*.bat"
      Default: " "

    • include_extensions: List of file extensions to be included in the scan. The extensions must be separated by a semicolon. For example,
      include_extensions: "*.txt;*.pdf"
      If a file extension is added to both the include list and the exclude list, the file extension is excluded, because exclusion takes precedence over inclusion.
      Default: " "



      📝 Note
      You can override the value of any of these parameters by adding it to the YAML file. Make sure to also add "fset_dir: <Specify fset directory path>" to the YAML file, as it is a mandatory parameter. If you do not provide values for the results_file, processed_data_file, log_file, and db_file_path parameters in the YAML file, files with the default names are created at the location from which the utility is run.





      📝 Note
      The statistics in the ProcessedDataFile are applicable only to the first run.


  2. Download and install the latest version of the Hybrid Workloads agent from the Downloads page.

  3. On Linux, increase the file descriptor (FD) limit by using the following command:
    ulimit -n 65000
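The configuration file from step 1 can also be generated from a script. The following is a minimal Python sketch under stated assumptions: the helper names `yaml_scalar` and `render_config` are hypothetical, only top-level parameters are handled, and string values are written as single-quoted YAML scalars (which keep backslashes in Windows paths literal) on the assumption that the utility accepts any standard YAML quoting. The sketch also enforces the rule that fset_dir is mandatory.

```python
from typing import Any, Dict

# fset_dir is the only parameter the article describes as mandatory.
MANDATORY = ("fset_dir",)


def yaml_scalar(value: Any) -> str:
    """Render a bool, int, or string as a YAML scalar."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # Single-quoted YAML strings keep backslashes literal; a quote is doubled.
    return "'" + str(value).replace("'", "''") + "'"


def render_config(params: Dict[str, Any]) -> str:
    """Render top-level parameters as YAML text, one 'key: value' per line."""
    for key in MANDATORY:
        if not params.get(key):
            raise ValueError(f"missing mandatory parameter: {key}")
    lines = []
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            items = ", ".join(yaml_scalar(v) for v in value)
            lines.append(f"{key}: [{items}]")
        else:
            lines.append(f"{key}: {yaml_scalar(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config = {
        "root_paths": ["Z:\\"],
        "fset_dir": "Z:\\",
        "scan_worker_count": 50,
        "force_scan": False,
        "statemap": True,  # 'true' on every run if incremental scans are planned
    }
    print(render_config(config), end="")
```

Any parameter left out of the dictionary falls back to the default listed in step 1, so the generated file only needs the values you want to override plus fset_dir.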

Review scan result

Once the scan is complete:

  • A result file is generated at the location specified in the configuration file. This result file contains the following information about the changed data:



    ChangeType - Indicates the type of change, such as file added, file modified, file deleted.
    ItemType - Indicates the type of item: 'F' indicates a file, 'D' indicates a directory, and 'L' indicates a link.
    Mode - Indicates the Standard OS File Mode (uint32).
    MTime - Indicates the modification time of the file or the folder.
    Size - Indicates the size of the file in bytes.
    Path - Indicates the full path of the file.

  • A log file is generated at the location specified in the configuration file. The log file contains telemetry information about the scan.

  • An output file (processed_data_file) with the formatted data is generated that contains the following telemetry information.
    Scanned directory: D:\
    Include path(s): []
    Exclude folders: /proc, /sys, /dev, /tmp, /lost+found, /etc/Phoenix, /var/Phoenix, /selinux, $Recycle.Bin, ProgramData, Recovery, System Volume information, RECYCLER, C:\Program Files (x86), C:\Program Files, C:\Windows, .snapshot, .Snapshot, .SNAPSHOT
    Exclude extensions: NA
    Include extensions: NA

    Summary
    Total Count (files and folders): 326407
    Directories/Folders Count: 60683
    Files Count: 265724
    Softlink Files Count: 0
    Total Size of the files: 16620968402 Bytes, or 15.48 GB
    Average file size: 62549.74 Bytes, or 61.08 KB

    Directory modification age distribution:
    Age distribution    Count     Count %
    0-90 Days           1         25.00 %
    90-180 Days         0         0.00 %
    180-270 Days        0         0.00 %
    270 Days-1 Year     0         0.00 %
    1-2 Years           3         75.00 %
    > 2 Years           0         0.00 %
    Total Folders Count: 4

    File size distribution:
    Size distribution   Count     Count %    Size         Size %     Avg Size
    0-1KB               94486     35.56 %    30.46 MB     0.19 %     338.06 B
    >1-10KB             118334    44.53 %    436.61 MB    2.75 %     3.78 KB
    >10-100KB           44421     16.72 %    1.16 GB      7.47 %     27.29 KB
    >100KB-1MB          7619      2.87 %     2.19 GB      14.15 %    301.54 KB
    >1-16MB             760       0.29 %     2.43 GB      15.72 %    3.28 MB
    >16MB               104       0.04 %     9.24 GB      59.71 %    91.01 MB

    File modification age distribution:
    Age distribution    Count     Count %    Size         Size %
    0-90 Days           80156     30.17 %    3.88 GB      25.06 %
    90-180 Days         45794     17.23 %    1.39 GB      9.00 %
    180-270 Days        3274      1.23 %     127.01 MB    0.80 %
    270 Days-1 Year     10248     3.86 %     1.69 GB      10.92 %
    1-2 Years           60932     22.93 %    1.67 GB      10.78 %
    > 2 Years           65320     24.58 %    6.72 GB      43.43 %
    Total Files Count: 265724

    Extensions list sorted by files count:
    Large ext: Files with >=5 chars filename extension
    No ext : files with no extension to the filename
    File extension      Count     Count %    Size         Size %
    .go                 115489    43.46 %    1.95 GB      12.57 %
    .py                 35434     13.33 %    362.03 MB    2.28 %
    No Ext              30585     11.51 %    459.28 MB    2.90 %
    .json               19842     7.47 %     603.52 MB    3.81 %
    .js                 9356      3.52 %     85.59 MB     0.54 %
    Large Ext           8711      3.28 %     222.93 MB    1.41 %
    .md                 3335      1.26 %     18.36 MB     0.12 %
    .sh                 2660      1.00 %     4.86 MB      0.03 %
    .txt                2495      0.94 %     545.18 MB    3.44 %
    .html               2466      0.93 %     13.01 MB     0.08 %
    .png                2407      0.91 %     31.35 MB     0.20 %
    .mod                2325      0.87 %     441.99 KB    0.00 %
    .a                  1796      0.68 %     295.13 MB    1.86 %
    .s                  1699      0.64 %     6.46 MB      0.04 %
    .yml                1598      0.60 %     2.23 MB      0.01 %
    .dat                1280      0.48 %     30.87 MB     0.19 %
    .h                  1279      0.48 %     26.20 MB     0.17 %
    .lock               1260      0.47 %     205.96 KB    0.00 %
    .pyc                1169      0.44 %     16.11 MB     0.10 %
    .rst                1132      0.43 %     4.19 MB      0.03 %
    .xml                1078      0.41 %     4.98 MB      0.03 %
    .svg                873       0.33 %     13.64 MB     0.09 %

    Extensions list sorted by the size of files:
    File extension      Count     Count %    Size         Size %
    .pack               188       0.07 %     5.13 GB      33.17 %
    .go                 115489    43.46 %    1.95 GB      12.57 %
    .zip                783       0.29 %     1.24 GB      7.98 %
    .rar                4         0.00 %     1.17 GB      7.57 %
    .lib                108       0.04 %     606.59 MB    3.83 %
    .json               19842     7.47 %     603.52 MB    3.81 %
    .dll                156       0.06 %     600.24 MB    3.79 %
    .exe                543       0.20 %     599.50 MB    3.78 %
    .txt                2495      0.94 %     545.18 MB    3.44 %
    No Ext              30585     11.51 %    459.28 MB    2.90 %
    .py                 35434     13.33 %    362.03 MB    2.28 %
    .a                  1796      0.68 %     295.13 MB    1.86 %
    Large Ext           8711      3.28 %     222.93 MB    1.41 %
    .tgz                45        0.02 %     193.59 MB    1.22 %
    .pdb                465       0.17 %     193.24 MB    1.22 %
    .db                 3         0.00 %     112.15 MB    0.71 %
    .idx                188       0.07 %     104.10 MB    0.66 %
    .bmp                152       0.06 %     98.91 MB     0.62 %
    .tar                101       0.04 %     97.18 MB     0.61 %
    .c                  703       0.26 %     91.81 MB     0.58 %
    .js                 9356      3.52 %     85.59 MB     0.54 %
    .so                 93        0.03 %     70.69 MB     0.45 %

    Average Width: 5 (average number of files in each directory)
    Average Depth: 8 (average directory depth)
    Maximum Depth: 20 (max directory depth found during the scan)
    Maximum Width: 3559 (max number of files found in a single directory)
    Scanning Rate: 9636 (files scanned per second)
    Scanning Time: 33 (total scan time in seconds)

    • Scanned directory: Directory path for analysis.

    • Include path(s): Path(s) to include under scanned directory.

    • Exclude folders: Shows the folders to be excluded.

    • Exclude extensions: Shows the extensions to be excluded.

    • Include extensions: Shows the extensions to be included.

    • Total Count (files and folders): Shows the total number of files and folders.

    • Directories/Folders Count: Shows the count of all the folders/directories.

    • Files Count: Shows the total count of files.

    • Softlink Files Count: Shows the count of soft link files.

    • Total Size of the files: Shows the total size of all the files in the directory.

    • Average file size: Shows the average size of a file in the directory.

    • File size distribution: Shows the size distribution of files in a backup set. For example, an entry of [0-1KB : 1] indicates that only a single file with a size between 0 and 1 KB was encountered during the scan.

    • File modification age distribution: Distribution of files according to their modification age.

    • Directory modification age distribution: Distribution of directories according to their modification age.

    • Extensions list sorted by files count: Shows the list of file extensions sorted by file count.

    • Large ext: Shows the files that have greater than or equal to five characters in the filename extensions.

    • No ext: Shows the files that have no extensions to the filename.

    • Extensions list sorted by the size of files: Shows the list of file extensions sorted by file size.

    • Average Width: Shows the average number of files in each directory.

    • Average Depth: Shows the average depth of the directory tree.

    • Maximum Depth: Shows the maximum depth of the directory tree during the scan.

    • Maximum Width: Shows the maximum number of files found in a single directory.

    • Scanning Rate: Shows the rate (in files per second) with which the files were scanned.

    • Scanning time: Shows the total scan duration (in seconds).
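To make the file size distribution and the size units in the report concrete, the following Python sketch buckets file sizes into the same ranges as the sample output and formats byte counts in 1024-based units. It is an illustration only, not the utility's implementation. The function names are hypothetical, and treating each upper bound as inclusive is an assumption. The 1024-based units match the sample totals, where 16620968402 bytes is reported as 15.48 GB.

```python
import os
import sys
from collections import Counter
from typing import Dict, Iterable, Iterator

KB = 1024
MB = 1024 * KB

# (upper bound in bytes, label); bounds are assumed to be inclusive.
BUCKETS = [
    (1 * KB, "0-1KB"),
    (10 * KB, ">1-10KB"),
    (100 * KB, ">10-100KB"),
    (1 * MB, ">100KB-1MB"),
    (16 * MB, ">1-16MB"),
]
LAST_BUCKET = ">16MB"


def bucket_for(size: int) -> str:
    """Return the label of the size range that a file of 'size' bytes falls into."""
    for upper, label in BUCKETS:
        if size <= upper:
            return label
    return LAST_BUCKET


def human_size(num_bytes: float) -> str:
    """Format a byte count with two decimals using 1024-based units."""
    for unit in ("B", "KB", "MB"):
        if num_bytes < 1024:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.2f} GB"


def iter_file_sizes(root: str) -> Iterator[int]:
    """Yield the size of every file under root, skipping unreadable entries."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat does not follow symbolic links
                yield os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue


def size_distribution(sizes: Iterable[int]) -> Dict[str, int]:
    """Count files per size range, listing every range even when it is empty."""
    counts = Counter(bucket_for(s) for s in sizes)
    labels = [label for _, label in BUCKETS] + [LAST_BUCKET]
    return {label: counts.get(label, 0) for label in labels}


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    sizes = list(iter_file_sizes(root))
    for label, count in size_distribution(sizes).items():
        print(f"{label:<12}{count}")
    print("Total size:", human_size(sum(sizes)))
```

Running the script against a small test directory gives a quick sanity check of the ranges before you compare them with the figures in the processed data file.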
