Storage

The Admin > Resources > System Performance page of the web interface summarizes disk utilization for individual collectors in a Plixer Scrutinizer environment. A more detailed view that shows actual and expected storage use for historical flow data can also be accessed by drilling into a specific collector.

This section discusses the main factors that influence a Plixer Scrutinizer collector’s disk use and provides instructions for anticipating additional storage needs.

Data retention

Plixer Scrutinizer’s data history settings (under Admin > Settings in the web interface) can be used to adjust how long Plixer Scrutinizer stores aggregated flow data, alarm/event details, and other information. With the default settings, a collector provisioned with the minimum 100 GB of storage can store up to 30 days of NetFlow v5 data from up to 25 flow-exporting devices with a combined flow rate of 1,500 flows/s.

Note

Plixer Scrutinizer’s functions are highly I/O intensive, and many factors can affect the system’s disk-based performance, such as the size/complexity of the flows being received and flow cardinality. To ensure optimal performance, 15k RPM HDDs or SSDs in a RAID 10 configuration are recommended.

To determine how much disk space a collector is expected to use for a specific data retention configuration, at the current exporter count and total flow rate, follow these steps:

  1. In the web interface, make the necessary edits to the data retention settings in the Admin > Settings > Data History tray and click Apply.

  2. Navigate to Admin > Resources > System Performance and click on the collector address in the Active Collectors table to open the Disk Utilization view.

  3. Review the HD Utilization graph and Utilization per Interval summary table to compare the current disk space used for each historical flow data interval against the predicted/expected utilization based on the configured data history settings (e.g., X hours of 1-minute averages, Y days of 12-hour averages, etc.).

  4. If necessary, make adjustments to the configured data history settings and/or the collector’s storage allocation.

    Note

    The predicted values in the Disk Utilization view will automatically be updated to reflect any changes applied to Plixer Scrutinizer’s data history settings.

Plixer Scrutinizer automatically trims older historical flow data when available disk space falls below the Minimum Percent Free Disk Space Before Trimming value configured in the data history settings. This behavior can be disabled by unticking the Auto History Trimming checkbox, but if it is disabled, flow collection and other functions may be paused when available storage runs low. Alternatively, the collector’s storage allocation can be increased to retain older records.
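The trimming condition described above can be illustrated with a minimal sketch. This is not Plixer Scrutinizer’s implementation: the function names, the path, and the 10% threshold are hypothetical and only show how a free-space percentage would be compared against a value such as Minimum Percent Free Disk Space Before Trimming.

```python
import shutil


def free_space_percent(path: str) -> float:
    """Return the percentage of free space on the volume containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def needs_trimming(free_percent: float, min_free_percent: float) -> bool:
    """True when free space has fallen below the configured minimum."""
    return free_percent < min_free_percent


# Hypothetical values: check the current directory's volume against a 10% minimum.
current_free = free_space_percent(".")
print(f"Free space: {current_free:.1f}%")
print("Trimming would be triggered:", needs_trimming(current_free, 10.0))
```

A check like this can be useful when planning storage, since it shows how close a volume is to the point at which older records would start to be removed.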

Hint

The Data Retention graph shows the number of days of historical flow data currently saved compared against the total number of days that will be retained based on the current data history settings.

Host indexing

When host and host to host indexing are enabled, it may become necessary to allocate additional disk space, CPU cores, and RAM to Plixer Scrutinizer collectors.

Host to host indexing can have a significant impact on disk utilization, because the database will include records for two types of host pairs:

  • Continuously active pairs, for which records will not expire

  • Ephemeral unique pairs, for which records will expire but will be replaced by new records at approximately the same rate

Disk space calculations

To approximate the amount of additional disk space that will be used by the host to host index:

  1. Create/run a new Host to Host pair report and add all Exporters that were defined as inclusions for the Host Indexing FA algorithm.

  2. Set the time window to cover a period of at least 24 hours.

  3. When the output of the report is displayed, click the gear button to open the Options tray and select Global.

  4. In the secondary tray, select the 5m option from the Data Source dropdown and click Apply before returning to the main view.

  5. Note the total result count, which will be roughly equivalent to the number of active pairs.

  6. Return to the Options > Global tray and switch to the 1m data source option.

  7. Subtract the previous result count from the updated total result count to determine the number of ephemeral pairs.

After obtaining the active pair and ephemeral pair counts, the following formula can be used to calculate additional disk space requirements for host to host indexing:

(Active pair count + Ephemeral pair count) * Exporter count * 200 B

where Exporter count corresponds to the total number of Exporters/inclusions defined for the Host Indexing algorithm.
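As a worked illustration of the steps and formula above, the following sketch derives the ephemeral pair count from the two report totals and then applies the formula. The function names and the sample counts (40,000 results at 5m, 100,000 results at 1m, 10 Exporters) are hypothetical; only the 200 B factor and the structure of the calculation come from this section.

```python
BYTES_PER_PAIR_RECORD = 200  # per-record size used in the formula above


def ephemeral_pair_count(count_5m: int, count_1m: int) -> int:
    """Ephemeral pairs = 1m result count minus 5m (active pair) result count."""
    return max(count_1m - count_5m, 0)


def host_pair_index_bytes(active_pairs: int, ephemeral_pairs: int,
                          exporter_count: int) -> int:
    """(Active pair count + Ephemeral pair count) * Exporter count * 200 B."""
    return (active_pairs + ephemeral_pairs) * exporter_count * BYTES_PER_PAIR_RECORD


# Hypothetical report totals, for illustration only.
active = 40_000       # total result count with the 5m data source
total_1m = 100_000    # total result count with the 1m data source
exporters = 10        # Exporters/inclusions defined for Host Indexing

ephemeral = ephemeral_pair_count(active, total_1m)          # 60,000 pairs
extra_bytes = host_pair_index_bytes(active, ephemeral, exporters)
print(f"Estimated additional disk space: {extra_bytes / 1e9:.2f} GB")  # 0.20 GB
```

The result is an approximation, so it is reasonable to leave headroom above the estimate when setting the storage allocation.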

Utilization alerts

If the combined disk space used by the host and host pair databases reaches 100% of the Host Index Max Disk Space setting of the Host Indexing algorithm, host and host to host indexing will be suspended until storage becomes available again.

The following Alarm Policies are used to alert users to high disk utilization by host indexing:

Host Index Disk Space Warning

Triggered when the disk space used by host indexing functions reaches/exceeds 75% of the specified Host Index Max Disk Space

Host Index Disk Space Error

Triggered when host indexing functions are suspended because the Host Index Max Disk Space has been reached

Host Index Disk Availability Error

Triggered when host indexing functions are suspended because disk utilization for the volume the host and host pair databases are stored on has reached/exceeded 90%
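The thresholds for these three policies can be summarized in a short sketch. It only mirrors the conditions described above; the function name, its parameters, and the order in which the conditions are checked are assumptions, not Plixer Scrutinizer’s actual logic.

```python
from typing import Optional

WARNING_FRACTION = 0.75       # warning at 75% of Host Index Max Disk Space
VOLUME_LIMIT_FRACTION = 0.90  # suspension when the volume is 90% utilized


def host_index_alarm(index_bytes: int, max_index_bytes: int,
                     volume_used_fraction: float) -> Optional[str]:
    """Return the name of the policy that would apply, or None.

    The evaluation order is an assumption made for this sketch.
    """
    if volume_used_fraction >= VOLUME_LIMIT_FRACTION:
        return "Host Index Disk Availability Error"
    if index_bytes >= max_index_bytes:
        return "Host Index Disk Space Error"
    if index_bytes >= WARNING_FRACTION * max_index_bytes:
        return "Host Index Disk Space Warning"
    return None


# Hypothetical example: 80 GB used of a 100 GB maximum, volume 60% full.
print(host_index_alarm(80 * 10**9, 100 * 10**9, 0.60))
```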