Storage

The Admin > Resources > System Performance page of the web interface summarizes disk utilization for individual collectors in a Plixer Scrutinizer environment. A more detailed view that shows actual and expected storage use for historical flow data can also be accessed by drilling into a specific collector.

This section discusses the main factors that influence a Plixer Scrutinizer collector’s disk use and provides instructions for anticipating additional storage needs.

Data retention

Plixer Scrutinizer’s data history settings control how long aggregated flow data, alarm/event details, and other data are stored. With the default settings, a collector provisioned with the minimum 100 GB of storage can store up to 30 days of NetFlow V5 data for a maximum of 25 flow-exporting devices with a combined flow rate of 1,500 flows/s.

For more accurate and detailed projections of disk space requirements based on specific data retention settings, the following database size calculator can be accessed from the data history settings tray:

[Figure: Database size calculator]

The calculator shows both current and predicted disk usage for each historical flow data interval based on the retention times entered. Details are shown by collector, with total predicted usage and total storage currently available also included.

Note

  • More detailed storage utilization information can be accessed by drilling into a collector from the Admin > Resources > System Performance page.

  • Plixer Scrutinizer’s functions are highly I/O intensive, and many factors can affect the system’s disk performance, such as the size and complexity of the flows being received and flow cardinality. For optimal performance, 15k RPM HDDs or SSDs in a RAID 10 configuration are recommended.

Auto-trimming

Plixer Scrutinizer automatically trims older historical flow data when available disk space falls below the Minimum Percent Free Disk Space Before Trimming value configured in the data history settings.

Auto-trimming can be disabled by unticking the Auto History Trimming checkbox. However, with auto-trimming disabled, flow collection and other functions may be paused when available storage runs low. As an alternative to trimming, the amount of storage allocated to the collector can be increased so that older records are retained.
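
The trimming condition described above can be illustrated with a minimal sketch. This is not Plixer Scrutinizer's implementation. The function names and the 10% threshold are hypothetical and stand in for whatever value is configured as Minimum Percent Free Disk Space Before Trimming.

```python
import shutil


def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    return 100.0 * free_bytes / total_bytes


def would_trim(total_bytes: int, free_bytes: int, min_free_percent: float) -> bool:
    """True when free space has fallen below the configured minimum.

    Models only the documented condition ("falls below"), so a volume
    sitting exactly at the threshold does not trigger trimming here.
    """
    return free_percent(total_bytes, free_bytes) < min_free_percent


# Check the root volume against a hypothetical 10% threshold.
usage = shutil.disk_usage("/")
print(f"Free: {free_percent(usage.total, usage.free):.1f}%")
print("Below threshold:", would_trim(usage.total, usage.free, min_free_percent=10.0))
```

A check like this can help anticipate when a collector is approaching the point at which older historical data would start to be trimmed.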

Host indexing

When host indexing is enabled, it may become necessary to allocate additional storage, CPU cores, and RAM to Plixer Scrutinizer collectors.

Host to host indexing can have a significant impact on disk utilization due to the two types of records stored:

  • Continuously active pairs, for which records will not expire

  • Ephemeral unique pairs, for which records will expire but are replaced by new records at approximately the same rate

Disk space calculations

To approximate the amount of additional disk space that will be used by the host to host index:

  1. Create/run a new Host to Host pair report and add all exporters that were defined as inclusions for the Host Indexing FA algorithm.

  2. Set the time window to cover a period of at least 24 hours.

  3. When the output of the report is displayed, click the gear button to open the Options tray and select Global.

  4. In the secondary tray, select the 5m option from the Data Source dropdown and click Apply before returning to the main view.

  5. Note the total result count, which will be roughly equivalent to the number of active pairs.

  6. Return to the Options > Global tray and switch to the 1m data source option.

  7. Subtract the previous result count from the updated total result count to determine the number of ephemeral pairs.

After obtaining the active pair and ephemeral pair counts, the following formula can be used to calculate additional disk space requirements for host to host indexing:

(Active pair count + Ephemeral pair count) * Exporter count * 200 B

where Exporter count corresponds to the total number of exporters/inclusions defined for the Host Indexing algorithm.
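
As a worked example, the formula above can be expressed as a short function. The pair counts and exporter count below are hypothetical placeholders. Real values come from the 5m and 1m report result counts obtained in the steps above.

```python
def host_pair_index_bytes(
    active_pairs: int,
    ephemeral_pairs: int,
    exporter_count: int,
    bytes_per_record: int = 200,
) -> int:
    """(Active pair count + Ephemeral pair count) * Exporter count * 200 B."""
    return (active_pairs + ephemeral_pairs) * exporter_count * bytes_per_record


# Hypothetical report results, for illustration only.
count_5m = 40_000    # result count with the 5m data source (roughly the active pairs)
count_1m = 100_000   # result count with the 1m data source
ephemeral = count_1m - count_5m  # step 7: subtract the previous count

estimate = host_pair_index_bytes(count_5m, ephemeral, exporter_count=10)
print(f"Estimated additional disk space: {estimate / 1e9:.2f} GB")
```

With these sample numbers, (40,000 + 60,000) * 10 * 200 B comes to 200,000,000 B, or about 0.2 GB.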

Utilization alerts

If the combined disk space used by the host and host pair databases reaches 100% of the Host Index Max Disk Space setting of the Host Indexing algorithm, host and host to host indexing will be suspended until storage becomes available again.

The following alarm policies are used to alert users to high disk utilization by host indexing:

Host Index Disk Space Warning

Triggered when the disk space used by host indexing functions reaches/exceeds 75% of the specified Host Index Max Disk Space

Host Index Disk Space Error

Triggered when host indexing functions are suspended because the Host Index Max Disk Space has been reached

Host Index Disk Availability Error

Triggered when host indexing functions are suspended because disk utilization for the volume the host and host pair databases are stored on has reached/exceeded 90%

Host indexing functions will automatically restart once sufficient storage is available, either due to record expiry or because disk space has been added.
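
The thresholds behind these alarm policies can be summarized in a small sketch. It only models the percentages stated above and is not how Plixer Scrutinizer evaluates its policies. The function and parameter names are hypothetical, and reporting only the more severe of the warning and error conditions is an assumption.

```python
def host_index_alarms(
    index_used_bytes: int,
    max_index_bytes: int,
    volume_used_percent: float,
) -> list[str]:
    """Return the names of the alarm policies whose documented thresholds are met."""
    alarms = []
    index_percent = 100.0 * index_used_bytes / max_index_bytes

    # Assumption: once the maximum is reached, only the error is reported.
    if index_percent >= 100.0:
        alarms.append("Host Index Disk Space Error")
    elif index_percent >= 75.0:
        alarms.append("Host Index Disk Space Warning")

    # Evaluated separately: utilization of the volume holding the databases.
    if volume_used_percent >= 90.0:
        alarms.append("Host Index Disk Availability Error")

    return alarms


# Hypothetical example: index at 80% of its limit, volume at 60% utilization.
print(host_index_alarms(80 * 10**9, 100 * 10**9, volume_used_percent=60.0))
```

In this model, the two error conditions correspond to the cases in which host indexing is suspended, while the warning leaves indexing running.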