Sizing considerations

With the minimum system requirements, a single Plixer Scrutinizer server/Collector can process up to 5,000 flows/s across 25 Exporters. To support larger or more complex workloads, such as a higher Exporter count or the use of advanced features, additional resources must be allocated to the system.
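
As a rough illustration of how that baseline scales (assuming approximately linear scaling, which is a simplification rather than an official Plixer sizing formula), the minimum-spec figures can be turned into a back-of-the-envelope multiplier:

    # Back-of-the-envelope scaling from the documented baseline of
    # 5,000 flows/s across 25 Exporters on a minimum-spec Collector.
    # Linear scaling is an assumption; actual requirements depend on the
    # factors listed below.

    BASELINE_FLOW_RATE = 5_000   # flows/s at minimum system requirements
    BASELINE_EXPORTERS = 25      # Exporters at minimum system requirements

    def scaling_factor(flow_rate: int, exporters: int) -> float:
        """Return a rough multiplier relative to the minimum-spec baseline."""
        return max(flow_rate / BASELINE_FLOW_RATE, exporters / BASELINE_EXPORTERS)

    # Example: 20,000 flows/s from 60 Exporters -> 4.0, i.e. roughly four times
    # the minimum CPU/RAM/disk before any feature overhead is added.
    print(scaling_factor(20_000, 60))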

The optimal allocation for a specific scenario depends on many variables, but the main factors influencing resource requirements (CPU, RAM, and disk/storage) are the following:

  • Flow rate and volume

  • Flow contents/types

  • Exporter count

  • Data retention and aggregation settings

  • Number of features and advanced functions enabled

Important

When allocating resources to Plixer Scrutinizer deployments, dedicated rather than shared resources are recommended to ensure optimal performance.

Data retention

Plixer Scrutinizer’s data retention settings (under Admin > Settings > Data History) provide control over how much historical data is kept by the system in terms of duration and/or disk space utilization.

This page includes retention settings for the following data elements:

  • Alarms (days and disk size)

  • Audit logs (months)

  • DNS request data (days)

  • Conversation data (hours, days, or weeks, depending on interval)

  • Top conversations (count)

  • Free space threshold (percentage)

Adjusting these values to match actual data retention needs as closely as possible will allow for more efficient resource allocation and help ensure optimal system performance.
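
To illustrate how retention duration drives disk usage, the sketch below estimates the footprint of raw 1m-interval data. The per-record size is an assumed placeholder, since actual on-disk size varies with flow templates and aggregation mode:

    # Rough disk estimate for raw 1m-interval data over a retention window.
    # BYTES_PER_FLOW is an assumed average, not a published figure.

    BYTES_PER_FLOW = 150   # assumed average stored size of one flow record

    def estimate_disk_gb(flow_rate: int, retention_days: int) -> float:
        """Approximate disk footprint (GB) of 1m data at a given flow rate."""
        flows = flow_rate * 86_400 * retention_days
        return flows * BYTES_PER_FLOW / 1024**3

    # Example: 5,000 flows/s retained for 3 days of 1m data (~180 GB).
    print(f"{estimate_disk_gb(5_000, 3):.0f} GB")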

Data aggregation

Plixer Scrutinizer stores the flows it collects in their original form in one-minute (1m) buckets. These 1m records are then “rolled up” or aggregated into higher intervals (1m -> 5m -> 30m -> 2h -> 12h) to allow for faster long-term trending.
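
The roll-up itself can be thought of as re-bucketing each record's timestamp into the wider interval and summing its counters. The following is a conceptual sketch only (not Plixer Scrutinizer's actual storage code), with a simplified record layout:

    # Conceptual roll-up sketch: re-bucket timestamps into a wider interval
    # and sum byte/packet counters. The record layout here is simplified.

    from collections import defaultdict

    def roll_up(records, interval_seconds):
        """Aggregate (timestamp, key, bytes, packets) tuples into wider buckets."""
        buckets = defaultdict(lambda: [0, 0])
        for ts, key, byte_count, packet_count in records:
            bucket_start = ts - (ts % interval_seconds)
            buckets[(bucket_start, key)][0] += byte_count
            buckets[(bucket_start, key)][1] += packet_count
        return dict(buckets)

    # Three 1m records collapse into a single 5m (300 s) bucket.
    one_minute = [(60, "10.0.0.1>10.0.0.2", 1_000, 10),
                  (120, "10.0.0.1>10.0.0.2", 2_000, 20),
                  (180, "10.0.0.1>10.0.0.2", 3_000, 30)]
    print(roll_up(one_minute, 300))   # {(0, '10.0.0.1>10.0.0.2'): [6000, 60]}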

There are two data aggregation modes that control how data is saved and rolled up:

  • Traditional - Every element in the original flow template will be copied over to the higher interval templates, which takes more disk space.

  • SAF - Any flow template with the required information elements will be aggregated into a new template definition containing only common elements (srcIP, dstIP, bytes, packets, etc.), allowing for more common Reports (e.g., country, IP Group, and AS by IP, which are based on src/dst IPs) to be run while storing data more efficiently.

To learn more about data aggregation modes, see the section on data aggregation.
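
As a simple illustration of the difference between the two modes (the field names below are examples, not the exact information element set used by the product):

    # Illustrative contrast of the two aggregation modes; field names are
    # examples only.

    original_flow = {
        "srcIP": "10.0.0.1", "dstIP": "10.0.0.2",
        "srcPort": 52311, "dstPort": 443, "protocol": 6,
        "tos": 0, "tcpFlags": 0x18,
        "bytes": 4_200, "packets": 12,
    }

    # Traditional: every element of the original template is carried into the
    # higher-interval record, so roll-ups stay as wide as the source template.
    traditional_rollup = dict(original_flow)

    # SAF: only the common elements are kept, so roll-ups from different
    # templates share one compact definition that still supports the common
    # IP-based reports (country, IP Group, AS by IP, etc.).
    COMMON_ELEMENTS = ("srcIP", "dstIP", "bytes", "packets")
    saf_rollup = {k: original_flow[k] for k in COMMON_ELEMENTS}

    print(len(traditional_rollup), len(saf_rollup))   # 9 4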

Note

When available storage drops below a 10% threshold, 1m and 5m historical tables will be trimmed until disk utilization drops back under 90%. Trimming is also automatically used to maintain a similar level of historical data across all configured Exporters.
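
A minimal sketch of this trimming behavior follows; the table list and size accounting are simulated stand-ins, not the product's actual maintenance code:

    # Simulated trimming: when free space falls below the threshold, drop the
    # oldest 1m and 5m tables until utilization is back under the target.

    FREE_SPACE_THRESHOLD = 10      # percent free space that triggers trimming
    TARGET_UTILIZATION = 90        # percent used space to trim back down to
    DISK_CAPACITY_GB = 1_000

    # Simulated historical tables as (interval, size in GB), oldest first.
    tables = [("1m", 300), ("1m", 250), ("5m", 200), ("5m", 180), ("30m", 50)]

    def used_percent():
        return 100 * sum(size for _, size in tables) / DISK_CAPACITY_GB

    def maybe_trim():
        if 100 - used_percent() >= FREE_SPACE_THRESHOLD:
            return                                   # enough free space
        for interval in ("1m", "5m"):
            while used_percent() > TARGET_UTILIZATION:
                candidates = [t for t in tables if t[0] == interval]
                if not candidates:
                    break                            # nothing left at this interval
                tables.remove(candidates[0])         # oldest table first

    maybe_trim()
    print(f"{used_percent():.0f}% used")   # 68% used in this example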

Feature resource requirements

The following table summarizes the additional resource requirements to support specific features for a single Plixer Scrutinizer server/Collector:

Feature                                                     Resource Requirements
----------------------------------------------------------  --------------------------------------------------------------
Data streaming to Plixer ML Engine or external data lake    +25% CPU core count and RAM (to maintain the same performance)
Host indexing                                               +4 CPU cores and +4 GB RAM
Scanning Flow Analytics algorithms                          +4 CPU cores and +4 GB RAM
Non-scanning Flow Analytics algorithms                      +4 CPU cores and +4 GB RAM
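
As an illustration only (treating the table's figures as additive on top of a base allocation, which is an assumption about how the overheads combine):

    # Hypothetical feature-overhead calculator based on the table above.
    # Assumes the listed overheads simply add on top of a base allocation.

    FEATURE_OVERHEAD = {
        "data_streaming": {"cpu_pct": 25, "ram_pct": 25},   # ML Engine / data lake
        "host_indexing": {"cpu_cores": 4, "ram_gb": 4},
        "fa_scanning": {"cpu_cores": 4, "ram_gb": 4},
        "fa_non_scanning": {"cpu_cores": 4, "ram_gb": 4},
    }

    def estimate(base_cores, base_ram_gb, enabled):
        cores, ram = float(base_cores), float(base_ram_gb)
        for feature in enabled:
            overhead = FEATURE_OVERHEAD[feature]
            cores += base_cores * overhead.get("cpu_pct", 0) / 100 + overhead.get("cpu_cores", 0)
            ram += base_ram_gb * overhead.get("ram_pct", 0) / 100 + overhead.get("ram_gb", 0)
        return cores, ram

    # Example: 16-core / 32 GB base with data streaming and host indexing enabled.
    print(estimate(16, 32, ["data_streaming", "host_indexing"]))   # (24.0, 44.0)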

Additional considerations

When estimating resource allocation for a Plixer Scrutinizer deployment, the following factors should also be taken into consideration:

  • Disk I/O - Because Plixer Scrutinizer’s functions are highly disk-intensive, it is critical to avoid I/O bottlenecking.

Important

For the best performance, 15k drives or SSDs in RAID 10 are recommended.

  • Flow types/templates - The size and complexity of the flows being received (e.g., NetFlow v5 vs. IPFIX) can also have an impact on system performance, as illustrated in the sketch below. Likewise, Exporters sending the same flows in multiple templates will increase system load, so configuring Exporters to use option templates is recommended.
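
As a hedged illustration of the impact of record size (the NetFlow v5 record size is fixed by the protocol; the IPFIX size below is an assumed example, since real IPFIX records vary with the template):

    # Rough export-bandwidth comparison for a fixed-size NetFlow v5 record
    # versus a hypothetical, richer IPFIX template.

    NETFLOW_V5_RECORD_BYTES = 48        # fixed record size in NetFlow v5
    ASSUMED_IPFIX_RECORD_BYTES = 120    # placeholder for a richer template

    def export_mbps(flow_rate: int, record_bytes: int) -> float:
        return flow_rate * record_bytes * 8 / 1_000_000

    print(export_mbps(5_000, NETFLOW_V5_RECORD_BYTES))      # ~1.9 Mb/s
    print(export_mbps(5_000, ASSUMED_IPFIX_RECORD_BYTES))   # ~4.8 Mb/s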