Sizing your environment¶
A single Plixer Scrutinizer collector instance can scale up to 100,000 fps sustained, with spikes up to 200,000 fps, collecting from up to 500 exporters per collector. A distributed cluster can scale up to 50 collectors, which allows for a sustained 5 million fps (spikes to 10 million fps) from up to 25,000 exporters.
|Flows per second|Flows per minute|Flows per hour|Flows per day|
|---|---|---|---|
|100,000|6,000,000|360,000,000|8,640,000,000|
Processing 8.64 billion records per day naturally requires more than our minimum system specifications allow. This document will help you determine what resources are required.

Keep in mind that there are many more factors than are outlined here, so requirements for some instances will vary.

Our minimum system specifications are based on a maximum of 5,000 fps and 25 exporters. As system load increases, required resources increase as well.
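The rate conversions this guide relies on can be sketched in a few lines. The helper below is purely illustrative (not a Plixer tool); it shows how the 100,000 fps sustained ceiling becomes the 8.64 billion records per day cited above:

```python
def flow_volumes(fps):
    """Return (per-minute, per-hour, per-day) flow volumes for a sustained fps rate."""
    return fps * 60, fps * 3600, fps * 86400

# A single collector at its 100,000 fps sustained ceiling:
fpm, fph, fpd = flow_volumes(100_000)
print(fpm, fph, fpd)  # 6000000 360000000 8640000000
```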
The big three¶
The big three variables are CPUs, memory, and disk. Processing huge data volumes requires large amounts of all three.

- CPU: CPU requirements most closely correlate with the number of exporters sending flows.
- Memory: Memory requirements most closely correlate with flow volume.
- Disk: Disk IO closely correlates with flow rate. Disk size requirements are a function of an organization's data retention needs.
All recommendations assume DEDICATED resources; SHARED CPUs, RAM, and disk may not perform at the recommended levels.
Single instance guidelines¶
These sizing guidelines reflect the resources necessary for core functionality: data collection, aggregation, reporting, and the user interface.
Optional functionality requires additional resources on top of the sizing matrices below:
- Streaming to ML or an external data lake: Collected flow rates will be 25% less or CPUs and RAM need to be 25% higher.
- Host Indexing requires an additional 4 cores and 4 GB of RAM
- Scanning algorithms require an additional 4 cores and 4 GB of RAM
- Non-Scanning algorithms require an additional 4 cores and 4 GB of RAM
- Pre-cooking top conversations requires an additional 4 cores and 4 GB of RAM
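The per-feature costs above are additive, and the streaming uplift applies on top. The following sketch (not an official Plixer calculator; the base figures are hypothetical placeholders) shows how they combine:

```python
def size_instance(base_cores, base_ram_gb, optional_features=0, streaming=False):
    """Add 4 cores / 4 GB RAM per enabled optional feature, then a 25% uplift
    for streaming to ML or an external data lake."""
    cores = base_cores + 4 * optional_features
    ram = base_ram_gb + 4 * optional_features
    if streaming:
        cores *= 1.25
        ram *= 1.25
    return cores, ram

# A hypothetical base spec with Host Indexing and one scanning algorithm
# enabled, plus streaming to an external data lake:
print(size_instance(16, 32, optional_features=2, streaming=True))  # (30.0, 50.0)
```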
Plixer Scrutinizer is an IO intensive product. We recommend 15K drives or SSDs in RAID 10 for the best performance.
In a distributed Plixer Scrutinizer deployment, a number of servers work in concert. The reporter(s) act as the coordinator for all servers and therefore require resources in proportion to the number of servers.
- Minimum CPUs: 2 × the number of servers in the cluster
- Recommended CPUs: 4 × the number of servers in the cluster
- Minimum memory: 2 GB per server in the cluster
- Recommended memory: 4 GB per server in the cluster
Take, for example, a distributed cluster with 10 collectors plus a dedicated reporter, where the reporter is not collecting any external flow data. That reporter still has minimum specs of 20 cores (we recommend 40) and 20 GB of RAM (we recommend 40 GB).
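The reporter rule above reduces to simple multiplication. This sketch (an illustration, not a Plixer tool) reproduces the worked example; note that the example counts the 10 collectors the reporter coordinates:

```python
def reporter_specs(servers):
    """Reporter sizing from the number of servers it coordinates."""
    return {
        "min_cores": 2 * servers,
        "rec_cores": 4 * servers,
        "min_ram_gb": 2 * servers,
        "rec_ram_gb": 4 * servers,
    }

# The 10-collector cluster from the example above:
print(reporter_specs(10))
# {'min_cores': 20, 'rec_cores': 40, 'min_ram_gb': 20, 'rec_ram_gb': 40}
```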
Disk IO: In virtualized environments, disk configurations and performance characteristics can vary greatly. Plixer Scrutinizer is a disk-intensive application, and avoiding waiting on disk is critical. Many factors contribute to disk load, including:
- Size in bytes of each flow record
- Cardinality of flow data
- Aggregation method selected
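Because disk size is driven by retention needs, a back-of-the-envelope estimate can be sketched as flow rate × bytes stored per flow × retention window. This is an illustrative assumption, not a Plixer formula; actual bytes on disk depend on the factors listed above:

```python
def retention_tb(fps, bytes_per_flow_on_disk, retention_days):
    """Rough disk capacity estimate in decimal TB for a retention window."""
    total_bytes = fps * bytes_per_flow_on_disk * retention_days * 86_400
    return total_bytes / 1_000_000_000_000

# e.g. 50,000 fps, a hypothetical 100 bytes stored per flow, 30-day retention:
print(retention_tb(50_000, 100, 30))  # 12.96
```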
Features enabled: Overall load on a system will vary greatly depending on which features are being utilized and at what levels. Some of the features that can impact resource needs are:
- Number of Flow Analytics algorithms enabled and how many data sources are enabled
- Number of configured report thresholds
- Number of scheduled reports
Not all flows are the same: Performance will vary greatly depending on the size and complexity of the flows being collected.

- The simplest flow configuration is NetFlow v5, where each flow record is 48 bytes on the wire (excluding headers and Plixer enhancements; bytes on disk will differ).
- More complex IPFIX templates can be well over 200 bytes per flow and include some complex structures, like variable-length strings, that require more CPU to decode.
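The difference in record size translates directly into export bandwidth. The sketch below (excluding packet and flow-set headers, per the note above) compares 48-byte NetFlow v5 records with a hypothetical 200-byte IPFIX record at the same rate:

```python
def export_mbps(fps, bytes_per_record):
    """On-the-wire export bandwidth in Mbps, ignoring header overhead."""
    return fps * bytes_per_record * 8 / 1_000_000

print(export_mbps(100_000, 48))   # NetFlow v5 records at 100k fps -> 38.4
print(export_mbps(100_000, 200))  # 200-byte IPFIX records -> 160.0
```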
Multiple templates matter: Multiple flow templates can add load like an additional exporter would.
- If an exporter is sending the same flows in two templates (for example, both ingress- and egress-metered flows), the load on the system from that one exporter is the same as from two exporters.
- Option templates are small amounts of data sent infrequently so system impact is minimal. Recommended specs assume each exporter will be sending an option template.
- This document uses the measure of “exporter” because it simplifies things in almost all cases. If an exporter is sending additional template(s) with flow records, it is safest to count that exporter as 2+ exporters.
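Following the counting rule above, each flow template on a device contributes load like a separate exporter (option templates are ignored because their impact is minimal). A minimal sketch, assuming you know the flow-template count per device:

```python
def effective_exporters(flow_templates_per_device):
    """Sum of flow-template counts, one entry per physical device."""
    return sum(flow_templates_per_device)

# Three devices: one template, two templates (ingress + egress), three templates.
# For sizing purposes this counts as 6 exporters, not 3:
print(effective_exporters([1, 2, 3]))  # 6
```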
Plixer Machine Learning Engine¶
This section contains information on Plixer Machine Learning Engine sizing.
For PSI, an “asset” is a host; for PNI, an “asset” is an exporter interface.
Rows are flows per second (fps); columns are the number of assets supported. Measurements are in number of cores.

Rows are fps; columns are the number of assets supported. Measurements are in GB.

Rows are fps; columns are the number of assets supported. Measurements are in TB.