Data aggregation

Plixer Scrutinizer’s SAF (Summary and Forensic) data aggregation method is an optimized system of storing flow data that makes use of summary tables to condense collected information without compromising transparency or accuracy.

How SAF works

With SAF, any incoming flow template with the required data elements is aggregated into a new template definition based on a tuple that includes commonPort. The resulting “summarized” template will omit all data elements that prevent aggregation (e.g., source and destination transport ports) but still contain all information required for the vast majority of reporting needs.
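The aggregation described above can be illustrated with a minimal Python sketch. It is not Plixer Scrutinizer's implementation: the `summarize` function is hypothetical, and the rule that commonPort is the lower of the two transport ports is an assumption made only for illustration. The field names follow the IPFIX element names used on this page.

```python
from collections import defaultdict


def summarize(flows):
    """Collapse raw flow records into summary rows that omit transport ports."""
    summary = defaultdict(lambda: {"octetDeltaCount": 0, "packetDeltaCount": 0})
    for flow in flows:
        # Assumption (illustration only): commonPort is the lower of the two ports.
        common_port = min(flow["sourceTransportPort"], flow["destinationTransportPort"])
        key = (
            flow["sourceIpAddress"],
            flow["destinationIpAddress"],
            flow["protocolIdentifier"],
            common_port,
        )
        # Flows that differ only in their ephemeral port land in the same row.
        summary[key]["octetDeltaCount"] += flow["octetDeltaCount"]
        summary[key]["packetDeltaCount"] += flow["packetDeltaCount"]
    return dict(summary)


flows = [
    {"sourceIpAddress": "10.0.0.5", "destinationIpAddress": "10.0.0.9",
     "protocolIdentifier": 6, "sourceTransportPort": 50001,
     "destinationTransportPort": 443, "octetDeltaCount": 1200, "packetDeltaCount": 4},
    {"sourceIpAddress": "10.0.0.5", "destinationIpAddress": "10.0.0.9",
     "protocolIdentifier": 6, "sourceTransportPort": 50002,
     "destinationTransportPort": 443, "octetDeltaCount": 800, "packetDeltaCount": 3},
    {"sourceIpAddress": "10.0.0.5", "destinationIpAddress": "10.0.0.9",
     "protocolIdentifier": 17, "sourceTransportPort": 40000,
     "destinationTransportPort": 53, "octetDeltaCount": 90, "packetDeltaCount": 1},
]

rows = summarize(flows)
```

In this example the two HTTPS flows differ only in their source port, so they collapse into one summary row, while the DNS flow keeps its own row.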

Hint

The aggregation logic used to create summary tables can be modified to suit different scenarios. Contact Plixer Technical Support for assistance.

The data elements retained in the summary tables include, but are not limited to:

intervalTime
commonPort
ingressInterface
egressInterface
sourceIpAddress
destinationIpAddress
octetDeltaCount
octetDeltaCount_rev
packetDeltaCount
packetDeltaCount_rev
flowDirection
applicationId
protocolIdentifier

Once five 1m summary tables are available, the data averages for the top 1000 conversations (the default) are rolled up into a 5m table. The system then continues the rollups to create 30m, 2h, and 12h tables.
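As a rough sketch of the rollup step, the Python below merges five 1m tables and keeps only the top conversations. The `roll_up` function and the table layout (conversation key mapped to a byte count) are hypothetical. For simplicity it sums byte counts; the exact arithmetic the product applies when rolling up is not specified here.

```python
from collections import Counter


def roll_up(one_minute_tables, max_conversations=1000):
    """Merge several 1m tables and keep only the largest conversations."""
    totals = Counter()
    for table in one_minute_tables:
        # Counter.update adds the byte counts for matching conversation keys.
        totals.update(table)
    # Keep the top N conversations by total bytes; the rest are dropped.
    return dict(totals.most_common(max_conversations))


# Five hypothetical 1m tables: conversation key -> bytes.
tables = [
    {"A": 100, "B": 50},
    {"A": 100, "C": 10},
    {"B": 70},
    {"A": 20, "C": 5},
    {"B": 30, "C": 5},
]

five_minute = roll_up(tables, max_conversations=2)
```

With a limit of two, conversation C is discarded from the 5m table. This mirrors why raising the Flow Maximum Conversations value can improve accuracy at the cost of larger tables.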

Hint

If a Collector’s disk capacity allows, the Flow Maximum Conversations value under Admin > Settings > Data History can be increased, which may improve reporting accuracy. Because this results in larger tables and longer run times for certain Report types, increase the value gradually over several days.

Note

When Auto History Trimming (under Data History settings) is enabled, 1m and 5m historical tables are trimmed to maintain the configured Minimum Percent Free Disk Space before Trimming value. Automatic trimming is also used to retain a similar level of historical data for all configured exporters.

Benefits of SAF aggregation

Because the summary tables created under SAF aggregation are drastically smaller in size than regular full-template tables, they benefit the Plixer Scrutinizer system in the following ways:

Reduced disk utilization per table
Increased historical data capacity
Improved report render times
Faster lookups before drilling into forensic data

While only summary data is rolled up into higher interval tables, Plixer Scrutinizer still retains the original forensic data, which is used by a handful of reports that require data elements not included in the summary tables. At the same time, the system also maintains a separate totals table for in/out byte counts per interface to allow for accurate utilization reporting without relying on SNMP.
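To show how per-interface byte totals translate into utilization, here is a small worked example in Python. The `utilization_percent` function and the sample numbers are illustrative assumptions, not values or logic taken from Plixer Scrutinizer. It applies the standard formula: bits transferred divided by the bits the link could carry in the interval.

```python
def utilization_percent(octets, interval_seconds, link_speed_bps):
    """Return link utilization as a percentage for one direction of an interface."""
    bits_transferred = octets * 8
    link_capacity_bits = interval_seconds * link_speed_bps
    return bits_transferred * 100 / link_capacity_bits


# Hypothetical totals row: 450,000,000 inbound bytes in one minute on a 1 Gb/s link.
inbound_pct = utilization_percent(450_000_000, 60, 1_000_000_000)
```

Because totals count every byte seen on the interface rather than only the top conversations, a calculation like this is not affected by the conversation limit applied to summary tables.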

Note

Systems that have been upgraded from versions prior to 18.x may still use the legacy data aggregation method that was the default in their original installs. To check, navigate to Admin > Settings > Data History. If the Rollup Type is not set to Summary and Forensic, contact Plixer Technical Support for assistance with switching.

Notes on collecting sFlow

When collecting sFlow, packet samples and interface counters should both be forwarded to the collector. Packet samples will be saved to the raw tables, and interface counters will be saved to the totals tables at 1-minute intervals.

Important

An sFlow-exporting device (e.g., a switch) that sends multiple templates for different flows may cause overreporting if the flows contain the same or very similar information, because Plixer Scrutinizer’s frontend runs reports using data from all templates that match the information. To avoid this, use filters to specify a single template.
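The overreporting effect can be shown with a short Python sketch. The record layout, the template names `T1` and `T2`, and the `total_octets` function are all hypothetical. They only illustrate why restricting a report to one template avoids counting the same traffic twice.

```python
def total_octets(records, template=None):
    """Sum octets across records, optionally restricted to a single template."""
    return sum(
        r["octets"]
        for r in records
        if template is None or r["template"] == template
    )


# The same conversation exported under two hypothetical templates.
records = [
    {"template": "T1", "conversation": "10.0.0.5->10.0.0.9", "octets": 1000},
    {"template": "T2", "conversation": "10.0.0.5->10.0.0.9", "octets": 1000},
]

unfiltered = total_octets(records)                # counts the traffic twice
filtered = total_octets(records, template="T1")   # counts it once
```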