Vitals reporting

The Vitals reports provide insight into the health of the Scrutinizer servers (e.g. CPU, memory usage, available hard drive space, flow metrics, etc.). Vitals information is reported for all servers in a Distributed Environment.

Vitals reports can provide valuable insight into the servers’ performance. As with any other flow report type, thresholds can be set on any of the Vitals reports, making it possible to alert on threshold violations (e.g. low disk space, high CPU utilization, etc.).

These reports are accessible at Status->Device Explorer->Scrutinizer server->Reports->Vitals. (A Vitals Dashboard is also created by default for the Admin user and includes many of the reports listed below.)

  • % CPU per Process: This report displays CPU percentage consumed per process on the server.
  • CPU: Average CPU utilization for the Scrutinizer server(s).
  • CrossCheck Runtime: Monitors runtimes for CrossCheck methods (processes).
  • Database: Provides the following database metrics:
    • Connections by Bytes: Excessive connections can result in reduced performance. NOTE: other applications using the same database will cause this number to increase.
    • Read Req: The number of requests to read a key block from the cache. A high number of requests means the server is busy.
    • Write Req: The number of requests to write a key block to the cache. A high number of requests means the server is busy.
    • Cache Free: The total amount of memory available for query caching. Contact support if the query cache is under 1 MB.
    • Queries: Tracks the number of queries made to the database. More queries indicate a heavier load on the database server. Generally, there will be spikes at intervals of 5 minutes, 30 minutes, 2 hours, 12 hours, etc. These spikes reflect the rolling up of statistics by the stored procedures. This Vitals report is important to watch if the NetFlow collector is sharing the database server with other applications.
    • Threads: Threads help pass data back and forth between Scrutinizer and the database engine. The database server currently manages whether or not to use the configured number of threads.
    • Buffers Used: Key Buffers Used. Indicates how much of the allocated key buffer space is being utilized.

If the Buffers Used report begins to consistently hit 100%, not enough memory is allocated to the key buffers. Scrutinizer will compensate by utilizing swap on the disk, which can add delay when retrieving data because of the increased disk I/O. On resource-constrained implementations, this can cause performance to degrade quickly. Users can adjust the amount of memory allocated to the key buffers by modifying the key buffer size setting in the database configuration file.

A general rule of thumb is to allocate as much RAM to the key buffer as possible, up to a maximum of 25% of system RAM (e.g. 1GB on a 4GB system). This is about the ideal setting for systems that read heavily from keys. If too much memory is allocated, performance may degrade further because the system has to use virtual memory for the key buffer. The check tuning interactive scrut_util command can help with recommended system settings.
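The 25% rule of thumb can be worked through as a small calculation. The Python sketch below is illustrative only: the function name and its parameters are hypothetical and are not part of Scrutinizer or scrut_util. It simply computes the upper bound suggested by the guideline in this section.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def max_key_buffer_bytes(total_ram_bytes, max_fraction=0.25):
    """Return the largest key buffer size suggested by the rule of thumb.

    max_fraction defaults to 0.25 (25% of system RAM), the ceiling
    described in this section. This is a hypothetical helper, not a
    Scrutinizer API.
    """
    if total_ram_bytes <= 0:
        raise ValueError("total_ram_bytes must be positive")
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in the range (0, 1]")
    return int(total_ram_bytes * max_fraction)


if __name__ == "__main__":
    limit = max_key_buffer_bytes(4 * GIB)
    print(f"Key buffer ceiling on a 4GB system: {limit // GIB}GB")
```

Treat the result as a ceiling rather than a target, and compare it against the recommendations from the check tuning command before changing the database configuration file.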

  • Distributed Heartbeat and Distributed Synchronization provide further insight into internal communications in a Distributed environment.
  • FA Counts and FA Times provide metrics on the processing of Flow Analytics Algorithms. FA Times is useful in managing FA algorithms not coming to successful completion.
  • Flow Metrics/Exporter and Flow Metrics/Port display metrics by exporter and also by listening port for:
    • MFSN: Missed Flow Sequence Numbers are generated if the device exporting the flows can’t keep up with the traffic, the flow packets are being dropped by something on the network, or the flow collector can’t keep up with the rate of flows coming in. Sometimes MFSN will show up as 10m or 400m, where the m suffix means thousandths. To get the dropped flows per second, divide the value by 1000. A value of 400m is 0.4 dropped flows per second. Since 1 / 0.4 = 2.5, a flow is dropped every 2.5 seconds, which is 120 dropped flows (i.e. 300 seconds / 2.5) in the 5 minute interval displayed in the trend.
    • Packets: Average Packets per second.
    • Flows: Average Flows per second. This is a measure of the number of conversations being observed. There can be as many as 30 flows per NetFlow v5 packet (i.e. UDP datagram) and up to 24 flows per NetFlow v9 datagram. With sFlow, a datagram can carry as few as 1 sample (i.e. flow) or more than 10 samples.
  • Memory: displays how much memory remains available after the memory consumed by all programs on the computer is deducted from Total Memory. It is not specific to NetFlow being captured. The flow collector will continue to grab memory depending on the size of the memory bucket it requires to save data, and its memory use will not shrink unless the machine is rebooted. This is not a memory leak.
  • Report Request Time, Report Type Data Time, and Report Type Query Time provide reporting performance metrics.
  • Storage: displays the amount of disk storage space that is available. After an initial period of a few weeks or months, this should stabilize, provided that the volume of NetFlow stays about the same.
  • Syslogs: The following metrics are available with the syslogs report:
    • Syslogs Received: The average number of syslogs received per second.
    • Syslogs Processed: The average number of syslogs processed per second.
  • Task Runtime displays runtimes per Scrutinizer automated tasks such as nightly history expiration, vitals data collection, etc.
  • Totals/Rollups Times shows time durations for totals, rollups, and data inserts in the database per flow template per exporter.
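
The MFSN arithmetic described under Flow Metrics can be checked with a short calculation. The Python sketch below assumes that a trailing m on an MFSN reading means thousandths of a dropped flow per second, as in the 400m example; the function names are hypothetical and are not part of Scrutinizer.

```python
def mfsn_to_rate(reading):
    """Convert an MFSN reading such as '400m' or '2' to dropped flows per second.

    Assumption: a trailing 'm' means the number is in thousandths.
    """
    text = reading.strip()
    if text.endswith("m"):
        return float(text[:-1]) / 1000.0
    return float(text)


def seconds_per_dropped_flow(reading):
    """Return the average number of seconds between dropped flows."""
    rate = mfsn_to_rate(reading)
    if rate == 0:
        return float("inf")  # nothing is being dropped
    return 1.0 / rate


def dropped_flows_in_interval(reading, interval_seconds=300):
    """Estimate dropped flows over a trend interval (5 minutes by default)."""
    return mfsn_to_rate(reading) * interval_seconds


if __name__ == "__main__":
    reading = "400m"
    print(f"{mfsn_to_rate(reading):.1f} dropped flows per second")
    print(f"one dropped flow every {seconds_per_dropped_flow(reading):.1f} seconds")
    print(f"{dropped_flows_in_interval(reading):.0f} dropped flows in 5 minutes")
```

For the 400m example this reproduces the figures above: 0.4 dropped flows per second, one dropped flow every 2.5 seconds, and 120 dropped flows per 5 minute interval.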