Re: Forwarder throughput - Metrics vs TSTATS

john_glasscock · ‎05-21-2018

We are having issues with an OPSEC LEA connector. The Checkpoint firewall is showing, say, 5,000,000 events per hour.

Using Metrics from Splunk;

index=_internal host="splunk-fwd-1" component=Metrics
| stats sum(ev) as Total
| eval Total_Events=round(Total)
| fields - Total
| fieldformat Total_Events=tostring(Total_Events,"commas")

Shows 5,500,000 events processed by the forwarder for the time frame.

Using TSTATS;
| tstats count where index=checkpoint by host,_time span=1m
| search splunk_forwarder=splunk-fwd-1
| chart sum(count) AS Total_Event_Count
| fieldformat Total_Event_Count=tostring(Total_Event_Count,"commas")

Shows 3,000,000 events indexed for the time frame.

Where are the events? Going by the indexed data, the event count is significantly lower than what the firewall reports. Even if we go back, say, a month, it doesn't look like the events are just delayed coming in. Are we really losing 2 million events per hour?

muebel · ‎05-21-2018

Hi john, are you able to narrow down the metrics search to only the checkpoint logs? That forwarder might not be doing much else, but it looks like the metrics search is counting all events the forwarder is processing.
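A minimal sketch of a narrower metrics search, assuming the forwarder writes per_index_thruput entries to metrics.log and the events already carry index=checkpoint at the forwarder (the index name is taken from your tstats search; the rest is an assumption about your setup):

```
index=_internal host="splunk-fwd-1" source=*metrics.log* group=per_index_thruput series="checkpoint"
| stats sum(ev) AS Total_Events
| fieldformat Total_Events=tostring(Total_Events,"commas")
```

Restricting the search to a single group also avoids adding up ev from several per_*_thruput groups that each count the same events. By default metrics.log only records the busiest series in each interval, so treat the result as an estimate rather than an exact count.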

Otherwise, do you see any blocking or queue-fill issues on the indexer? If it can't keep up it'll start dropping events, which could potentially explain this.

Finally, do you get any warnings or messages when you run the tstats search? One other issue here could be Splunk having too many events on the same subsecond timestamp. I think it'll usually warn about this if it happens, though.

Please let me know if this helps!

john_glasscock · ‎05-21-2018

Thank you for responding,

We only have 1 firewall feeding that connector.

How can I check whether the indexers are blocking or having queue-fill issues? We have a lot of indexers. I don't have full admin rights, but I can poke around with some searches.

I did not get any warnings or messages when I ran the TSTATS command.

muebel · ‎05-21-2018

Searching the internal index for messages that mention "block" might turn up some events.
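For example, something along these lines, assuming you can read index=_internal for the indexers, lists the queues that reported themselves as blocked in metrics.log:

```
index=_internal source=*metrics.log* group=queue blocked=true
| stats count AS blocked_count BY host, name
| sort - blocked_count
```

Here name is the queue (for instance parsingqueue or indexqueue) and host is the instance that logged it. A steady stream of blocked=true entries from the indexers during the hours in question would point to a pipeline that can't keep up.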

These pages have some more info:
https://docs.splunk.com/Documentation/Splunk/7.1.0/Troubleshooting/Aboutmetricslog

If you have the monitoring console set up, and have access to it: http://docs.splunk.com/Documentation/Splunk/7.1.0/Troubleshooting/Troubleshootindexingperformance

Also, you'll want to search for messages like this just to be sure:
"Error in 'IndexScopedSearch': The search failed. More than 125000 events found at time 1293916026."
https://answers.splunk.com/answers/10299/the-search-failed-more-than-125000-events-found-at-time.htm...

john_glasscock · ‎05-21-2018

I wish I had monitoring console access. Unfortunately I don't have full access, but I'm trying to help others that do.

I did search for Blocked and IndexScopedSearch, and nothing really useful came back.

I have found a huge difference between the Metrics and TSTATS numbers for the forwarder, both in EPS and in total events, for each hour or day.

Thanks again for trying to help. I wish I had admin access, but I don't. 😞
