We are having issues with an OPSEC LEA connector. The Checkpoint firewall is showing, say, 5,000,000 events per hour.
Using Metrics from Splunk:
index=_internal host="splunk-fwd-1" component=Metrics
| stats sum(ev) as Total
| eval Total_Events=round(Total)
| fields - Total
| fieldformat Total_Events=tostring(Total_Events,"commas")
Shows 5,500,000 events processed by the forwarder for the time frame.
| tstats count where index=checkpoint by host,_time span=1m
| search splunk_forwarder=splunk-fwd-1
| chart sum(count) AS Total_Event_Count
| fieldformat Total_Event_Count=tostring(Total_Event_Count,"commas")
Shows 3,000,000 events indexed for the time frame.
Where are the events? Going by the indexed data, the event count is significantly lower than what the firewall reports. Even if we look back, say, a month, it isn't as if the events are just delayed coming in. Are we really losing 2 million events per hour?
Hi John, are you able to narrow down the metrics search to only the checkpoint logs? That forwarder might not be doing much else, but it looks like the metrics search is counting all events the forwarder is processing.
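In case it helps, here is a rough sketch of a narrower metrics search. It assumes the forwarder writes per_index_thruput lines to metrics.log and that the series name matches the index from your tstats search (checkpoint); adjust both if your setup differs. As far as I know, without a group filter sum(ev) adds up every thruput group (per host, per index, per source, per sourcetype), so the same event can be counted more than once.

```
index=_internal host="splunk-fwd-1" source=*metrics.log* group=per_index_thruput series="checkpoint"
| stats sum(ev) AS Total_Events
| fieldformat Total_Events=tostring(Total_Events,"commas")
```

If that total lands much closer to the tstats number, the gap is mostly a counting artifact rather than lost events.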
Otherwise, do you see any blocking or queue-fill issues on the indexer? If it can't keep up, it may start dropping events, which could explain this.
Finally, do you get any warnings or messages when you run the tstats search? One other issue here could be Splunk having too many events on the same subsecond timestamp. I think it'll usually warn about this if it happens, though.
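To rule the timestamp pile-up in or out, a quick sketch like the following lists the busiest seconds in the index. It assumes the same index name as your tstats search and uses one-second buckets as an approximation of "events on the same timestamp". For scale, 5,000,000 events per hour averages out to roughly 1,389 events per second, so only a very large burst on a single timestamp would be a concern.

```
| tstats count where index=checkpoint by _time span=1s
| sort - count
| head 10
```

If the top rows are in the low thousands, this is probably not what is eating your events.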
Please let me know if this helps!
Thank you for responding,
We only have 1 firewall feeding that connector.
How can I see whether the indexers have blocking or queue-fill issues? We have a lot of indexers. I don't have full admin rights, but can poke around with some searches.
I did not get any warnings or messages when I ran the tstats command.
Searching the internal index for messages that mention "block" might turn up some events.
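As a starting point, assuming you can read metrics.log in _internal for the indexers, something along these lines counts the queue lines that were logged with blocked=true, grouped by host and queue name:

```
index=_internal source=*metrics.log* group=queue blocked=true
| stats count BY host, name
| sort - count
```

Hosts and queues with high counts are where the pipeline is most likely backing up. You may want to add a host filter so only your indexers are included.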
These pages have some more info:
If you have the monitoring console set up, and have access to it: http://docs.splunk.com/Documentation/Splunk/7.1.0/Troubleshooting/Troubleshootindexingperformance
Also, you'll want to search for messages like this just to be sure:
"Error in 'IndexScopedSearch': The search failed. More than 125000 events found at time 1293916026."
I wish I had monitoring console access. Unfortunately I don't have full access, but I am trying to help others that do.
I did search for Blocked and IndexScopedSearch, and nothing really useful came back.
I have found a huge difference between the Metrics and tstats numbers for the forwarder, in both EPS and total events, for each hour or day.
Thanks again for trying to help. I wish I had admin access, but I don't. 😞