One day, some of my universal forwarders (UFs) had a problem: they sent a lot of duplicate events. On the server, the nginx_access.log file has 20,000 log lines, but the indexer has 15 million events, and the same log line is repeated tens of thousands of times. In the end, I think the cause is that the UF is configured with the useACK=true parameter. This also congested the entire network.
So now I want to create an alert to monitor whether a UF is sending the same logs repeatedly.
At first, I worked per index: I checked whether the number of events in the previous hour was at least 5 times the hourly average. If it was, I considered the log growth rate abnormal. But there are too many indexes, so I don't think this method is ideal.
index=test earliest=-4h latest=now | stats count as count_4h by sourcetype | eval avg=count_4h/4 | appendcols [search index=test earliest=-1h latest=now | stats count by sourcetype] | eval result=if(count/avg>=5,"Incremental anomaly","OK") | where result!="OK" | table sourcetype count_4h avg count result
I think we should instead count the number of events received from each UF. For example, if a UF normally sends about 20,000 events and the previous hour's count is 1,500,000, the events are probably being duplicated.
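The check described here comes down to simple arithmetic, sketched below in Python. The function name, the forwarder names and the sample counts are made up for illustration; only the 5x threshold comes from the post. The baseline in this sketch is the average of the earlier hours only: if the most recent hour is also part of a 4-hour average, the ratio can never exceed 4, so a 5x threshold would never trigger.

```python
from statistics import mean


def flag_anomalies(hourly_counts, threshold=5.0):
    """Return forwarders whose latest hourly count is >= threshold x baseline.

    hourly_counts maps a forwarder name to a list of hourly event counts,
    oldest first. The last entry is the hour being checked; the earlier
    entries form the baseline average.
    """
    flagged = {}
    for forwarder, counts in hourly_counts.items():
        if len(counts) < 2:
            continue  # no baseline to compare against
        baseline = mean(counts[:-1])
        latest = counts[-1]
        if baseline > 0 and latest / baseline >= threshold:
            flagged[forwarder] = latest / baseline
    return flagged


# Hypothetical sample data, not real measurements
sample = {
    "uf-web-01": [20000, 21000, 19000, 1500000],
    "uf-web-02": [20000, 20500, 19500, 21000],
}
anomalies = flag_anomalies(sample)
for name, ratio in anomalies.items():
    print(f"{name}: latest hour is {ratio:.1f}x the baseline")
```

With this sample data only uf-web-01 is flagged, because its latest hour is far above the average of its earlier hours.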
So my question is: how do I count the events sent by each UF? How should I write the search?
Is this what you're looking for?
index="_internal" source=/apps/splunkforwarder/var/log/splunk/metrics.log group=per_host_thruput | timechart span=4h avg(eps) by series
What is eps?
Events per second.
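Since eps is a rate, an average eps over a time span can be turned back into an approximate event count by multiplying it by the length of the span in seconds. A small Python sketch of that arithmetic follows; the function name and the sample value are illustrative, not taken from real data, and the result should be treated as a rough estimate.

```python
def approx_events(avg_eps, span_hours):
    """Approximate event count for a span, given its average events per second."""
    return avg_eps * span_hours * 3600


# Hypothetical: an average of 1.5 eps over a 4-hour span
estimate = approx_events(1.5, 4)
print(estimate)
```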