With our cyber data, we have cases where streams of data stop (due to a down forwarder, a bad DB connection, etc.) and cases where streams suddenly increase in volume, such as Bluecoat cases, DNS attacks, and more.
We would like to alert on these cases without hardcoding the various indexes or sourcetypes. We also wonder whether there is a good way to do this in ITSI.
Hello,
Just an idea, but if you are using data models and the CIM, consider scheduling tstats searches against the data models to count, for example, the number of Authentication failures by host over the last 5 minutes. Then you won't care about the source, sourcetype, or host; as long as the data is in the data model (likely via the CIM), it gets counted. Something like this, which looks for the number of failed authentications per source host over the last 5 minutes...
| tstats count from datamodel=Authentication where Authentication.action=failure earliest=-5m@m by _time, Authentication.src
| stats sum(count) by Authentication.src
Schedule this every 5 minutes, since it looks back 5 minutes within the data model (adjust the duration depending on your acceleration times, etc.). Then you'll get a decent number you can alert off of, and it will give you the src (host) with the increase in Authentication failures for you to investigate. It won't matter what the authentication source is (DNS, remote login, SSH, FTP, etc.); as long as it's in the data model, you'll get an alert. Just an idea.
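To turn that into an alert, one option is to append a simple threshold; the cutoff of 10 failures below is an arbitrary placeholder you'd tune for your environment:

| tstats count from datamodel=Authentication where Authentication.action=failure earliest=-5m@m by _time, Authentication.src
| stats sum(count) as failures by Authentication.src
| where failures > 10

Then configure the saved search to trigger when the number of results is greater than zero.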
Of course you can operate on aggregated metadata (you can do a tstats count across all indexes as well), but it will not reliably reveal changes in a single data stream unless that stream is a big part of your overall event volume.
Well, you have to have reasonably identifiable sources. What I mean by that is that you must be able to distinguish between data streams by some set of fields (or a transformation/aggregation of some of them). Otherwise you're stuck with, for example, indistinguishable events coming from source=/var/log/messages on host=localhost.
So it's actually up to you to come up with a proper grouping for tstats count (I wouldn't use plain stats to count the volume of all data).
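A grouped count could look something like this (a sketch; adjust the split-by fields to whatever actually distinguishes your streams, and the time range to your needs):

| tstats count where index=* earliest=-24h@h by index, sourcetype, host, _time span=5m

Note that a stream that has stopped simply disappears from the results, so for "stream went quiet" detection you'd typically compare the groups returned against a lookup of expected streams rather than rely on the counts alone.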
Then it's just a matter of a proper timechart with timewrap, and probably some foreach logic to compare the periods.
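A rough sketch of that comparison, wrapping week over week; the wrapped column names (events_latest_week, etc.) come from timewrap's default relative series naming, so verify them against your Splunk version before alerting on them:

| tstats count where index=* by _time span=1h
| timechart span=1h sum(count) as events
| timewrap 1week
| foreach events_* [ eval pct_of_latest_<<MATCHSTR>> = round(100 * '<<FIELD>>' / events_latest_week, 1) ]

The foreach computes each prior week's bucket as a percentage of the current week's, which gives you a normalized number to threshold on for both drops and spikes.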