Hi,
We received a "DMC Alert - Total License usage is near the daily quota" alert from the license manager. It was triggered suddenly at 11:30 PM EST, even though our license capacity covers a data volume of more than 550 GB per day. We are not sure what caused the sudden spike.
Question:
1) How can we identify what caused this spike in log volume?
Kindly guide me on this issue.
A good start would be to find out which index had a spike:
You can run something like this over the day of the spike to see which index used the most license:
index=_internal source=*license_usage.log* type="Usage"
| eval b=b/1024/1024/1024
| timechart useother=f span=30m sum(b) as b by idx limit=0
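If reading the chart is not enough, the same data can be filtered down to only the unusual slots. A sketch, assuming a threshold of three standard deviations above each index's own 30-minute average over the search window (an arbitrary choice you should tune): it sums license bytes per index per 30-minute bucket, converts them to GB, computes each index's mean and standard deviation with eventstats, and keeps only the buckets above that threshold.

```spl
index=_internal source=*license_usage.log* type="Usage"
| eval GB=b/1024/1024/1024
| bin _time span=30m
| stats sum(GB) as GB by _time, idx
| eventstats avg(GB) as avg_GB, stdev(GB) as sd_GB by idx
| where GB > avg_GB + 3*sd_GB
| sort - GB
```

Note that stats only produces buckets that actually contain data, so the average is taken over the slots in which each index received data.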
Hey, I used the query below to find out which sourcetype/index had the highest license consumption over the past hour.
index=_internal source=*license_usage.log* type="Usage" idx=*
| eval gb=b/1024/1024/1024
| stats sum(gb) as Totalcount by st,idx | sort - Totalcount | eventstats sum(Totalcount) as SUM
| eval P=round(((Totalcount/SUM)*100),2) | eval Percentage=P." %" | table st idx Totalcount SUM Percentage
Based on this output, sourcetype=opsec (index=firewall) and sourcetype=infoblox:dns (index=network) consumed more than 36% and 30% of the total license volume, respectively, over the one-hour window.
What should the next step be, given this sourcetype/index information? Could you please guide me on this?
I would then continue either from the license usage data or with a simple tstats search on the respective index(es), to figure out whether a specific data source (host) is spiking. If so, you could check with the team managing that firewall / Infoblox appliance to see if something is wrong, and investigate further.
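A minimal tstats sketch of that idea, assuming the index and sourcetype names mentioned in this thread (firewall/opsec and network/infoblox:dns). It counts events per host from the index metadata, without reading the raw events, and lists the 20 busiest hosts over the selected time range:

```spl
| tstats count where (index=firewall sourcetype=opsec) OR (index=network sourcetype="infoblox:dns") by index, sourcetype, host
| sort - count
| head 20
```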
You could, for example, measure the raw event size per value of a common field within a sourcetype/index:
sourcetype=yoursourcetype
| eval b=len(_raw)
| timechart span=1h avg(b) as b by yourfieldname
| foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024,10)]
Hey, I executed the query above and I get results by sourcetype and field name, but the values are confusing. It would be really helpful if you could give me a query that returns the information in MB or GB instead of bytes.
sourcetype=opsec
| eval b=len(_raw)
| timechart span=1h avg(b) as b by host
| foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024,10)]
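The values look confusing mainly because avg(b) is the average size of a single event, which is a tiny number once expressed in GB. If the goal is total volume per host, a variant that sums the raw length and converts it to MB may be easier to read. This is a sketch: len(_raw) only approximates license usage (it counts characters, not bytes) and is slow on large data sets.

```spl
sourcetype=opsec
| eval MB=len(_raw)/1024/1024
| timechart span=1h eval(round(sum(MB),2)) as MB by host
```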
You can check license usage by host directly from the license usage logs (the h field contains the host value), which is much faster than using len(_raw). And as I mentioned above: now that you have found which index/sourcetype is spiking, just continue by searching for spikes in event count rather than focusing on size / license usage.
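For example, a sketch for the opsec data in the firewall index (names taken from this thread), charting license usage in GB per host from license_usage.log:

```spl
index=_internal source=*license_usage.log* type="Usage" idx=firewall st=opsec
| eval GB=b/1024/1024/1024
| timechart span=30m useother=f sum(GB) as GB by h limit=20
```

One caveat: when a pool sees too many distinct host or source values, the license manager squashes them and leaves h and s empty in license_usage.log. In that case, fall back to counting events per host with tstats.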
I used the queries below to find the event counts for the two sourcetypes, opsec and infoblox:dns:
Query 1:
index=_internal source=*license_usage.log* type="Usage" idx=* st="infoblox:dns" | eval gb=b/1024/1024/1024 | stats sum(gb) as Totalcount by st,idx | sort - Totalcount | eventstats sum(Totalcount) as SUM | eval P=round(((Totalcount/SUM)*100),2) | eval Percentage=P." %"
Total number of events = 3856 for a one-hour duration
Query 2:
index=_internal source=*license_usage.log* type="Usage" idx=* st="opsec" | eval gb=b/1024/1024/1024 | stats sum(gb) as Totalcount by st,idx | sort - Totalcount | eventstats sum(Totalcount) as SUM | eval P=round(((Totalcount/SUM)*100),2) | eval Percentage=P." %"
Total number of events = 1475 for a one-hour duration
Kindly correct me if this is not what you meant above.
I don't really follow what you are doing there, but it sounds like you are counting the number of license usage events?
You want to count the actual events, of course 😉
So, for instance, to plot the event count per host over time:
| tstats count where index=firewall sourcetype=opsec by _time,host | timechart sum(count) by host
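To rank hosts by how far a single hour exceeds their own typical rate, instead of reading it off the chart, a sketch along these lines may help. The cutoff of 3x the host's average hourly count over the search window is an arbitrary assumption; swap in index=network sourcetype="infoblox:dns" for the DNS data.

```spl
| tstats count where index=firewall sourcetype=opsec by _time, host span=1h
| eventstats avg(count) as avg_count by host
| eval ratio=round(count/avg_count, 1)
| where ratio >= 3
| sort - ratio
```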
Thanks, Frank, for providing the query to fetch the event count per host. Based on that, I could see that two of the hosts had the highest event counts over a one-hour window.
For sourcetype=opsec (index=firewall):
test01fw = 190845
For sourcetype=infoblox:dns (index=network):
host01 = 217078
These two hosts showed the largest spikes in the visualization chart.
What should the next level of investigation be to find the cause of this spike in log volume?
As mentioned before: I think this is the moment to reach out to whoever manages those Check Point and Infoblox devices, show them the event spike, and ask them to help investigate what is going on.
You may also want to run this through your company's security incident process, as this could potentially be some kind of attack.
While len(_raw) is indeed a way to delve into more detail on license usage, it is pretty slow, especially if you're analyzing a big spike in event volume. I'd strongly suggest looking for spikes in event count first, which can be done much more efficiently. I'd be surprised if that didn't allow you to find the issue.
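One efficient way to check whether the spike is out of the ordinary is to overlay each day's hourly event count on the previous days. A sketch, assuming the time picker is set to the last 7 days and using the firewall/opsec names from this thread:

```spl
| tstats count where index=firewall sourcetype=opsec by _time span=1h
| timechart span=1h sum(count) as events
| timewrap 1d
```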
Frank, how do I use the tstats command to fetch the details you mentioned in your previous comment?
"by simple tstats search on the respective index(es), to figure out whether perhaps it is a certain specific data source (host) spiking"
It would be great if you could give me an SPL query using the tstats command.
And to check it by sourcetype, use "by st" instead of "by idx"; for source, use "by s"; for host, use "by h".
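Applied to the license usage search from the start of this thread, the sourcetype version would look like this (replace st with s or h to split by source or host):

```spl
index=_internal source=*license_usage.log* type="Usage"
| eval b=b/1024/1024/1024
| timechart useother=f span=30m sum(b) as b by st limit=0
```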
That should give you a good indication of which data source spiked; then you can start digging deeper by looking at the actual data.
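As a first look into the actual data, grouping events by Splunk's default punct field (the punctuation pattern of each event) can show whether one kind of message dominates the spike. A sketch using host01, the example host reported earlier in this thread; substitute your own index, sourcetype, and host:

```spl
index=network sourcetype="infoblox:dns" host=host01
| top limit=20 punct
```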