How to identify what caused the sudden spike in the log volume?

Hemnaath
Motivator

Hi,

We received a "DMC Alert - Total License usage is near the daily quota" alert from the license manager. It was triggered suddenly at 11:30 pm EST, even though we have enough license capacity to handle a data volume of more than 550 GB per day. We are not sure what caused the sudden spike.

Question:

1) How can I identify what caused this spike in log volume?

Kindly guide me on this issue.

damien_chillet
Builder

A good start would be to find out which index had a spike.
You can run something like this over the day you had the spike to see which index used the most license:

index=_internal source=*license_usage.log* type="Usage"
| eval b=b/1024/1024/1024
| timechart useother=f span=30m sum(b) as b by idx limit=0

Hemnaath
Motivator

Hey, I used the query below to find out which sourcetype/index consumed the most license over the past 1 hour.

index=_internal source=license_usage.log type="Usage" idx=*
| stats sum(gb) as Totalcount by st,idx | sort - Totalcount | eventstats sum(Totalcount) as SUM | eval P=round(((Totalcount/SUM)*100),2)|eval Percentage=P+" "+"%" | table st idx Totalcount SUM Percentage

Based on this query's output, we could see that sourcetype=opsec (index=firewall) and sourcetype=infoblox:dns (index=network) consumed more than 36% and 30% of the total license volume over the one-hour duration.

What would be the next step, given the sourcetype/index information above? Could you please guide me on this?

FrankVl
Ultra Champion

I think I would then continue either from the license usage, or by simple tstats search on the respective index(es), to figure out whether perhaps it is a certain specific data source (host) spiking. If so, you could check with the team managing that firewall / infoblox appliance to see if something is wrong and further investigate.

HeinzWaescher
Motivator

You could, for example, query the license usage of a common field per sourcetype/index:

your sourcetype
| eval b=len(_raw)
| timechart span=1h avg(b) as b by yourfieldname
| foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024,10)]

Hemnaath
Motivator

Hey, I ran the query above and I am getting results by sourcetype and field name, but the values are confusing. It would be really helpful if you could give me a query that returns the information in MB or GB instead of bytes.

sourcetype=opsec
| eval b=len(_raw)
| timechart span=1h avg(b) as b by host
| foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024,10)]

FrankVl
Ultra Champion

You can check license usage by host directly from the license usage logs (the h field contains the host value), which is much faster than using len(_raw). And as I mentioned above: now that you have found which index/sourcetype is spiking, just continue searching for spikes in event count, rather than focusing on size / license usage.
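
A minimal sketch of the per-host license check described above, assuming the spiking sourcetype is opsec (taken from earlier in this thread) and a 30-minute bucket. The b, st, and h fields come from the license usage log; gb is just a name chosen here for the converted value.

```
index=_internal source=*license_usage.log* type="Usage" st="opsec"
| eval gb=b/1024/1024/1024
| timechart span=30m useother=f limit=10 sum(gb) as gb by h
```

If the h values come back empty, the license manager may be squashing host and source values in this log. In that case, counting the actual events per host is likely the better route.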

Hemnaath
Motivator

I used the queries below to get the event count details for the two sourcetypes, opsec and infoblox:dns:

Query 1:

index=_internal source=license_usage.log type="Usage" idx=* st="infoblox:dns" | stats sum(gb) as Totalcount by st,idx | sort - Totalcount | eventstats sum(Totalcount) as SUM | eval P=round(((Totalcount/SUM)*100),2) | eval Percentage=P+" "+"%"

Total number of events = 3856 for a 1-hour duration

Query 2:

index=_internal source=license_usage.log type="Usage" idx=* st="opsec" | stats sum(gb) as Totalcount by st,idx | sort - Totalcount | eventstats sum(Totalcount) as SUM | eval P=round(((Totalcount/SUM)*100),2) | eval Percentage=P+" "+"%"

Total number of events = 1475 for a 1-hour duration

Kindly correct me if this is not what you meant in your comment above.

FrankVl
Ultra Champion

I don't really follow what you are doing there, but it sounds like you are counting the number of license usage events?

You want to count the actual events of course 😉

So for instance to plot the event count per host over time:

| tstats count where index=firewall sourcetype=opsec by _time,host | timechart sum(count) by host

Hemnaath
Motivator

Thanks, Frank, for providing the query to fetch the event count per host. Based on that, I could see that two hosts had the highest event counts over a 1-hour duration.

For sourcetype=opsec, index=firewall:

test01fw = 190845

For sourcetype=infoblox:dns, index=network:

host01 = 217078

These two hosts showed the largest spikes in the visualization chart.

What should the next level of investigation be to find the cause of this spike in log volume?

FrankVl
Ultra Champion

As mentioned before: I think this would be the moment to reach out to whoever manages those Check Point and Infoblox devices, show them the event spike and ask them to help investigate what is going on.

And you may want to run this through your company's security incident process, as this could of course be some kind of attack going on.

FrankVl
Ultra Champion

While it is indeed a way to delve into more detail on the license usage, that len(_raw) is pretty slow, especially if you're analyzing a big spike in event volume. I'd strongly suggest looking for spikes in event count first, which can be done much more efficiently. I'd be surprised if that didn't allow you to find the issue.
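
As a rough sketch of that event-count-first approach, assuming the two indexes named earlier in this thread (firewall and network) and a 30-minute bucket, something like the following plots the event count per index and sourcetype. The series field is just a label built here for the chart legend.

```
| tstats count where index=firewall OR index=network by _time span=30m index sourcetype
| eval series=index.":".sourcetype
| timechart span=30m sum(count) as events by series
```

Replacing sourcetype with host, in both the by clause and the eval, narrows the same view down to individual devices.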

Hemnaath
Motivator

Frank, how do I use the tstats command to fetch the details you mentioned in your previous comment?

"by simple tstats search on the respective index(es), to figure out whether perhaps it is a certain specific data source (host) spiking"

It would be great if you could give me an SPL query using the tstats command.

FrankVl
Ultra Champion

And to check it by sourcetype, use by st instead of by idx, for source use by s, for host use by h.

That should give you a good indication what data source spiked, then you can start digging deeper by looking at the actual data.

Hemnaath
Motivator

Frank, I used the query below to find out which sourcetype/index consumed the most license over the past 1 hour.

index=_internal source=license_usage.log type="Usage" idx=*
| stats sum(gb) as Totalcount by st,idx | sort - Totalcount | eventstats sum(Totalcount) as SUM | eval P=round(((Totalcount/SUM)*100),2)|eval Percentage=P+" "+"%" | table st idx Totalcount SUM Percentage

Based on this query, we could see that sourcetype=opsec (index=firewall) and sourcetype=infoblox:dns (index=network) consumed more than 36% and 30% of the total license volume.

What would be the next step, given the sourcetype/index information above? Could you please guide me on this?
