Splunk Search

Optimize query that hits disk usage limit when computing stats

cmontanari
Explorer

Hi All,

What I'm trying to do is to have a chart with time on the x-axis and percentages by ResponseStatus on the y-axis.

To do that, I came up with the following Splunk search query:

match some http requests
| fields _time,ResponseStatus,RequestName
| eval Date=strftime(_time, "%m/%d/%Y")
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| eventstats count as "totalCount" by Date
| eventstats count as "codeCount" by Date,ResponseStatus
| eval percent=round((codeCount/totalCount)*100)
| chart values(percent) by Date,ResponseStatus

But it is hitting the disk usage limit (500 MB, which I can't increase) for a 10-day interval, and I'd like to be able to run this over a 3-4 month interval.

What I have noticed is that if I only run the match part of the query, I get all the events without hitting any disk limit, which makes me think the problem is in the counting and group-by part of the query.

My guess is that Splunk is performing the computation by keeping the full event message in memory (or trying to, and eventually spilling to disk), even though I specified the useful fields via the fields command.

Is there any way to either have Splunk effectively ignore the rest of the message, or to obtain the same result via a different path?

Thanks a lot!

1 Solution

yuanliu
SplunkTrust
SplunkTrust

Reducing the use of eventstats is always good.  Secondly, you can use table to reduce event (row) size; fields doesn't quite do that.

In your example, you can eliminate one of the eventstats commands like this:

match some http requests
| table _time,ResponseStatus,RequestName ``` fields does not reduce row size ```
| eval Date=strftime(_time, "%m/%d/%Y")
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| stats count as "codeCount" by Date,ResponseStatus
| eventstats sum(codeCount) as "totalCount" by Date
| eval percent=round((codeCount/totalCount)*100)
| chart values(percent) by Date,ResponseStatus


PickleRick
Ultra Champion

You might explicitly remove _raw early in the pipeline so that you operate only on the set of fields you need, without dragging extra data around.

| fields - _raw

Furthermore, why do you render _time as a string? You might use bin/bucket to align the data to full days, or whatever time unit you need, without creating additional fields.

You might also rethink your eventstats. If you already calculate by date and response status, why calculate by date again? You might reuse the count you already have and sum it over date.

cmontanari
Explorer

Thank you! Your message made me realize I could write the query another way. This is where I landed after the changes suggested by @yuanliu plus your message:

match some stuff
| fields _time,ResponseStatus,RequestName
| fields - _raw
| bucket _time span=1d
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| eventstats count as "total" by _time 
| stats count first(total) as "total" by _time, ResponseStatus 
| eval percent=(count/total)*100 
| timechart span=1d first(percent) by ResponseStatus

Most likely further improvements can be made here as well, but as long as it's not hitting the disk threshold I'm happy with it 🙂


johnhua
Builder

It could be simplified even further:

<base_search>
| table _time ResponseStatus | fields - _raw 
| bucket _time span=1d
| fillnull value=504 ResponseStatus
| top 100 ResponseStatus by _time showcount=f
| timechart limit=30 span=1d first(percent) by ResponseStatus


PickleRick
Ultra Champion

No, you don't want to do that. Firstly, removing _raw there is not needed, since table already keeps only the fields you list. But more importantly, table is a transforming command and moves further processing to the search head, which kills parallelization.
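To illustrate that point, a streaming-safe version of the same field trimming might look like this (a sketch, with `<base_search>` standing in for the original match):

<base_search>
| fields _time ResponseStatus ``` fields is distributable streaming: it runs on the indexers ```
| fields - _raw               ``` drop the raw event text so it is not shipped to the search head ```

Unlike table, the fields command is distributable streaming, so the trimming happens on the indexers in parallel before results reach the search head.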

johnhua
Builder

You know, adding table and removing _raw was actually slightly slower in my testing and took up slightly more disk, but the difference was too small for me to be sure.  You were the one who first recommended removing _raw, and someone else here recommended table.

| table _time ResponseStatus | fields - _raw 

 


cmontanari
Explorer

Thank you! Converting that eventstats into stats made it work. The use of table didn't seem to have an impact on performance overall.


richgalloway
SplunkTrust
SplunkTrust

Where did you find out that fields does not reduce row size?  This contradicts what we've been told over many years.


yuanliu
SplunkTrust
SplunkTrust

@richgalloway You and @isoutamo are correct.  I didn't realize that events and fields are two separate spaces: fields allows events (_raw) to carry on, but search buffers are not burdened by them.  As isoutamo points out, table may actually carry a higher performance penalty.

isoutamo
SplunkTrust
SplunkTrust
Please remember that when you replace fields with table, you move processing from the indexers to the search head! Anyhow, stats should remove those additional fields anyway.
r. Ismo
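Pulling the suggestions from this thread together, a minimal sketch of the whole pipeline might look like this (field names and the 504 default are taken from the original question; the base search is a placeholder):

<base_search>
| fields _time ResponseStatus         ``` trim early; runs on the indexers ```
| fields - _raw                       ``` drop the raw event text ```
| bin _time span=1d                   ``` align to whole days instead of strftime ```
| fillnull value=504 ResponseStatus
| stats count by _time ResponseStatus          ``` one row per day/status, done before the totals ```
| eventstats sum(count) as total by _time      ``` daily total from the pre-aggregated rows ```
| eval percent=round((count/total)*100)
| timechart span=1d first(percent) by ResponseStatus

Because stats collapses the data to one row per day/status before eventstats runs, the daily totals are computed over a handful of rows rather than the full event set, which is what keeps the search under the disk limit.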