<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Optimize query that hits disk usage limit when computing stats in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582833#M202972</link>
    <description>&lt;P&gt;It could be simplified even further:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;base_search&amp;gt;
| table _time ResponseStatus | fields - _raw 
| bucket _time span=1d
| fillnull value=504 ResponseStatus
| top 100 ResponseStatus by _time showcount=f
| timechart limit=30 span=1d first(percent) by ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 27 Jan 2022 22:57:15 GMT</pubDate>
    <dc:creator>johnhuang</dc:creator>
    <dc:date>2022-01-27T22:57:15Z</dc:date>
    <item>
      <title>Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582600#M202913</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;What I'm trying to do is to have a chart with time on the x-axis and percentages by &lt;STRONG&gt;ResponseStatus&lt;/STRONG&gt; on the y-axis.&amp;nbsp;&lt;/P&gt;&lt;P&gt;To do that I came up with the Splunk search query below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;match some http requests
| fields _time,ResponseStatus,RequestName
| eval Date=strftime(_time, "%m/%d/%Y")
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| eventstats count as "totalCount" by Date
| eventstats count as "codeCount" by Date,ResponseStatus
| eval percent=round((codeCount/totalCount)*100)
| chart values(percent) by Date,ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;But it is hitting the disk usage limit (500MB, which I can't increase) for a 10-day interval, and I'd like to be able to run this over a 3-4 month interval.&lt;BR /&gt;&lt;BR /&gt;What I have noticed is that if I only run the match part of the query, I get all the events without hitting any disk limit, which makes me think the problem is with the counting and group-by part of the query.&lt;/P&gt;&lt;P&gt;My guess is that Splunk is making the computation by keeping in memory (or trying to do so and eventually swapping to disk) the full event message even if I specified the useful fields via the&amp;nbsp;&lt;STRONG&gt;fields&amp;nbsp;&lt;/STRONG&gt;command.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any way to either effectively have Splunk ignore all the remaining parts of the message, or to obtain the same result via a different path?&lt;BR /&gt;&lt;BR /&gt;Thanks a lot!&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jan 2022 19:08:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582600#M202913</guid>
      <dc:creator>cmontanari</dc:creator>
      <dc:date>2022-01-26T19:08:54Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582648#M202929</link>
      <description>&lt;P&gt;Reducing the use of eventstats is always good.&amp;nbsp; Secondly, you can use &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.2.0/SearchReference/Table" target="_blank"&gt;table&lt;/A&gt; to reduce event (row) size; &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.2.0/SearchReference/Fields" target="_blank"&gt;fields&lt;/A&gt; doesn't do quite that.&lt;/P&gt;&lt;P&gt;In your example, you can eliminate one of the eventstats like this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;match some http requests
| table _time,ResponseStatus,RequestName ``` fields does not reduce row size ```
| eval Date=strftime(_time, "%m/%d/%Y")
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| stats count as "codeCount" by Date,ResponseStatus
| eventstats sum(codeCount) as "totalCount" by Date
| eval percent=round((codeCount/totalCount)*100)
| chart values(percent) by Date,ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 03:55:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582648#M202929</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2022-01-27T03:55:49Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582682#M202935</link>
      <description>Please remember that when you replace fields with table, you move processing from the indexers to the search head! Anyhow, stats should strip those additional fields away anyway.&lt;BR /&gt;r. Ismo</description>
      <pubDate>Thu, 27 Jan 2022 07:19:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582682#M202935</guid>
      <dc:creator>isoutamo</dc:creator>
      <dc:date>2022-01-27T07:19:04Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582751#M202954</link>
      <description>&lt;P&gt;Where did you learn that &lt;FONT face="courier new,courier"&gt;fields&lt;/FONT&gt; does not reduce row size?&amp;nbsp; This contradicts what we've been told over many years.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 19:11:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582751#M202954</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2022-01-27T19:11:30Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582752#M202955</link>
      <description>&lt;P&gt;You might explicitly remove _raw early in the pipeline so that you operate only on the set of fields you need without dragging more data around.&lt;/P&gt;&lt;PRE&gt;| fields - _raw&lt;/PRE&gt;&lt;P&gt;Furthermore, why do you render _time as a string? You might use bin/bucket to align the data to full days, or whatever time unit you need, without creating additional fields.&lt;/P&gt;&lt;P&gt;You might also rethink your eventstats. If you already calculate a count by date and ResponseStatus, why calculate by date again? You might take the count you already have and sum it over each date.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 13:40:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582752#M202955</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2022-01-27T13:40:32Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582826#M202969</link>
      <description>&lt;P&gt;Thank you! Converting that eventstats into stats made it work. The use of table didn't seem to have an impact on overall performance.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 20:12:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582826#M202969</guid>
      <dc:creator>cmontanari</dc:creator>
      <dc:date>2022-01-27T20:12:58Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582827#M202970</link>
      <description>&lt;P&gt;Thank you! Your message made me realize I could write the query in another way. This is where I landed after the changes suggested by&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/33901"&gt;@yuanliu&lt;/a&gt;&amp;nbsp; + your message:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;match some stuff
| fields _time,ResponseStatus,RequestName
| fields - _raw
| bucket _time span=1d
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| eventstats count as "total" by _time 
| stats count first(total) as "total" by _time, ResponseStatus 
| eval percent=(count/total)*100 
| timechart span=1d first(percent) by ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Most likely further improvements can be made here as well, but as long as it is not hitting the disk threshold I'm happy with it &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 20:11:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582827#M202970</guid>
      <dc:creator>cmontanari</dc:creator>
      <dc:date>2022-01-27T20:11:54Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582833#M202972</link>
      <description>&lt;P&gt;It could be simplified even further:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;base_search&amp;gt;
| table _time ResponseStatus | fields - _raw 
| bucket _time span=1d
| fillnull value=504 ResponseStatus
| top 100 ResponseStatus by _time showcount=f
| timechart limit=30 span=1d first(percent) by ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 22:57:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582833#M202972</guid>
      <dc:creator>johnhuang</dc:creator>
      <dc:date>2022-01-27T22:57:15Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582837#M202974</link>
      <description>&lt;P&gt;No, you don't want to do that. Firstly, removing _raw is not needed. More importantly, table is a transforming command and moves further processing to the search head, which kills parallelization.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 22:21:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582837#M202974</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2022-01-27T22:21:39Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582838#M202975</link>
      <description>&lt;P&gt;You know, adding table and removing _raw was actually slightly slower in my testing and took up slightly more disk, but the difference was too small for me to be sure.&amp;nbsp; You were the one who first recommended removing _raw, and someone else here recommended table.&lt;/P&gt;&lt;PRE&gt;| table _time ResponseStatus | fields - _raw &lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 22:56:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582838#M202975</guid>
      <dc:creator>johnhuang</dc:creator>
      <dc:date>2022-01-27T22:56:40Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582888#M202988</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/213957"&gt;@richgalloway&lt;/a&gt;&amp;nbsp;You and&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/214410"&gt;@isoutamo&lt;/a&gt;&amp;nbsp;are correct. &amp;nbsp;I didn't realize that events and fields are two separate spaces. &lt;FONT face="courier new,courier"&gt;fields&lt;/FONT&gt; allows events (_raw) to carry on, but search buffers are not burdened by them. &amp;nbsp;As isoutamo points out, &lt;FONT face="courier new,courier"&gt;table&lt;/FONT&gt; may actually carry a higher performance penalty.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jan 2022 09:03:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582888#M202988</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2022-01-28T09:03:47Z</dc:date>
    </item>
  </channel>
</rss>

