How to Optimize Splunk Search to avoid "auto-finalized after disk usage limit of 100MB"

nateNpgh
Loves-to-Learn Lots

When I run the following search over a 24-hour period, it is always auto-finalized because it hits the disk usage limit of 100MB.

index="app_ABC123" source="/var/abc/appgroup123/logs/app123/stat.log" | stats count as TotalEvents by TxId | sort TotalEvents desc | where TotalEvents > 100

Is there any way for me to optimize the search so that it doesn't hit the limit?


PickleRick
SplunkTrust

Well, as the message says, you hit your disk quota because at some stage of the search there was simply too much data to process. The culprit is very likely a non-streaming command, such as the stats or sort in your search.

In your case there are two quite obvious things that can be done.

First, since you only count by TxId, you can restrict the fields carried past the initial search to just that one field. There is no point dragging the rest of each event further down the pipeline.

Second, filter first and sort second. That way there is much less data to sort.

index="app_ABC123" source="/var/abc/appgroup123/logs/app123/stat.log"
| fields TxId
| stats count as TotalEvents by TxId
| where TotalEvents>100
| sort TotalEvents desc

 

nateNpgh
Loves-to-Learn Lots

Thanks for the suggestions. Unfortunately, I still hit the limit with this approach.

 


isoutamo
SplunkTrust
Can you show sample TxId values? Those cannot contain any major breakers like “

nateNpgh
Loves-to-Learn Lots

nateNpgh_0-1639159687985.png

 


isoutamo
SplunkTrust
And how are those presented in _raw? Is TxId in there, or is it an extracted field?

nateNpgh
Loves-to-Learn Lots

I created an extracted field, but the values also appear in the raw events like that...

nateNpgh_0-1639167042318.png

 


isoutamo
SplunkTrust
If I have understood correctly, this does not work with an extracted field, because it expects to find key=value in the raw event data. By the way, PREFIX was added in version 8.x, but you probably have that?

nateNpgh
Loves-to-Learn Lots

We are running version 7.3.7.1.

So that likely explains it.


isoutamo
SplunkTrust
Then TERM will work, but not PREFIX, and PREFIX is what you need, because TxId has an unknown and unbounded set of values.
Anyhow, you can try this after you have upgraded to 8.1+ (currently the oldest supported version).
r. Ismo

PickleRick
SplunkTrust

Well, there is also a possibility that you have so much data...

Try running your search separately for a few single hours, and check the number of results and the storage usage for each run.
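
A rough sketch of such a check for one complete hour, reusing the index and source from the original search. The output field names Events and DistinctTxIds are made up for illustration, and the earliest/latest values are just one way to pick a single hour:

```spl
index="app_ABC123" source="/var/abc/appgroup123/logs/app123/stat.log" earliest=-2h@h latest=-1h@h
| fields TxId
| stats count as Events, dc(TxId) as DistinctTxIds
```

If DistinctTxIds turns out to be very large, the stats by TxId has to keep one row per value, which would point at the cardinality of TxId rather than the raw event volume. The disk usage of each run should be visible in the Job Inspector.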


nateNpgh
Loves-to-Learn Lots

The search runs to completion over a smaller time window, but I was hoping to be able to run it for a 24-hour period.


isoutamo
SplunkTrust

Maybe one more suggestion.

Add "TxId=*" to your first line so that it returns only events in which this field has some value. And if there are a lot of results, use "sort 0 Total..." because sort has a result count limit (10,000 by default).
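
Put together with the earlier suggestions, that variant would look roughly like this (a sketch only; it assumes TxId is available as a search-time field):

```spl
index="app_ABC123" source="/var/abc/appgroup123/logs/app123/stat.log" TxId=*
| fields TxId
| stats count as TotalEvents by TxId
| where TotalEvents > 100
| sort 0 TotalEvents desc
```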

Then you can also try with tstats and TERM + PREFIX if those

| tstats count as TotalEvents where index=app_ABC123 source=/var/abc/appgroup123/logs/app123/stat.log TERM(TxId=*) by PREFIX(TxId=)
| where TotalEvents > 100
| sort 0 TotalEvents desc

The last one is definitely the most efficient if you can get it to work (it should).

I just tested all three against _internal and metrics.log (last 24h on my workstation), and the run times were:

4.874 vs 4.715 vs 0.583s

r. Ismo

More about tstats with TERM and PREFIX can be found in the .conf presentations PLA1089C and TRU1133B.

nateNpgh
Loves-to-Learn Lots

Thanks for the suggestion! I tried this, and it ran very fast without errors, but it returned 0 statistics. I know there are definitely TxIds that have more than 100 events during the search timeframe.

Capture.JPG 
