Splunk Search

Is it bad practice to have fields with a large number of values in them?

SplunkTrust

I'm going through the limits.conf spec to see what the default limits are, and I noticed that the default maximum number of values for a field is 10,000 before the rest are truncated. We currently have a few fields with more than 10,000 values, such as JsessionID and the GUID, which is a unique identifier tied to the web service request and response.

So my question is: is it bad practice to extract fields with this many values? Will these fields slow our Verbose searches down compared to not having them?

1 Solution

Influencer

Hi,

Which command are you referring to? You'll see in limits.conf.spec the stanza name above the maxvalues key you are looking at. If you're talking about stats, that limit applies to the function stats values(foo); the actual row limit is much higher. So running

| stats count(foo) by GUID

won't truncate after 10,000 rows. If your searches are being truncated, you will see a warning label in the Job Inspector in the search UI.

What's your use case for the Verbose searches? I usually avoid them; they consume both memory and disk space in the dispatch directory, and they are slow.
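To illustrate the distinction (a sketch, assuming a field named foo): maxvalues caps how many values a list function such as values(foo) will collect, while a plain aggregation by GUID is governed by the much higher row limit. So

| stats values(foo) by GUID

may silently cap the values(foo) list, whereas

| stats count by GUID

will keep producing rows well past 10,000.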


SplunkTrust

After looking at the limits.conf spec again, I must have misread that number; it's actually 100,000 values per field before truncation. That's much better, but it could still cause an issue down the road. Can you verify that I'm understanding this correctly?

[anomalousvalue]
maxresultrows = 50000
# maximum number of distinct values for a field
maxvalues = 100000
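If you want to confirm the effective, merged value on your own instance (a sketch; the path assumes a default install), btool will print the settings for that stanza:

$SPLUNK_HOME/bin/splunk btool limits list anomalousvalue --debug

The --debug flag also shows which .conf file each setting comes from, which is useful when a local override is in play.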

My manager is sold on the idea that our GUID and JSESSION fields, which have tons of values, are slowing searches down when run in Verbose mode. I guess a good way of testing this would be to search a similar result set with no high-cardinality fields, then run another search that includes a field with many values, and compare the search times.
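One way to sketch that comparison (the index and sourcetype names here are hypothetical placeholders; the field names are from this thread) is to run the same base search twice, once with the high-cardinality fields discarded, and compare the run times in the Job Inspector:

index=web sourcetype=access_combined | fields - GUID JSESSIONID

index=web sourcetype=access_combined

Run both in Verbose mode over the same time range so the field handling is the only difference; note that fields - removes the fields from the results, so whether it saves extraction work depends on when those fields are extracted.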


Influencer

Yes, you are understanding that correctly.

Your manager is correct that Verbose mode slows down the search. Why are you using it? There may be a better way to achieve what you are trying to do. It might be helpful if you post your full search as well.


SplunkTrust

So if I specify a timeframe in which a field has around 30,000 values, that would NOT slow down the search, right?

I know he uses Verbose mode to enable field discovery, and I've pointed out many times that Smart mode does the same thing and is faster, but I think it's habit for him to flip between Fast and Verbose mode. I do use Verbose from time to time when I'm using a ...|stats command and need to view the events, but I agree that it's not ideal when Smart mode is available.


Influencer

Can you post the search you are proposing to run? I'm still not 100% clear on which commands you want to run and hence which limit you might run up against.

Also, it's always worth checking the Job Inspector; it will show you where the time is being spent in the queries you are running.
