Splunk Search

Performance fields/stats/table

bowesmana
SplunkTrust

https://community.splunk.com/t5/Splunk-Search/Fields-vs-table-vs-nothing/m-p/498525#M194897

I was looking at a Splunk-authored search

https://research.splunk.com/cloud/042a3d32-8318-4763-9679-09db2644a8f2/

which does exactly that: a table followed by stats.

In this case table seems totally unnecessary and, because table is a transforming command, it would incur a performance cost.

So, specifically in an indexer-clustered environment, how does

| fields A B C
| stats count by A B C

work from a data-movement point of view? Clearly fields limits which fields are returned from the indexers to the SH. But if there is no fields command, does stats run entirely on the SH, with (a) ALL raw data returned from the indexers, or (b) do the indexers return only the fields that stats is going to use on the SH?

If it is (a), then there is clearly a benefit in using fields before stats, but my expectation is that it works like (b).
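A toy Python model can make the (a)/(b) distinction concrete. This is an illustrative sketch, not Splunk's implementation: the events, the extra field names (host, extra) and the cost metric (number of field values shipped to the SH) are all assumptions chosen for the example.

```python
# Toy model of the two data-movement scenarios in the question.
# NOT Splunk's implementation: events are plain dicts, and "network cost"
# is approximated as the number of field values shipped to the search head.
from collections import Counter

# Hypothetical events held by one indexer.
EVENTS = [
    {"A": "x", "B": "1", "C": "p", "_raw": "raw 1", "host": "h1", "extra": "e1"},
    {"A": "x", "B": "1", "C": "p", "_raw": "raw 2", "host": "h2", "extra": "e2"},
    {"A": "y", "B": "2", "C": "q", "_raw": "raw 3", "host": "h1", "extra": "e3"},
]

SPLIT_BY = ("A", "B", "C")


def ship_all(events):
    """Scenario (a): every field of every event goes to the SH."""
    return [dict(e) for e in events]


def ship_split_by_only(events, fields):
    """Scenario (b): only the fields that stats needs go to the SH."""
    return [{f: e[f] for f in fields if f in e} for e in events]


def values_shipped(shipped):
    """Rough cost metric: total number of field values transferred."""
    return sum(len(e) for e in shipped)


def stats_count_by(events, fields):
    """What '| stats count by A B C' computes on the SH."""
    return Counter(tuple(e.get(f) for f in fields) for e in events)


if __name__ == "__main__":
    a = ship_all(EVENTS)
    b = ship_split_by_only(EVENTS, SPLIT_BY)
    print("(a) values shipped:", values_shipped(a))  # 18
    print("(b) values shipped:", values_shipped(b))  # 9
    # Both scenarios give the same stats result; only the transfer differs.
    assert stats_count_by(a, SPLIT_BY) == stats_count_by(b, SPLIT_BY)
```

In this model both scenarios produce identical counts; the only difference is how much data crosses the wire, which is the crux of the question.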

 


gcusello
SplunkTrust

Hi @bowesmana,

I'm not sure, but as far as I know I agree with @woodcock: using fields before stats limits the data transferred from the IDX to the SH, which reduces bandwidth and memory usage and so improves performance.

I've never used table before stats, but that's just my mindset, and I hadn't seen the search you mentioned.

As for fields, I've used it before stats only when events have many fields; otherwise I haven't found much advantage in using it.

And when I have many events I prefer to use DataModels or summary indexes.

Ciao.

Giuseppe


bowesmana
SplunkTrust

It would make sense for stats to do something on the IDX similar to what "sort" does, where each indexer presorts its own results before sending them to the SH.

I would therefore expect each IDX to run prestats over its own data before returning the split-by fields to the SH, in which case "fields" would NEVER be necessary before stats.

... but it would be nice to get a definitive answer.
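The behaviour hypothesised above, where each indexer pre-aggregates its own events and the SH only merges the partial results, is a standard map/reduce split. The sketch below models it in Python under stated assumptions: indexers are plain lists of hypothetical event dicts, and prestats is approximated by a per-indexer Counter. Splunk's real prestats output (the psrsvd_* fields) is not reproduced.

```python
# Toy map/reduce model of "prestats on each indexer, final stats on the SH".
# Assumption: this mirrors the idea only, not Splunk's internal data format.
from collections import Counter


def prestats_count_by(events, field):
    """Map step, run on each indexer: partial counts for its own events only."""
    return Counter(e.get(field) for e in events)


def merge_partials(partials):
    """Reduce step, run on the search head: sum the partial counts."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total


# Hypothetical events spread over three indexers (7 events in total).
INDEXERS = [
    [{"rulename": "r1", "host": "a"}, {"rulename": "r1", "host": "b"},
     {"rulename": "r2", "host": "c"}],
    [{"rulename": "r1", "host": "d"}, {"rulename": "r1", "host": "e"}],
    [{"rulename": "r1", "host": "f"}, {"rulename": "r3", "host": "g"}],
]

if __name__ == "__main__":
    partials = [prestats_count_by(events, "rulename") for events in INDEXERS]
    # Each indexer ships one row per distinct rulename, not one row per event.
    rows_shipped = sum(len(p) for p in partials)
    result = merge_partials(partials)
    print(dict(result))   # {'r1': 5, 'r2': 1, 'r3': 1}
    print(rows_shipped)   # 5 rows instead of 7 raw events
    # Same answer as counting all events centrally on the SH.
    all_events = [e for events in INDEXERS for e in events]
    assert result == prestats_count_by(all_events, "rulename")
```

The saving grows with the ratio of events to distinct split-by values: an indexer holding a million events for ten rule names would, in this model, ship only ten rows.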


gcusello
SplunkTrust

Hi @bowesmana,

I agree with you; only someone from Splunk can answer your question.

Ciao.

Giuseppe


bowesmana
SplunkTrust

@gcusello 

I did some tests, and the Job Inspector's phase0 entry for litsearch shows what is going on.

So, with the basic search

index=x
| table rulename
| stats count by rulename

the Job Inspector reports

litsearch index=x | ifields + rulename | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "rulename" | prestats count by rulename

Replacing table with fields gives

litsearch index=x | fields + rulename | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "rulename" | prestats count by rulename

and with neither, you get

litsearch index=x | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "rulename" | prestats count by rulename

So there are only very minor differences: all three run prestats and return only the restricted field list.
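The key detail in all three litsearch strings is the automatically inserted projection fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "rulename" just before prestats. A rough Python model of that wildcard projection, assuming fnmatch-style matching as a stand-in for Splunk's own wildcard handling, shows why an earlier fields + rulename adds nothing in this model: the later projection already keeps only what prestats needs.

```python
# Toy model of the projection the litsearch inserts before prestats:
#   fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "rulename"
# Assumption: wildcard matching is modelled with fnmatch; this illustrates
# the set logic only and is not Splunk's implementation.
from fnmatch import fnmatchcase

AUTO_PROJECTION = ["prestats_reserved_*", "psrsvd_*", "rulename"]


def project(event, patterns):
    """Keep only fields whose names match at least one pattern."""
    return {k: v for k, v in event.items()
            if any(fnmatchcase(k, p) for p in patterns)}


# Hypothetical event as it comes off the index.
EVENT = {"_raw": "raw text", "host": "h1", "rulename": "r1", "src": "10.0.0.1"}

if __name__ == "__main__":
    neither = project(EVENT, AUTO_PROJECTION)
    with_fields = project(project(EVENT, ["rulename"]), AUTO_PROJECTION)
    print(neither)  # {'rulename': 'r1'}
    # In this model an explicit "fields + rulename" first changes nothing:
    # the automatic projection already drops every field prestats ignores.
    assert neither == with_fields
```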

 

gcusello
SplunkTrust

Hi @bowesmana,

What were the job times for the different searches?

As I said, the main difference I've found is between fields and neither when there are many events with many fields; otherwise I've found little difference, and I usually don't use it.

Ciao.

Giuseppe


bowesmana
SplunkTrust

The job time variation was insignificant, but I'm not testing on a large data set or with index clustering; I'll do that at some point.
