Re: Performance fields/stats/table

bowesmana · ‎10-24-2022

https://community.splunk.com/t5/Splunk-Search/Fields-vs-table-vs-nothing/m-p/498525#M194897

I was looking at a Splunk-authored search

https://research.splunk.com/cloud/042a3d32-8318-4763-9679-09db2644a8f2/

which does exactly that: table followed by stats.

In this case, table seems totally unnecessary and, because of the transformation it performs, would incur a performance cost.

So, specifically in a clustered index environment, how does

| fields A B C
| stats count by A B C

work from a data movement POV? Clearly the fields command will limit the fields returned from the indexers to the SH, but if there is no fields command, does the stats run entirely on the SH, with (a) ALL raw data returned from the indexers, or (b) do the indexers return only the fields that the stats command is going to use on the SH?

If it is (a), then there is clearly a benefit in using fields before stats, but my expectation is that it should work like (b).

gcusello · ‎10-24-2022

Hi @bowesmana,

I'm not sure, but as far as I know I agree with @woodcock: by using fields before stats, you limit the data to transfer from the IDX to the SH, so you reduce bandwidth and memory usage and therefore get better performance.

I have never used table before stats, but that's just my habit of mind, and I haven't seen the search you mentioned.

As for fields, I use it before stats only when events have many fields; otherwise I haven't found much advantage in using it.

And when I have many events I prefer to use DataModels or summary indexes.

Ciao.

Giuseppe

bowesmana · ‎10-25-2022

It would make sense that stats on the IDX will do something similar to "sort" on the IDX, where it will presort its own results before sending to the SH.

I would therefore expect stats on the IDX to perhaps run a pre-stats pass over its own data before returning the split-by fields to the SH, and therefore "fields" would NEVER be necessary before stats.

... but ... would be nice to get a definitive answer

gcusello · ‎10-25-2022

Hi @bowesmana,

I agree with you; only someone from Splunk can answer your question.

Ciao.

Giuseppe

bowesmana · ‎10-25-2022

@gcusello

I did some tests, and looking at phase0 in the Job Inspector, the litsearch tells you what is going on

so with the basic search

index=x
| table rulename
| stats count by rulename

Job inspector reports

litsearch index=x | ifields + rulename | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "rulename" | prestats count by rulename

replacing table with fields gives

litsearch index=x | fields + rulename | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "rulename" | prestats count by rulename

and with neither, you get

litsearch index=x | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "rulename" | prestats count by rulename

so, very minor differences, but all doing prestats and returning only the restricted field list.
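
As a rough mental model of what those litsearch strings suggest, here is a toy Python sketch. It is not Splunk's actual implementation: the function names `indexer_prestats` and `search_head_merge` and the sample events are made up for illustration. Each indexer reduces its own events to partial counts keyed only on the split-by field, and the search head merges those partials, so the other fields and the raw text never need to travel.

```python
from collections import Counter

def indexer_prestats(events, by_field):
    """Hypothetical stand-in for 'prestats count by <field>' on one indexer.

    Only the by-field is read; events without it are skipped, mirroring how
    'stats count by X' ignores events where X is missing.
    """
    return Counter(e[by_field] for e in events if by_field in e)

def search_head_merge(partials):
    """Hypothetical stand-in for the final 'stats' merge on the search head."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return total

# Made-up events held by two indexers; 'src' and '_raw' are never shipped.
idx1 = [
    {"rulename": "r1", "src": "10.0.0.1", "_raw": "raw event text"},
    {"rulename": "r2", "src": "10.0.0.2", "_raw": "raw event text"},
    {"rulename": "r1", "src": "10.0.0.3", "_raw": "raw event text"},
]
idx2 = [
    {"rulename": "r1", "src": "10.0.0.4", "_raw": "raw event text"},
    {"src": "10.0.0.5", "_raw": "raw event text"},  # no rulename field
]

partials = [indexer_prestats(idx1, "rulename"), indexer_prestats(idx2, "rulename")]
result = search_head_merge(partials)
```

In this model the search head only ever sees two small count tables, which matches the observation that adding fields or table in front of stats changes very little.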

gcusello · ‎10-25-2022

hi @bowesmana,

what were the job times for the different searches?

As I said, the main difference I found is between using fields and using neither when there are many events and many fields; otherwise I found little difference, and I usually don't use it.

Ciao.

Giuseppe

bowesmana · ‎10-25-2022

job time variation was insignificant - but I'm not testing it on a large data set or with index clustering - I'll do that at some point
