OK, so it looks like the main issue is probably values(*) as *, which collects all values of all fields for each guid. As it seems you have JSON data containing arrays, there appears to be more data than just guid/resourceId/sourcenumber.

As for cardinality: with stats values(*) as *, every field in your data is being collected, whereas @dtburrows3's version returns ONLY 2 collected fields. If there is a 1:1 ratio of events to guid, then cardinality is high and you will effectively be returning EVERY single piece of the 10M events to the search head before it can do the stats count by sourcenumber. If there are 20 events per guid, then you will get a reduced event count sent to the SH, i.e. a lower cardinality, but with potentially 20 values per multivalue field.

So, with this statement, you are returning 3 discrete bits of info:

| stats max(eval(if(disposition=="TERMINATED", 1, 0))) as guid_terminated,
    values(sourcenumber) as sourcenumber
    by guid

guid
guid_terminated = 0 or 1 depending on whether that guid was terminated
sourcenumber - the values of sourcenumber

Indexed extractions
https://docs.splunk.com/Documentation/SplunkCloud/9.1.2308/Data/Aboutindexedfieldextraction

Tstats
https://docs.splunk.com/Documentation/Splunk/9.1.1/SearchReference/tstats

A good document on how tstats/TERM/PREFIX can massively improve searches, but for JSON data it will not generally help unless indexed extractions are being made.
https://conf.splunk.com/files/2020/slides/PLA1089C.pdf
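
To make the cardinality point concrete, here is a rough Python sketch (not SPL, and not how Splunk works internally) that mimics the two aggregations on made-up events and counts how many field values each would hand to the search head. The extra_N field names, the event generator and the helper function names are all hypothetical; only guid, disposition and sourcenumber come from the search above.

```python
from collections import defaultdict


def make_events(n_guids, events_per_guid, extra_fields=5):
    """Build hypothetical events; the extra_N fields stand in for the other JSON data."""
    events = []
    for g in range(n_guids):
        for e in range(events_per_guid):
            last = e == events_per_guid - 1
            event = {
                "guid": f"guid-{g}",
                # Assumption: every second guid ends with a TERMINATED event.
                "disposition": "TERMINATED" if last and g % 2 == 0 else "ACTIVE",
                "sourcenumber": f"555-{g % 10:04d}",
            }
            for k in range(extra_fields):
                event[f"extra_{k}"] = f"value-{g}-{e}-{k}"
            events.append(event)
    return events


def stats_values_all(events):
    """Rough analogue of `stats values(*) as * by guid`: keep distinct values of every field."""
    rows = defaultdict(lambda: defaultdict(set))
    for ev in events:
        for field, value in ev.items():
            if field != "guid":
                rows[ev["guid"]][field].add(value)
    return rows


def stats_two_fields(events):
    """Rough analogue of the max(eval(...)) / values(sourcenumber) by guid version."""
    rows = {}
    for ev in events:
        row = rows.setdefault(ev["guid"], {"guid_terminated": 0, "sourcenumber": set()})
        terminated = int(ev["disposition"] == "TERMINATED")
        row["guid_terminated"] = max(row["guid_terminated"], terminated)
        row["sourcenumber"].add(ev["sourcenumber"])
    return rows


def count_values(rows):
    """Total number of field values held across all result rows."""
    total = 0
    for row in rows.values():
        for value in row.values():
            total += len(value) if isinstance(value, set) else 1
    return total


if __name__ == "__main__":
    events = make_events(n_guids=100, events_per_guid=20)
    print("rows by guid:", len(stats_two_fields(events)))
    print("values kept, values(*):", count_values(stats_values_all(events)))
    print("values kept, two fields:", count_values(stats_two_fields(events)))
```

Both versions produce one row per guid, so the row count is the same; the difference is how much each row carries. With events_per_guid=1 the row count equals the event count, which is the 1:1 high-cardinality case described above.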