Greetings.
I'm trying to count all calls in this:
index="my_data" resourceId="sip*" "CONNECTED"
Where not in this:
index="my_data" resourceId="sip*" "ENDED"
This works when the latter is <10k (subsearch)
index="my_data" resourceId="sip*" "CONNECTED" NOT
[
search index="my_data" resourceId="sip*" "ENDED" | table guid
]
And I can use a join for more than >10k because the TOTAL is not 10k (join limits)
index="my_data" resourceId="sip*" "CONNECTED"
| table guid meta
| join type=left guid
[
search index="my_data" resourceId="sip*" "ENDED"
| table guid timestamp
]
| search NOT timestamp="*"
But neither 'feel' great.
I'm making my way through the PDF found here but not figured out 'the best' way to do this (if such a thing exists).
https://community.splunk.com/t5/Splunk-Search/how-to-exclude-the-subsearch-result-from-main-search/m...
So while there are several questions related to 'excluding subsearch' results, I have not found many that help with this 10k issue (subsearch results more than 10k and a join works, as long as my total values is less than 10k).
PLUS - joins are kinda sucky, amirite? I mean, that's like what the first things that Nick Mealy says in that pdf.
So just looking for more options to try and learn!
Thank you!
Hi @loganramirez ,
at first the limit of subsearch results is 50,000 not 10,000.
Anyway, you could use stats to correlate searches, something like this:
index="my_data" resourceId="sip*" ("CONNECTED" OR "ENDED")
| eval status=if(searchmatch("CONNECTED"),"CONNECTED","ENDED")
| stats
dc(status) AS status_count
values(status) AS status
values(meta) AS meta
last(timestamp) AS timestamp
BY guid
| where status_count=1 AND status="CONNECTED"
Ciao.
Giuseppe
Hi @loganramirez ,
at first the limit of subsearch results is 50,000 not 10,000.
Anyway, you could use stats to correlate searches, something like this:
index="my_data" resourceId="sip*" ("CONNECTED" OR "ENDED")
| eval status=if(searchmatch("CONNECTED"),"CONNECTED","ENDED")
| stats
dc(status) AS status_count
values(status) AS status
values(meta) AS meta
last(timestamp) AS timestamp
BY guid
| where status_count=1 AND status="CONNECTED"
Ciao.
Giuseppe
I see why you are an "Esteemed Legend," Sir!
I learned A LOT in this, like, for the first time how using 'stats' can be used to figure things out.
For those coming later:
index="policyguru_data" resourceId="sip*" ("CONNECTED" OR "ENDED")
This gets both sets of data I need in one pass.
| eval status=if(searchmatch("CONNECTED"),"CONNECTED","ENDED")
Then creates a new column (status) that just has "CONNECTED" or "ENDED" in it
| stats
dc(status) AS status_count. <-- distinct count, so since by guid, if it has both, it'll be 2. If it only has 1, it'll be 1.
values(status) AS status <-- display the values in a field called 'status' (I don't need this but it's nice to see/learn!)
values(meta) AS meta <-- same as above
last(timestamp) AS timestamp <-- capture the most recent timestamp (since there will be two for the matches)
BY guid
| where status_count=1 AND status="CONNECTED"
Then, use 'where' to keep the ones that only have "CONNECTED" (so they do NOT have an ENDED record, which is exactly what I want).
Thank you, Legend!
Hi @loganramirez,
it was a pleasure to help you, see next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated 😉