I am trying to run a query like the one below, but I am limited to 10,000 subsearch results. Is there a way to make this query work with more than 10,000 subsearch results?
search index="sample_index" "Kubernetes.namespace"="ABC" "Two String" [index="sample_index" "Kubernetes.namespace"="ABC" "Success work done" | fields demo_id ] | stats count as Result by marksObtained
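To see why the cap matters, here is a small Python model (not Splunk code) of the query above. It assumes the subsearch simply keeps its first `limit` results and turns their demo_id values into a filter for the outer search. `limit=2` stands in for the real 10,000 cap, and the function name and data are hypothetical. Once the inner search produces more results than the cap, outer events whose demo_id was cut off silently drop out of the counts.

```python
from collections import Counter


def run_with_subsearch(outer_events, inner_events, limit):
    """Model of: outer search [inner search | fields demo_id] | stats count BY marksObtained."""
    # The subsearch output is capped at `limit` results before its demo_id
    # values become a filter on the outer search; anything past the cap is lost.
    allowed = {e["demo_id"] for e in inner_events[:limit]}
    kept = [e for e in outer_events if e["demo_id"] in allowed]
    # stats count BY marksObtained
    return Counter(e["marksObtained"] for e in kept)


# Hypothetical data: three demo_ids, each with a "Two String" event (outer)
# and a "Success work done" event (inner).
outer = [
    {"demo_id": "id1", "marksObtained": "A"},
    {"demo_id": "id2", "marksObtained": "B"},
    {"demo_id": "id3", "marksObtained": "A"},
]
inner = [{"demo_id": "id1"}, {"demo_id": "id2"}, {"demo_id": "id3"}]

print(dict(run_with_subsearch(outer, inner, limit=10)))  # {'A': 2, 'B': 1}
print(dict(run_with_subsearch(outer, inner, limit=2)))   # {'A': 1, 'B': 1}
```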
I saw that someone had already asked a similar question here, and I tried implementing it the same way, but it isn't working for me. Below is the query I wrote, but the results are not as expected.
index="sample_index" "Kubernetes.namespace"="ABC" ("Two String" OR "Success work done") | stats count as Result by marksObtained
Hi @Vivekmishra01,
you could configure a different limit for subsearches (the default is 10,000 results), but it isn't a best practice. In any case, you could correlate your results using the common field, something like this:
search index="sample_index" "Kubernetes.namespace"="ABC" ("Two String" OR "Success work done")
| eval kind=if(searchmatch("Two String"),"Two String","Success work done")
| stats dc(kind) AS kind_count values(marksObtained) AS marksObtained BY demo_id
| where kind_count=2
| mvexpand marksObtained
| stats count AS Result BY marksObtained
Ciao.
Giuseppe
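The correlation in the query above can be modeled in a few lines of Python (not Splunk code) to show what each step does. This is a sketch under simplifying assumptions: events are dicts with an already-extracted demo_id, the "Two String" test is a plain substring check, and the function name and data are hypothetical.

```python
from collections import Counter


def correlate(events):
    """Model of the eval | stats ... BY demo_id | where | mvexpand | stats pipeline."""
    kinds = {}  # demo_id -> set of event kinds seen (what dc(kind) counts)
    marks = {}  # demo_id -> set of marksObtained values (values(marksObtained))
    for e in events:
        # eval kind=...: label each event by which search phrase it contains.
        kind = "Two String" if "Two String" in e["_raw"] else "Success work done"
        kinds.setdefault(e["demo_id"], set()).add(kind)
        if "marksObtained" in e:
            marks.setdefault(e["demo_id"], set()).add(e["marksObtained"])

    result = Counter()
    for demo_id, seen in kinds.items():
        if len(seen) == 2:  # where kind_count=2: both kinds were seen for this demo_id
            # mvexpand marksObtained | stats count AS Result BY marksObtained
            for mark in marks.get(demo_id, set()):
                result[mark] += 1
    return result


events = [
    {"demo_id": "id1", "_raw": "Two String marksObtained=A", "marksObtained": "A"},
    {"demo_id": "id1", "_raw": "Success work done"},
    {"demo_id": "id2", "_raw": "Two String marksObtained=B", "marksObtained": "B"},
    {"demo_id": "id2", "_raw": "Success work done"},
    # id3 has no "Success work done" event, so the where clause drops it.
    {"demo_id": "id3", "_raw": "Two String marksObtained=A", "marksObtained": "A"},
]
print(dict(correlate(events)))  # {'A': 1, 'B': 1}
```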
@gcusello It worked for me for up to the last 48 hours, but as I increase the time range I see some inconsistencies in the data. I believe Splunk logs are being dropped or something like that. Can you explain why you did it like this:
stats dc(kind) AS kind_count values(marksObtained) AS marksObtained BY demo_id
| where kind_count=2
Hi @Vivekmishra01,
with the eval before the stats, I labeled each event as one of the two kinds ("Two String" or "Success work done");
then, in the stats, I counted the distinct kinds for each demo_id using the field created by the eval;
with the where condition, I keep only the demo_ids that have both kinds of events.
There may be some inconsistencies when one of the two events of a demo_id falls outside the time period, but these should be very few.
Ciao.
Giuseppe
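The boundary effect described above can be sketched in Python (not Splunk code). The timestamps come from the two sample events posted in this thread, where the "Success work done" event is logged about two minutes before the "Two String" event; the function name and the one-minute shift of the window start are hypothetical. If the search window begins between the two events, the demo_id has only one kind of event inside the window and fails the kind_count=2 test.

```python
from datetime import datetime, timedelta


def complete_ids(events, start, end):
    """demo_ids that have both event kinds inside the search window [start, end)."""
    kinds = {}
    for e in events:
        if start <= e["time"] < end:  # events outside the time range are never seen
            kinds.setdefault(e["demo_id"], set()).add(e["kind"])
    return {d for d, seen in kinds.items() if len(seen) == 2}  # where kind_count=2


success_time = datetime(2023, 3, 28, 22, 50, 14, 534000)
two_string_time = datetime(2023, 3, 28, 22, 52, 20, 504000)
events = [
    {"demo_id": "id1", "kind": "Success work done", "time": success_time},
    {"demo_id": "id1", "kind": "Two String", "time": two_string_time},
]
end = two_string_time + timedelta(hours=1)

print(complete_ids(events, success_time, end))                         # {'id1'}
print(complete_ids(events, success_time + timedelta(minutes=1), end))  # set()
```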
Hi @Vivekmishra01
The marksObtained field must be present in both events for the stats command's group-by to work.
If you provide examples of both types of event data ("Two String" OR "Success work done") then we might be able to assist in getting this working for you.
Please obfuscate any sensitive data.
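The point about the group-by field can be modeled in Python (not Splunk code): `stats count BY marksObtained` only forms groups from events that have a marksObtained value, so the "Success work done" events, which lack it, are ignored and cannot influence the result. The function name and data below are hypothetical.

```python
from collections import Counter


def stats_count_by(events, field):
    """Model of `stats count BY field`: events without `field` belong to no group."""
    return Counter(e[field] for e in events if field in e)


events = [
    {"demo_id": "id1", "marksObtained": "A"},  # "Two String" event
    {"demo_id": "id1"},                        # "Success work done" event, no marksObtained
    {"demo_id": "id2", "marksObtained": "B"},  # "Two String" event with no matching success event
]

# Grouping by marksObtained counts id2 as well, because the success events never take part.
print(stats_count_by(events, "marksObtained"))  # Counter({'A': 1, 'B': 1})
# Grouping by the shared demo_id keeps both events of id1 together.
print(stats_count_by(events, "demo_id"))        # Counter({'id1': 2, 'id2': 1})
```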
@yeahnah The inner subquery's events don't have "marksObtained", but both queries' events share the common field demo_id.
@yeahnah
The outer query result looks like the one below, and here demo_id="64236fa4c43595ajj4eudhjjsh344,0ohf430765235178":
{"log":"2023-03-28 22:52:20.504 INFO [my-application-web,64236fa4c43595ajj4eudhjjsh344,0ohf430765235178] 1 --- [nio-1892-exec-4] j.c.o.m.t.c.NotificationEventsController : Two Strings marksObtained=A","Kubernetes.node":"sample-node","Kubernetes.pod":"sample-pod","Kubernetes.namespace":"ABC","hostname":"demo_name"}
Inner query result:
{"log":"2023-03-28 22:50:14.534 INFO [my-application-web,64236fa4c43595ajj4eudhjjsh344,0ohf430765235178] 1 --- [nio-1892-exec-4] c.j.c.o.m.t.s.AlertsKafkaProducer : Success work done","Kubernetes.node":"sample-node","Kubernetes.pod":"sample-pod","Kubernetes.namespace":"ABC","hostname":"demo_name"}
marksObtained has only three possible values: "A", "B" and "C".
OK, based on your sample data this should work...
index=dummy
| append [| makeresults
| eval data="{\"log\":\"2023-03-28 22:52:20.504 INFO [my-application-web,64236fa4c43595ajj4eudhjjsh344,0ohf430765235178] 1 --- [nio-1892-exec-4] j.c.o.m.t.c.NotificationEventsController : Two Strings marksObtained=A\",\"Kubernetes.node\":\"sample-node\",\"Kubernetes.pod\":\"sample-pod\",\"Kubernetes.namespace\":\"ABC\",\"hostname\":\"demo_name\"}|{\"log\":\"2023-03-28 22:50:14.534 INFO [my-application-web,64236fa4c43595ajj4eudhjjsh344,0ohf430765235178] 1 --- [nio-1892-exec-4] c.j.c.o.m.t.s.AlertsKafkaProducer : Success work done\",\"Kubernetes.node\":\"sample-node\",\"Kubernetes.pod\":\"sample-pod\",\"Kubernetes.namespace\":\"ABC\",\"hostname\":\"demo_name\"}"
| makemv data delim="|"
| mvexpand data ]
| rename data AS _raw
| tojson
| spath
``` ignore above, just used to create dummy events ```
| rex field=log ",(?<demo_id>[^\]]+)(.*=(?<marksObtained>\w+))*" ``` may not need this rex if field values already extracted ```
| stats count AS Result values(marksObtained) AS marksObtained BY demo_id
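To check the rex outside Splunk, here is a Python translation (not Splunk code) applied to the two sample log values from this thread. Python writes named groups as (?P<name>...) instead of (?<name>...); the function names are hypothetical, and the grouping step only models `stats count AS Result values(marksObtained) AS marksObtained BY demo_id`.

```python
import re

# The rex pattern from the query, with Python's (?P<name>...) group syntax.
REX = re.compile(r",(?P<demo_id>[^\]]+)(.*=(?P<marksObtained>\w+))*")


def extract(log):
    """Fields the rex would extract from one `log` value (unmatched groups omitted)."""
    m = REX.search(log)
    if m is None:
        return {}
    return {name: value for name, value in m.groupdict().items() if value is not None}


def stats_by_demo_id(logs):
    """Model of: stats count AS Result values(marksObtained) AS marksObtained BY demo_id."""
    rows = {}
    for log in logs:
        fields = extract(log)
        if "demo_id" not in fields:
            continue  # no group-by value, so stats ignores the event
        row = rows.setdefault(fields["demo_id"], {"Result": 0, "marksObtained": set()})
        row["Result"] += 1
        if "marksObtained" in fields:
            row["marksObtained"].add(fields["marksObtained"])
    # values() returns the distinct values in sorted order
    for row in rows.values():
        row["marksObtained"] = sorted(row["marksObtained"])
    return rows


two_string = ("2023-03-28 22:52:20.504 INFO [my-application-web,64236fa4c43595ajj4eudhjjsh344,"
              "0ohf430765235178] 1 --- [nio-1892-exec-4] j.c.o.m.t.c.NotificationEventsController : "
              "Two Strings marksObtained=A")
success = ("2023-03-28 22:50:14.534 INFO [my-application-web,64236fa4c43595ajj4eudhjjsh344,"
           "0ohf430765235178] 1 --- [nio-1892-exec-4] c.j.c.o.m.t.s.AlertsKafkaProducer : "
           "Success work done")

print(extract(two_string))
print(stats_by_demo_id([two_string, success]))
```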
OK, if both events have the demo_id field that ties the events together, then that is what you should use as the group-by key. So, something like this should work...
index="sample_index" "Kubernetes.namespace"="ABC" ("Two String" OR "Success work done")
| stats count AS Result max(marksObtained) BY demo_id
Note, max(marksObtained) is meant for numeric values, not strings. Use values(marksObtained) if it is a string value.
Hope that helps
I am trying to count the number of "A", "B" and "C" values, so I think it must be BY marksObtained. There will be more than 10,000 demo_ids.
You will not hit the 10000 limit because you do not need to use the inefficient and limited subsearch to get your result.
And, to find the distinct count (dc) of "A", "B" and "C", just add this to the end of the query provided above:
| stats dc(marksObtained) AS tally_marksObtained
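The difference between dc and a per-value count can be sketched in Python (not Splunk code) on hypothetical data with one row per demo_id. dc(marksObtained) collapses everything to the number of distinct values (at most 3 here), while a `stats count AS Result BY marksObtained` style grouping gives how many demo_ids have each of "A", "B" and "C", which is closer to the goal stated earlier in the thread.

```python
from collections import Counter

# Hypothetical rows: one per demo_id, each with its marksObtained value.
per_demo = [
    {"demo_id": "id1", "marksObtained": "A"},
    {"demo_id": "id2", "marksObtained": "B"},
    {"demo_id": "id3", "marksObtained": "A"},
    {"demo_id": "id4", "marksObtained": "C"},
]

# stats dc(marksObtained) AS tally_marksObtained -> a single number
tally_marksObtained = len({row["marksObtained"] for row in per_demo})

# stats count AS Result BY marksObtained -> one count per value
result_by_mark = Counter(row["marksObtained"] for row in per_demo)

print(tally_marksObtained)   # 3
print(dict(result_by_mark))  # {'A': 2, 'B': 1, 'C': 1}
```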