Why is search not working properly on duplicate inner records?

user9025
Path Finder

I have a Splunk query whose intention is to get every IpAddress for which "Event A" occurred between 24 and 4 hours ago (earliest=-24h latest=-4h), but for which "Event B" did not occur at all in the last 24 hours.

It is known that "Event A" occurs at most once per IP address, but "Event B" can occur multiple times.
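To pin down what the query should return, here is a minimal Python sketch of the intended logic as a set difference. It is a toy model, not Splunk: the `Event` tuple, `in_window` and `expected_ips` are hypothetical names, events are assumed to carry their age in hours, and window boundaries are simplified.

```python
from typing import NamedTuple


class Event(NamedTuple):
    ip: str           # hypothetical stand-in for the IpAddress field
    name: str         # "Event A" or "Event B"
    hours_ago: float  # age of the event relative to now


def in_window(e: Event, earliest: float, latest: float) -> bool:
    # earliest=-24h latest=-4h  ->  earliest=24, latest=4 (boundaries simplified)
    return latest < e.hours_ago <= earliest


def expected_ips(events: list[Event]) -> set[str]:
    a_ips = {e.ip for e in events if e.name == "Event A" and in_window(e, 24, 4)}
    b_ips = {e.ip for e in events if e.name == "Event B" and in_window(e, 24, 0)}
    # Keep IPs that had Event A in its window but no Event B in the last 24h.
    return a_ips - b_ips


events = [
    Event("10.0.0.1", "Event A", 10),
    Event("10.0.0.1", "Event B", 2),
    Event("10.0.0.1", "Event B", 3),
    Event("10.0.0.2", "Event A", 6),
]
print(sorted(expected_ips(events)))  # ['10.0.0.2']
```

Here 10.0.0.1 must be dropped because it has Event B entries, no matter how many.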

Following is the query:

index=prod-* sourcetype="kube:service" "Event A" earliest=-24h latest=-4h | table IpAddress | search NOT [search index=prod-* sourcetype="kube:service" AND ("Event B") earliest=-24h latest=-0h | table IpAddress]

Why is the first query not working?

It returns an IP address in the results even when that IP address has an "Event A" and multiple "Event B" events.

But if I add dedup IpAddress to the inner NOT subsearch, it works correctly.

Updated query:

index=prod-* sourcetype="kube:service" "Event A" earliest=-24h latest=-4h | table IpAddress | search NOT [search index=prod-* sourcetype="kube:service" AND ("Event B") earliest=-24h latest=-0h | dedup IpAddress | table IpAddress]

jdunlea
Contributor

If you have a lot of "Event B" events in your data, you might be hitting the subsearch result limit (10,000 results by default). The subsearch then returns only the first 10,000 results, which might contain only a small number of distinct IP addresses if many events share the same IP address. Any IP address that is cut off is not excluded by the NOT, so it still shows up in the final results.

Using dedup makes the result count much smaller (one row per distinct IP address), probably fewer than 10,000, so the subsearch can return all of the IP addresses to the outer search, which then does the filtering.
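The truncation effect can be illustrated with a toy Python model. It assumes (a simplification, not Splunk's actual implementation) that the subsearch keeps only its first `maxout` result rows and that the outer NOT excludes exactly the IPs in those rows. `maxout=3` stands in for the real 10,000 limit to keep the example small; `subsearch_ips`, `outer_results` and the IP values are made up.

```python
def subsearch_ips(b_rows: list[str], maxout: int, dedup: bool) -> set[str]:
    """IPs the subsearch hands back to the outer NOT, after truncation."""
    if dedup:
        # dedup IpAddress runs before the limit: one row per distinct IP
        b_rows = list(dict.fromkeys(b_rows))
    return set(b_rows[:maxout])  # only the first `maxout` rows survive


def outer_results(a_ips: list[str], b_rows: list[str], maxout: int, dedup: bool) -> list[str]:
    excluded = subsearch_ips(b_rows, maxout, dedup)
    return [ip for ip in a_ips if ip not in excluded]


a_ips = ["10.0.0.1", "10.0.0.2"]
# Repeated Event B rows for 10.0.0.1 push 10.0.0.2 past the limit.
b_rows = ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"]

print(outer_results(a_ips, b_rows, maxout=3, dedup=False))  # ['10.0.0.2'] (wrong)
print(outer_results(a_ips, b_rows, maxout=3, dedup=True))   # [] (correct)
```

Both IPs have Event B, so the correct answer is empty. Without dedup the duplicate rows use up the limit and 10.0.0.2 slips through.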

 

Side note: You might be able to do this with a single search (no subsearch) by doing something like the following (please note: you will need to create the event_flag field yourself using your own regex/match):

index=prod-* sourcetype="kube:service" ("Event A" earliest=-24h latest=-4h) OR ("Event B" earliest=-24h latest=-0h)
| eval event_flag=if(match(_raw,"Event A"),"Event_A","Event_B")
| stats values(event_flag) as event_flag dc(event_flag) as event_count by IpAddress
| search event_count=1 event_flag="Event_A"
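The stats-based approach above can be mirrored in Python to show why event_count=1 together with event_flag="Event_A" selects the right IPs. This sketches the logic only, not Splunk's execution. Events are hypothetical (ip, raw) pairs assumed to be already restricted to their time windows by the OR clause, and a substring check stands in for match(_raw, "Event A"); `single_search` is a made-up name.

```python
from collections import defaultdict


def single_search(events: list[tuple[str, str]]) -> list[str]:
    flags_by_ip: dict[str, set[str]] = defaultdict(set)
    for ip, raw in events:
        # eval event_flag=if(match(_raw,"Event A"),"Event_A","Event_B")
        flag = "Event_A" if "Event A" in raw else "Event_B"
        flags_by_ip[ip].add(flag)  # stats values(event_flag) ... by IpAddress
    # event_count=1 AND event_flag="Event_A" means the set of flags is exactly {"Event_A"}
    return sorted(ip for ip, flags in flags_by_ip.items() if flags == {"Event_A"})


events = [
    ("10.0.0.1", "... Event A ..."),
    ("10.0.0.1", "... Event B ..."),
    ("10.0.0.1", "... Event B ..."),
    ("10.0.0.2", "... Event A ..."),
    ("10.0.0.3", "... Event B ..."),
]
print(single_search(events))  # ['10.0.0.2']
```

Because every event is grouped per IP in one pass, duplicate Event B rows only add to a per-IP set, and no subsearch result limit is involved.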

johnhuang
Motivator

Subsearches have limitations, including a 10k-result limit and a 60-second runtime limit. The dedup reduces the number of results to fewer than 10k.

Subsearches are also inefficient compared to other methods: you should write a single primary search that includes both event types and then use stats (or similar commands) to filter. If you need help with this, please provide the actual search terms/fields for Event A and Event B.
