Hello,
I'm trying to write a query where I provide a key identifier (say "A"), and the query both finds matching results, but also uses a field from those results as a filter to another query that provides additional data that's needed. Obfuscating some things, this is the idea, and the closest I've gotten:
index=ind1 earliest=-1d field1=abc | append [search index=ind1 earliest=-1d "A" field1=xyz | rename field2 as f2] | where field2=f2 OR field1="xyz"
The idea is that results where field1=xyz and contain "A" have another field, "field2", that is present and has a matching value when field1=xyz or field1=abc. So I want to be able to search based "A" and get back results where field1=xyz or field1=abc where field2 matches between those 2 sets.
I do think a join would probably work here, but I've heard there can be performance issues with that so I was trying to avoid that. It seems to me that I can't use "where field2=f2", and it also seems the parent search is pulling in a lot of data because of the generally broad terms (I suppose because the piped where command is applied after the fact). Any ideas of how to write this performantly?
Replied in the wrong thread, ignore!
Try something along these lines
| eval keep_xyz=if(searchmatch("A") AND field1="xyz","true",null())
| eval keep_abc=if(field1="abc","true",null())
| eventstats values(keep_abc) as keep_abc values(keep_xyz) as keep_xyz by field2
| where keep_xyz="true" AND keep_abc="true"
That does seem like that would work as far as getting the results I want, though it leaves one of my issues unsolved. The parent query "index=ind1 earliest=-1d field1=abc" returns many, many results without the inclusion of some filter on field2. My initial approach (plus your fix for it) filter those results after that broad search is done which isn't great from a performance perspective. Perhaps I'm better off just using a join at that point, not sure.
Anyway, thanks for the reply 🙂
So the best performing solution I could come up with was something like this:
index=ind1 earliest=-1d field1=abc field2 IN
([search index=ind1 earliest=-1d "A" field1=xyz | stats count by field2 | fields field2 | rename field2 as query | format mvsep="" "" "" "" "" "" "" | replace "NOT ()" WITH ""]) |
append [search index=ind1 earliest=-1d "A" field1=xyz]
This way, the parent query is running with the additional filtering provided by the subquery. One thing I was wondering was whether the search results of "search index=ind1 earliest=-1d "A" field1=xyz" could be stored to not have to run it twice. Is that possible?