I need to do a search in two different sourcetypes and use the result to do additional searches in these queries.
But I have the problem that, while both sourcetypes have similar values, they use different prefixes. So in sourcetype=A the ip is called aIP and in sourcetype=B the ip is called bIP respectively.
So you could search with
aIP=”192.168.0.1” OR bIp=”192.168.0.1”
However if you want to use these ips from a subsearch of both of these tables it becomes problematic and I am not sure what the best solution is.
So let’s assume I want to find the IPs used on a specific page called “MAINPAGE”. And use these IPs to search for other pages visited by them on both sources.
I tried to minimize the code as far as possible. It might not make any sense anymore, but I hope it’s enough to bring across my point.
Index=web (sourcetype=a OR sourcetype=b)
[search index=web sourcetype=a apage=MAINPAGE | table aIP]
OR
[search index=web sourcetype=a apage=MAINPAGE | rename aIP as bIP | table bIP]
OR
[search index=web sourcetype=b bpage=MAINPAGE | table bIP]
OR
[search index=web sourcetype=b bpage=MAINPAGE | rename bIP as aIP | table aIP]
| eval page = coalesce (apage, bpage)
| eval ip = coalesce (aIP, bIP)
| table page, ip
So because the table of the subsearch is automatically the search parameters for the parent searches and I need to search for both results, I don’t see a better way than doing both searches twice and just rename the field name of the output table.
Is there any way to reduce it to two subsearches in this case? Eg. renaming the fields without doing the search an additional time?
Have you tried creating 'Field Alias' for both aIP
and bIP
?
If not, create one as src_ip
and try this:
index=web (sourcetype=a OR sourcetype=b) | table page, src_ip
HTH, Mitesh.
Have you tried creating 'Field Alias' for both aIP
and bIP
?
If not, create one as src_ip
and try this:
index=web (sourcetype=a OR sourcetype=b) | table page, src_ip
HTH, Mitesh.
Thank you! This works perfectly!
It sounds like you may be running into limitations of subsearch. Subsearches finalize after 60 seconds (or 10k rows, though my previous answer worked around that specific limitation), so if your inside search takes too long to run you will get potentially incomplete results.
Instead it's recommended you find a non-subsearch means to fetch your needed data, potentially like this:
index=_internal sourcetype=a OR sourcetype=b
| eval ip=if(sourcetype="a", aIP, bIP)
| stats values(apage) AS apage values(otherfield) AS otherfield BY ip
| search apage=MAINPAGE
Unfortunately that's not possible in my case. The initial data is too big. So the idea was to have rather "specific" subsearches, just leaving some thousand search values for the main search. Using the main search, grab everything and filter later on will use up too much data that the job gets stuck (and times out).
Try:
index=web (sourcetype=a OR sourcetype=b)
[search index=web sourcetype=a apage=MAINPAGE | stats values(aIP) AS aIP | eval search="aIP=".mvjoin(aIP, " OR aIP=")." OR bIP=".mvjoin(aIP, " OR bIP=")]
When a subsearch returns a field named search
, that field's value will be dropped inline into your search. This results in a search that looks like:
index=web (sourcetype=a OR sourcetype=b) aIP=1.2.3.4 OR aIP=4.3.2.1 OR bIP=1.2.3.4 OR bIP=4.3.2.1
Or maybe:
index=web
[search index=web sourcetype=a apage=MAINPAGE | stats values(aIP) AS aIP | eval search="(sourcetype=a (aIP=".mvjoin(aIP, " OR aIP=").")) OR (sourcetype=b (bIP=".mvjoin(aIP, " OR bIP=")."))"]
Which results in a search like:
index=web (sourcetype=a (aIP=1.2.3.4 OR aIP=4.3.2.1)) OR (sourcetype=b (bIP=1.2.3.4 OR bIP=4.3.2.1))
Which might be a tad more efficient.
I just ran it, but it doesn't work.
I tried the first and simpler variation, and fiddled around with it. Maybe there is some limit on how much can be in any field or something similar.
It works for testing purpose if you insert an | eval aIP=specific single entry from valid result into it.
But otherwise I guess that the following commands break down. Maybe there is some kind of limitation as the search "field" would contain thousands of numbers, likely almost a million characters.
How long do these searches take to run:
index=web sourcetype=a apage=MAINPAGE | stats values(aIP) AS aIP | eval search="aIP=".mvjoin(aIP, " OR aIP=")." OR bIP=".mvjoin(aIP, " OR bIP=")
index=web sourcetype=a apage=MAINPAGE | stats values(aIP) AS aIP | eval search="(sourcetype=a (aIP=".mvjoin(aIP, " OR aIP=").")) OR (sourcetype=b (bIP=".mvjoin(aIP, " OR bIP=")."))"
We moved away from tackling the problem from this end now. I appreciate your help though.
Thanks! Can't say right now if it works or not, but that's definitely something I didn't come across and didn't test yet. Will try it asap!
why not something like this:
Index=web (sourcetype=a OR sourcetype=b)
[search index=web sourcetype=a apage=MAINPAGE | eval bIP=aIP|table aIP bIP|format]
OR
[search index=web sourcetype=b bpage=MAINPAGE | eval aIP=bIP| table aIP bIP|format]
| eval page = coalesce (apage, bpage)
| eval ip = coalesce (aIP, bIP)
| table page, ip
The problem in this case is, that both of the results are passed on with an AND.
So if a subsearch returns
|table aIP bIP
that means that the main search now searches for aIP AND bIP.
What I need in this case is an OR though as some of these fields might be null.
what about adding this after format
|rex mode=sed field=search "s/AND bIP/OR bIP/g"
Actually the "format" itself already breaks the search. While the splunk documentations states that it's implicitly used (https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Search/Changetheformatofsubsearchresults)
It's actually not. Well, the functionality is. But the search isn't generated the same way. It generates a big step in between which makes a 5 minute search into a 20 minute search, which then leads then to breaking subsearches.
But I guess you can only use |rex if the format statement was used before. But as this breaks the search...
(can be verified by increasing search time span. While it works with tiny timespans, at some point it takes so long that splunk starts to ignore subsearches)