Solved: Re: Subsearch for multiple sourcetypes and fieldna...

DerBastler · ‎01-08-2018

I need to do a search in two different sourcetypes and use the result to do additional searches in these queries.
But I have the problem that, while both sourcetypes have similar values, they use different prefixes. So in sourcetype=A the ip is called aIP and in sourcetype=B the ip is called bIP respectively.

So you could search with
aIP=”192.168.0.1” OR bIp=”192.168.0.1”

However if you want to use these ips from a subsearch of both of these tables it becomes problematic and I am not sure what the best solution is.
So let’s assume I want to find the IPs used on a specific page called “MAINPAGE”. And use these IPs to search for other pages visited by them on both sources.
I tried to minimize the code as far as possible. It might not make any sense anymore, but I hope it’s enough to bring across my point.

Index=web (sourcetype=a OR sourcetype=b)
[search index=web sourcetype=a apage=MAINPAGE | table aIP]

| eval page = coalesce (apage, bpage)
| eval ip = coalesce (aIP, bIP)

| table page, ip

So because the table of the subsearch is automatically the search parameters for the parent searches and I need to search for both results, I don’t see a better way than doing both searches twice and just rename the field name of the output table.

Is there any way to reduce it to two subsearches in this case? Eg. renaming the fields without doing the search an additional time?

miteshvohra · ‎01-17-2018

Have you tried creating 'Field Alias' for both aIP and bIP?

If not, create one as src_ip and try this:

index=web (sourcetype=a OR sourcetype=b) | table page, src_ip

HTH, Mitesh.

View solution in original post

miteshvohra · ‎01-17-2018

Have you tried creating 'Field Alias' for both aIP and bIP?

If not, create one as src_ip and try this:

index=web (sourcetype=a OR sourcetype=b) | table page, src_ip

HTH, Mitesh.

DerBastler · ‎01-23-2018

Thank you! This works perfectly!

micahkemp · ‎01-09-2018

It sounds like you may be running into limitations of subsearch. Subsearches finalize after 60 seconds (or 10k rows, though my previous answer worked around that specific limitation), so if your inside search takes too long to run you will get potentially incomplete results.

Instead it's recommended you find a non-subsearch means to fetch your needed data, potentially like this:

index=_internal sourcetype=a OR sourcetype=b
| eval ip=if(sourcetype="a", aIP, bIP)
| stats values(apage) AS apage values(otherfield) AS otherfield BY ip
| search apage=MAINPAGE

DerBastler · ‎01-09-2018

Unfortunately that's not possible in my case. The initial data is too big. So the idea was to have rather "specific" subsearches, just leaving some thousand search values for the main search. Using the main search, grab everything and filter later on will use up too much data that the job gets stuck (and times out).

micahkemp · ‎01-08-2018

Try:

index=web (sourcetype=a OR sourcetype=b)
[search index=web sourcetype=a apage=MAINPAGE | stats values(aIP) AS aIP | eval search="aIP=".mvjoin(aIP, " OR aIP=")." OR bIP=".mvjoin(aIP, " OR bIP=")]

When a subsearch returns a field named search, that field's value will be dropped inline into your search. This results in a search that looks like:

index=web (sourcetype=a OR sourcetype=b) aIP=1.2.3.4 OR aIP=4.3.2.1 OR bIP=1.2.3.4 OR bIP=4.3.2.1

Or maybe:

index=web
[search index=web sourcetype=a apage=MAINPAGE | stats values(aIP) AS aIP | eval search="(sourcetype=a (aIP=".mvjoin(aIP, " OR aIP=").")) OR (sourcetype=b (bIP=".mvjoin(aIP, " OR bIP=")."))"]

Which results in a search like:

index=web (sourcetype=a (aIP=1.2.3.4 OR aIP=4.3.2.1)) OR (sourcetype=b (bIP=1.2.3.4 OR bIP=4.3.2.1))

Which might be a tad more efficient.

DerBastler · ‎01-09-2018

I just ran it, but it doesn't work.
I tried the first and simpler variation, and fiddled around with it. Maybe there is some limit on how much can be in any field or something similar.
It works for testing purpose if you insert an | eval aIP=specific single entry from valid result into it.
But otherwise I guess that the following commands break down. Maybe there is some kind of limitation as the search "field" would contain thousands of numbers, likely almost a million characters.

micahkemp · ‎01-09-2018

How long do these searches take to run:

index=web sourcetype=a apage=MAINPAGE | stats values(aIP) AS aIP | eval search="aIP=".mvjoin(aIP, " OR aIP=")." OR bIP=".mvjoin(aIP, " OR bIP=")

index=web sourcetype=a apage=MAINPAGE | stats values(aIP) AS aIP | eval search="(sourcetype=a (aIP=".mvjoin(aIP, " OR aIP=").")) OR (sourcetype=b (bIP=".mvjoin(aIP, " OR bIP=")."))"

DerBastler · ‎01-10-2018

We moved away from tackling the problem from this end now. I appreciate your help though.

DerBastler · ‎01-08-2018

Thanks! Can't say right now if it works or not, but that's definitely something I didn't come across and didn't test yet. Will try it asap!

cmerriman · ‎01-08-2018

why not something like this:

Index=web (sourcetype=a OR sourcetype=b)
[search index=web sourcetype=a apage=MAINPAGE | eval bIP=aIP|table aIP bIP|format]
OR
[search index=web sourcetype=b bpage=MAINPAGE | eval aIP=bIP| table aIP bIP|format]
| eval page = coalesce (apage, bpage)
| eval ip = coalesce (aIP, bIP)
| table page, ip

DerBastler · ‎01-08-2018

The problem in this case is, that both of the results are passed on with an AND.
So if a subsearch returns
|table aIP bIP
that means that the main search now searches for aIP AND bIP.
What I need in this case is an OR though as some of these fields might be null.

cmerriman · ‎01-09-2018

what about adding this after format

|rex mode=sed field=search "s/AND bIP/OR bIP/g"

DerBastler · ‎01-09-2018

Actually the "format" itself already breaks the search. While the splunk documentations states that it's implicitly used (https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Search/Changetheformatofsubsearchresults)
It's actually not. Well, the functionality is. But the search isn't generated the same way. It generates a big step in between which makes a 5 minute search into a 20 minute search, which then leads then to breaking subsearches.
But I guess you can only use |rex if the format statement was used before. But as this breaks the search...
(can be verified by increasing search time span. While it works with tiny timespans, at some point it takes so long that splunk starts to ignore subsearches)

Subsearch for multiple sourcetypes and fieldnames

Mastering Data Pipelines: Unlocking Value with Splunk

The Latest Cisco Integrations With Splunk Platform!

AI Adoption Hub Launch | Curated Resources to Get Started with AI in Splunk