Splunk Search

Can you help me with my join query?

seomisp
Explorer

I'm having trouble with a join query. It doesn't work with the inner or left join, although I can see the event from the left join, but without the fields from the other source.

Let's say IDX_A contains an url and srcip. I want to join the srcip from IDX_A with the dest_ip from IDX_B. Here's a sample of the join query I'm trying to do:

 index=IDX_A url = "http://some.url"
 | rename srcip as dest_ip
 | join dest_ip [search index=IDX_B]

I can confirm that the main query and the subsearch, returns results when executed separately, however, the join returns 0 results.

Any idea why this doesn't work? Seems pretty straight forward to me, but I can't get it to work.

I saw some similar previous posts, but none of them were helpful for me.

Tags (2)

Vijeta
Influencer

You can try using map command -

index=IDX_A | rename srcip as dest_ip| map search="search index=IDX_B dest_ip=$dest_ip$"

0 Karma

seomisp
Explorer

The problem with map is that I lose the fields from the main search. The was the point of having a join.

0 Karma

Vijeta
Influencer

You can refer the fields you need from the main search in map search, for example to retain url

index=IDX_A | rename srcip as dest_ip| map search="search index=IDX_B dest_ip=$dest_ip$"| eval url=\"$url$"\ "

Also you can try using map on IDX_A and main index as IDX_B, so that all fields from IDX_A are retained without using eval.

0 Karma

seomisp
Explorer

I get some weird results from map. Most of the times, I get:

[map]: Search Processor: Subsearch produced 10000 results, truncating to maxout 10000.

Which will then dump a bunch of results from indexes that are not even specified in the query.

0 Karma

Vijeta
Influencer

This should be the search-
index=IDX_A | rename srcip as dest_ip| map maxsearches=50000 search="search index=IDX_B dest_ip=$dest_ip$| eval url=\"$url$"\ "

By default map returns 10000 results, you can increase the limit by adding maxsearches=

Also if dest_ip needs to be escaped try with dest_ip= \"$dest_ip$"\

0 Karma

seomisp
Explorer

I never used the map command much and seems very unstable. I increased the maxsearches but, still getting random results, sometimes it says it can't find the url attribute.

I still don't understand why the join command doesn't work, it looks like a much cleaner solution.

0 Karma

Vijeta
Influencer

Agreed join is much appropriate in this context and it should work. Can you try using all time with the join. Also did you check the timestamp(_time) value for both the indexes . Logically I dont see any reason for join to not work unless there is some parsing or timestamp issue

0 Karma

seomisp
Explorer

I tried with All Time. Same result, which is no match. I checked the _time fields of both indexes and the only difference is IDX_B has milliseconds, where IDX_A doesn't.

One thing that doesn't make sense to me is that, if I put the url filter on the subsearch, the join works. However, I want to make this query generic and join all events from IDX_A with the matching ones from IDX_B. I'm just using the url filter now to test.

0 Karma

seomisp
Explorer

I think the problem is because the subsearch exceeds the 50000 value:

[subsearch]: Search Processor: Subsearch produced 50000 results, truncating to maxout 50000.

That's when filtering it with the url it works, because it returns less events.

Is there a way to workaround this?

0 Karma

Vijeta
Influencer

Try this :-

index=IDX_A url = "http://some.url" [search index=IDX_B| rename dest_ip as scrip| return scrip]

0 Karma

seomisp
Explorer

It doesn't work. It also doesn't make sense to rename dest_ip to srcip as it's the srcip from IDX_B that I want to get.
The join query works if I add the an extra condition, like for example the URL's site to the subsearch:

index=IDX_A url = "http://some.url"
| rename srcip as dest_ip
| join dest_ip [search index=IDX_B site=some.url]

But it only works if the site is hardcoded. If I pass for example a rex field, it doesn't work:

    index=IDX_A url = "http://some.url"
    | rex field=url "^\w+:\/\/(?<rex_site>\S+?)\/(?<uri_path>.+)$"
    | rename srcip as dest_ip
    | join dest_ip [search index=IDX_B site=rex_site]

As the idea is to have this automated, the hardcoded solution is not optimal.

0 Karma

seomisp
Explorer

Dawson014, the max=0 didn't work.

Vijeta,

Here are two samples:

For IDX_A:

 08/07/2018 14:45:01 -0700, search_now=1533678300.000, info_search_time=1533678301.032, et="08/07/18 12:17:00", url="http://some.url", date_first="2018-07-07T18:27:55.000Z", srcip="34.196.13.28", _time=1533669420, count=1, ts_date_last="2018-07-07T18:44:34.000Z", ts_type=url

For IDX_B:

{"endtime":"2018-08-07T19:17:34.675657Z","timestamp":"2018-08-07T19:17:34.668258Z","bytes":1657,"bytes_in":442,"bytes_out":1215,"dest_ip":"34.196.13.28","dest_port":80,"http_comment":"HTTP/1.1 200 OK","http_content_length":897,"http_content_type":"text/html","http_method":"GET","src_ip":"10.230.36.142","status":200,"time_taken":9709,"transport":"tcp"}

I removed some fields due to privacy concerns, but the important ones are there. I'm trying to join the event.ts_srcip with the dest_ip. As you can see the value is the same "34.196.13.28". I'm doing the query using a time range for the full day of 7th August 2018. Not sure if it matter the IDX_A raw event is plain text and IDX_B is json. I can still query both using the named fields, so they're being well parsed.

0 Karma

Dawson014
Path Finder

Hi,

Have you tried using max -

| join dest_ip max=0 [<subsearch>]

Let me know if it helps.

Vijeta
Influencer

Are there common destination ip between the two index, can you share some sample data?

0 Karma
Get Updates on the Splunk Community!

More Ways To Control Your Costs With Archived Metrics | Register for Tech Talk

Tuesday, May 14, 2024  |  11AM PT / 2PM ET Register to Attend Join us for this Tech Talk and learn how to ...

.conf24 | Personalize your .conf experience with Learning Paths!

Personalize your .conf24 Experience Learning paths allow you to level up your skill sets and dive deeper ...

Threat Hunting Unlocked: How to Uplevel Your Threat Hunting With the PEAK Framework ...

WATCH NOWAs AI starts tackling low level alerts, it's more critical than ever to uplevel your threat hunting ...