Splunk Search

Can you help me with my join query?

seomisp
Explorer

I'm having trouble with a join query. It doesn't work with the inner or left join, although I can see the event from the left join, but without the fields from the other source.

Let's say IDX_A contains an url and srcip. I want to join the srcip from IDX_A with the dest_ip from IDX_B. Here's a sample of the join query I'm trying to do:

 index=IDX_A url = "http://some.url"
 | rename srcip as dest_ip
 | join dest_ip [search index=IDX_B]

I can confirm that the main query and the subsearch, returns results when executed separately, however, the join returns 0 results.

Any idea why this doesn't work? Seems pretty straight forward to me, but I can't get it to work.

I saw some similar previous posts, but none of them were helpful for me.

Tags (2)

Vijeta
Influencer

You can try using map command -

index=IDX_A | rename srcip as dest_ip| map search="search index=IDX_B dest_ip=$dest_ip$"

0 Karma

seomisp
Explorer

The problem with map is that I lose the fields from the main search. The was the point of having a join.

0 Karma

Vijeta
Influencer

You can refer the fields you need from the main search in map search, for example to retain url

index=IDX_A | rename srcip as dest_ip| map search="search index=IDX_B dest_ip=$dest_ip$"| eval url=\"$url$"\ "

Also you can try using map on IDX_A and main index as IDX_B, so that all fields from IDX_A are retained without using eval.

0 Karma

seomisp
Explorer

I get some weird results from map. Most of the times, I get:

[map]: Search Processor: Subsearch produced 10000 results, truncating to maxout 10000.

Which will then dump a bunch of results from indexes that are not even specified in the query.

0 Karma

Vijeta
Influencer

This should be the search-
index=IDX_A | rename srcip as dest_ip| map maxsearches=50000 search="search index=IDX_B dest_ip=$dest_ip$| eval url=\"$url$"\ "

By default map returns 10000 results, you can increase the limit by adding maxsearches=

Also if dest_ip needs to be escaped try with dest_ip= \"$dest_ip$"\

0 Karma

seomisp
Explorer

I never used the map command much and seems very unstable. I increased the maxsearches but, still getting random results, sometimes it says it can't find the url attribute.

I still don't understand why the join command doesn't work, it looks like a much cleaner solution.

0 Karma

Vijeta
Influencer

Agreed join is much appropriate in this context and it should work. Can you try using all time with the join. Also did you check the timestamp(_time) value for both the indexes . Logically I dont see any reason for join to not work unless there is some parsing or timestamp issue

0 Karma

seomisp
Explorer

I tried with All Time. Same result, which is no match. I checked the _time fields of both indexes and the only difference is IDX_B has milliseconds, where IDX_A doesn't.

One thing that doesn't make sense to me is that, if I put the url filter on the subsearch, the join works. However, I want to make this query generic and join all events from IDX_A with the matching ones from IDX_B. I'm just using the url filter now to test.

0 Karma

seomisp
Explorer

I think the problem is because the subsearch exceeds the 50000 value:

[subsearch]: Search Processor: Subsearch produced 50000 results, truncating to maxout 50000.

That's when filtering it with the url it works, because it returns less events.

Is there a way to workaround this?

0 Karma

Vijeta
Influencer

Try this :-

index=IDX_A url = "http://some.url" [search index=IDX_B| rename dest_ip as scrip| return scrip]

0 Karma

seomisp
Explorer

It doesn't work. It also doesn't make sense to rename dest_ip to srcip as it's the srcip from IDX_B that I want to get.
The join query works if I add the an extra condition, like for example the URL's site to the subsearch:

index=IDX_A url = "http://some.url"
| rename srcip as dest_ip
| join dest_ip [search index=IDX_B site=some.url]

But it only works if the site is hardcoded. If I pass for example a rex field, it doesn't work:

    index=IDX_A url = "http://some.url"
    | rex field=url "^\w+:\/\/(?<rex_site>\S+?)\/(?<uri_path>.+)$"
    | rename srcip as dest_ip
    | join dest_ip [search index=IDX_B site=rex_site]

As the idea is to have this automated, the hardcoded solution is not optimal.

0 Karma

seomisp
Explorer

Dawson014, the max=0 didn't work.

Vijeta,

Here are two samples:

For IDX_A:

 08/07/2018 14:45:01 -0700, search_now=1533678300.000, info_search_time=1533678301.032, et="08/07/18 12:17:00", url="http://some.url", date_first="2018-07-07T18:27:55.000Z", srcip="34.196.13.28", _time=1533669420, count=1, ts_date_last="2018-07-07T18:44:34.000Z", ts_type=url

For IDX_B:

{"endtime":"2018-08-07T19:17:34.675657Z","timestamp":"2018-08-07T19:17:34.668258Z","bytes":1657,"bytes_in":442,"bytes_out":1215,"dest_ip":"34.196.13.28","dest_port":80,"http_comment":"HTTP/1.1 200 OK","http_content_length":897,"http_content_type":"text/html","http_method":"GET","src_ip":"10.230.36.142","status":200,"time_taken":9709,"transport":"tcp"}

I removed some fields due to privacy concerns, but the important ones are there. I'm trying to join the event.ts_srcip with the dest_ip. As you can see the value is the same "34.196.13.28". I'm doing the query using a time range for the full day of 7th August 2018. Not sure if it matter the IDX_A raw event is plain text and IDX_B is json. I can still query both using the named fields, so they're being well parsed.

0 Karma

Dawson014
Path Finder

Hi,

Have you tried using max -

| join dest_ip max=0 [<subsearch>]

Let me know if it helps.

Vijeta
Influencer

Are there common destination ip between the two index, can you share some sample data?

0 Karma
Get Updates on the Splunk Community!

Earn a $35 Gift Card for Answering our Splunk Admins & App Developer Survey

Survey for Splunk Admins and App Developers is open now! | Earn a $35 gift card!      Hello there,  Splunk ...

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...