Re: Can you help me with my join query?

seomisp · ‎09-25-2018

I'm having trouble with a join query. It doesn't work with the inner or left join, although I can see the event from the left join, but without the fields from the other source.

Let's say IDX_A contains an url and srcip. I want to join the srcip from IDX_A with the dest_ip from IDX_B. Here's a sample of the join query I'm trying to do:

 index=IDX_A url = "http://some.url"
 | rename srcip as dest_ip
 | join dest_ip [search index=IDX_B]

I can confirm that the main query and the subsearch, returns results when executed separately, however, the join returns 0 results.

Any idea why this doesn't work? Seems pretty straight forward to me, but I can't get it to work.

I saw some similar previous posts, but none of them were helpful for me.

Vijeta · ‎09-26-2018

You can try using map command -

index=IDX_A | rename srcip as dest_ip| map search="search index=IDX_B dest_ip=$dest_ip$"

seomisp · ‎09-26-2018

The problem with map is that I lose the fields from the main search. The was the point of having a join.

Vijeta · ‎09-26-2018

You can refer the fields you need from the main search in map search, for example to retain url

index=IDX_A | rename srcip as dest_ip| map search="search index=IDX_B dest_ip=$dest_ip$"| eval url=\"$url$"\ "

Also you can try using map on IDX_A and main index as IDX_B, so that all fields from IDX_A are retained without using eval.

seomisp · ‎09-26-2018

I get some weird results from map. Most of the times, I get:

[map]: Search Processor: Subsearch produced 10000 results, truncating to maxout 10000.

Which will then dump a bunch of results from indexes that are not even specified in the query.

Vijeta · ‎09-27-2018

This should be the search-
index=IDX_A | rename srcip as dest_ip| map maxsearches=50000 search="search index=IDX_B dest_ip=$dest_ip$| eval url=\"$url$"\ "

By default map returns 10000 results, you can increase the limit by adding maxsearches=

Also if dest_ip needs to be escaped try with dest_ip= \"$dest_ip$"\

seomisp · ‎09-27-2018

I never used the map command much and seems very unstable. I increased the maxsearches but, still getting random results, sometimes it says it can't find the url attribute.

I still don't understand why the join command doesn't work, it looks like a much cleaner solution.

Vijeta · ‎09-27-2018

Agreed join is much appropriate in this context and it should work. Can you try using all time with the join. Also did you check the timestamp(_time) value for both the indexes . Logically I dont see any reason for join to not work unless there is some parsing or timestamp issue

seomisp · ‎09-27-2018

I tried with All Time. Same result, which is no match. I checked the _time fields of both indexes and the only difference is IDX_B has milliseconds, where IDX_A doesn't.

One thing that doesn't make sense to me is that, if I put the url filter on the subsearch, the join works. However, I want to make this query generic and join all events from IDX_A with the matching ones from IDX_B. I'm just using the url filter now to test.

seomisp · ‎09-27-2018

I think the problem is because the subsearch exceeds the 50000 value:

[subsearch]: Search Processor: Subsearch produced 50000 results, truncating to maxout 50000.

That's when filtering it with the url it works, because it returns less events.

Is there a way to workaround this?

Vijeta · ‎09-26-2018

Try this :-

index=IDX_A url = "http://some.url" [search index=IDX_B| rename dest_ip as scrip| return scrip]

seomisp · ‎09-26-2018

It doesn't work. It also doesn't make sense to rename dest_ip to srcip as it's the srcip from IDX_B that I want to get.
The join query works if I add the an extra condition, like for example the URL's site to the subsearch:

index=IDX_A url = "http://some.url"
| rename srcip as dest_ip
| join dest_ip [search index=IDX_B site=some.url]

But it only works if the site is hardcoded. If I pass for example a rex field, it doesn't work:

    index=IDX_A url = "http://some.url"
    | rex field=url "^\w+:\/\/(?<rex_site>\S+?)\/(?<uri_path>.+)$"
    | rename srcip as dest_ip
    | join dest_ip [search index=IDX_B site=rex_site]

As the idea is to have this automated, the hardcoded solution is not optimal.

seomisp · ‎09-26-2018

Dawson014, the max=0 didn't work.

Vijeta,

Here are two samples:

For IDX_A:

 08/07/2018 14:45:01 -0700, search_now=1533678300.000, info_search_time=1533678301.032, et="08/07/18 12:17:00", url="http://some.url", date_first="2018-07-07T18:27:55.000Z", srcip="34.196.13.28", _time=1533669420, count=1, ts_date_last="2018-07-07T18:44:34.000Z", ts_type=url

For IDX_B:

{"endtime":"2018-08-07T19:17:34.675657Z","timestamp":"2018-08-07T19:17:34.668258Z","bytes":1657,"bytes_in":442,"bytes_out":1215,"dest_ip":"34.196.13.28","dest_port":80,"http_comment":"HTTP/1.1 200 OK","http_content_length":897,"http_content_type":"text/html","http_method":"GET","src_ip":"10.230.36.142","status":200,"time_taken":9709,"transport":"tcp"}

I removed some fields due to privacy concerns, but the important ones are there. I'm trying to join the event.ts_srcip with the dest_ip. As you can see the value is the same "34.196.13.28". I'm doing the query using a time range for the full day of 7th August 2018. Not sure if it matter the IDX_A raw event is plain text and IDX_B is json. I can still query both using the named fields, so they're being well parsed.

Dawson014 · ‎09-26-2018

Hi,

Have you tried using max -

| join dest_ip max=0 [<subsearch>]

Let me know if it helps.

Vijeta · ‎09-26-2018

Are there common destination ip between the two index, can you share some sample data?

Can you help me with my join query?

New Case Study Shows the Value of Partnering with Splunk Academic Alliance

How to Monitor Google Kubernetes Engine (GKE)

Index This | How can you make 45 using only 4?