Splunk Search

Newbie: Splunk equivalent of NATURAL JOIN

Path Finder

I don't really know what to search for on here, but I can't seem to get the | (pipe operator) to work like UNIX.

I have two sets of records. Examples of each type:

A) April 18, session=12345 http_status=404
   April 19, session=78912 http_status=200
   April 20, session=45678 http_status=503

and...

B) April 18, session=12345 ipaddr=89.76.45.34
   April 19, session=78912 ipaddr=44.27.53.78
   April 20, session=45678 ipaddr=79.73.25.39

Here's what I'm trying to get my query to do: get all session ids where http_status >= 400, and then give me the result set of logs where session = x, and ipaddr exists.

Thank you! One example will open up a lot of doors for me with Splunk.

UPDATED: In SQL: SELECT * FROM "B" NATURAL JOIN "A"

Tags (3)
1 Solution

SplunkTrust
SplunkTrust

Let's say that the events in A have sourcetype="A", and the events in B have sourcetype="B". Maybe in your actual data it'll be source or host or something else but it doesnt matter.

1) If you're looking to stitch everything together on session ids, there are several ways to do this sort of thing. Here are two

(sourcetype=A) OR (sourcetype=B ipaddr=*) | transaction session | search http_status>=400

and sometimes stats is the best tool for the job, although here you need a little mvexpand and it's a bit more complicated:

(sourcetype=A ) OR (sourcetype=B ipaddr=*) | stats count first(ipaddr) as ipaddr values(http_status) as http_status by session | mvexpand http_status | search http_status>=400 | fields ipaddr

2) However depending on how sparse the http_status values over 400 are, you might want to take the approach of using a subsearch. The following will use a subsearch to get all the session id's that had errors, and then it searches on all of those session Id's against sourcetype B. Then finally we just pipe to top to get the top ipaddr values.

sourcetype=B ipaddr=* [sourcetype=A http_status>=400 | fields session] | top ipaddr

View solution in original post

SplunkTrust
SplunkTrust

Let's say that the events in A have sourcetype="A", and the events in B have sourcetype="B". Maybe in your actual data it'll be source or host or something else but it doesnt matter.

1) If you're looking to stitch everything together on session ids, there are several ways to do this sort of thing. Here are two

(sourcetype=A) OR (sourcetype=B ipaddr=*) | transaction session | search http_status>=400

and sometimes stats is the best tool for the job, although here you need a little mvexpand and it's a bit more complicated:

(sourcetype=A ) OR (sourcetype=B ipaddr=*) | stats count first(ipaddr) as ipaddr values(http_status) as http_status by session | mvexpand http_status | search http_status>=400 | fields ipaddr

2) However depending on how sparse the http_status values over 400 are, you might want to take the approach of using a subsearch. The following will use a subsearch to get all the session id's that had errors, and then it searches on all of those session Id's against sourcetype B. Then finally we just pipe to top to get the top ipaddr values.

sourcetype=B ipaddr=* [sourcetype=A http_status>=400 | fields session] | top ipaddr

View solution in original post

Path Finder

2 -- Stats worked the best 🙂

0 Karma

Path Finder

No problem. Thanks for that. I updated my question so it's clearer for anyone else.

0 Karma

SplunkTrust
SplunkTrust

I see. sorry I misinterpreted yr question. See updated answer.

0 Karma

Path Finder

because those are the two logs that have http_status >= 400 in log type A.

0 Karma

Path Finder

Let me explain better.

The above example should return the first and last logs from B:

April 18, session=12345 ipaddr=89.76.45.34

April 20, session=45678 ipaddr=79.73.25.39

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!