Splunk Search
Highlighted

"Join" information from two log sources

Explorer

I have two sourcetypes with data as follows:

First sourcetype:

tx_id=1, event=error, extra=foo
tx_id=1, event=error, extra=bar
tx_id=2, event=info
tx_id=2, event=error, extra=baz
tx_id=3, event=info

Second sourcetype:

tx_id=1, url=/A
tx_id=2, url=/B
tx_id=3, url=/C

What I would like to get is a table with all error events from the first log, the particular extra info and the corresponding url from the second log. That is, (ideally) the result would be

extra=foo, url=/A
extra=bar, url=/A
extra=baz, url=/B

I know SQL fairly well and this would be a plain, simple "join". But I just cannot get this to work in Splunk with a | transaction (which will not give me the two rows in the tx_id=1 case), a | join or | stats.

As this must be really straightforward, I am probably missing the obvious and would appreciate any help or pointers.

Thanks a lot!

0 Karma
Highlighted

Re: "Join" information from two log sources

SplunkTrust
SplunkTrust

Hi

Can you try this search?

sourcetype=first_sourcetype event=error | dedup count by tx_id,extra
append [
    search sourcetype=second_sourcetype | dedup tx_id url
] | stats values(extra) as extra values(url) as url by tx_id

Here, I have used dedup to handle the duplicate event. You can remove it if no duplicate events will come.

Thanks

0 Karma
Highlighted

Re: "Join" information from two log sources

Explorer
  • Are you sure about | dedup count by tx_id, extra? That seems to yield an empty result.
  • What exactly should the dedup remove? I would like to keep the duplicate "error" event, maybe I got this wrong...?
0 Karma
Highlighted

Re: "Join" information from two log sources

Esteemed Legend

Try this:

| makeresults 
| eval raw="tx_id=1,event=error,extra=foo,sourcetype=A::tx_id=1,event=error,extra=bar,sourcetype=A::tx_id=2,event=info,sourcetype=A::tx_id=2,event=error,extra=baz,sourcetype=A::tx_id=3,event=info,sourcetype=A::tx_id=1,url=/A,sourcetype=B::tx_id=2,url=/B,sourcetype=B::tx_id=3,url=/C,sourcetype=B"
| makemv delim="::" raw
| mvexpand raw
| rename raw AS _raw
| kv
| fields - _time _raw

| rename COMMENT AS "Everything above generates sample data; everything below is your solution"

| search (sourcetype="A" AND event="error") OR sourcetype="B"
| stats values(*) AS * BY tx_id
| stats values(url) AS url BY extra

View solution in original post

0 Karma
Highlighted

Re: "Join" information from two log sources

Explorer

That almost did it 😉 but finally helped me to solve it – thanks!

First, I found that the "base" result can be obtained much more efficiently by doing

* [ search sourcetype="A" event="error" | fields tx_id ]

I suspect that's because I have only few "A" type errors, but lots of "B" type events and so Splunk can avoid retrieving the unneeded events in the first place?

Then, | stats values(*) AS * BY tx_id seems like a feat to me. Am I right that this is somewhat like using |transaction, in that it "groups" together all events with the same tx_id, but maintains the different field values? Does that create multi-valued fields per tx_id?

Last, in fact I had not only one extra field but two ( extra and extra2). I omitted those in the initial question for clarity.

Thus, | stats values(url) AS url BY extra did not work out, but | stats values(extra), values(extra2) by url did.

Final search:

* [ search sourcetype="A" event="error" | fields tx_id ]
| stats values(*) AS * BY tx_id
| stats values(extra), values(extra2) by url

Thanks!

0 Karma