I have two sourcetypes with data as follows:
tx_id=1, event=error, extra=foo tx_id=1, event=error, extra=bar tx_id=2, event=info tx_id=2, event=error, extra=baz tx_id=3, event=info
tx_id=1, url=/A tx_id=2, url=/B tx_id=3, url=/C
What I would like to get is a table with all
error events from the first log, the particular
extra info and the corresponding
url from the second log. That is, (ideally) the result would be
extra=foo, url=/A extra=bar, url=/A extra=baz, url=/B
I know SQL fairly well and this would be a plain, simple "join". But I just cannot get this to work in Splunk with a
| transaction (which will not give me the two rows in the
tx_id=1 case), a
| join or
As this must be really straightforward, I am probably missing the obvious and would appreciate any help or pointers.
Thanks a lot!
Can you try this search?
sourcetype=first_sourcetype event=error | dedup count by tx_id,extra append [ search sourcetype=second_sourcetype | dedup tx_id url ] | stats values(extra) as extra values(url) as url by tx_id
Here, I have used
dedup to handle the duplicate event. You can remove it if no duplicate events will come.
| dedup count by tx_id, extra? That seems to yield an empty result.
dedupremove? I would like to keep the duplicate "error" event, maybe I got this wrong...?
| makeresults | eval raw="tx_id=1,event=error,extra=foo,sourcetype=A::tx_id=1,event=error,extra=bar,sourcetype=A::tx_id=2,event=info,sourcetype=A::tx_id=2,event=error,extra=baz,sourcetype=A::tx_id=3,event=info,sourcetype=A::tx_id=1,url=/A,sourcetype=B::tx_id=2,url=/B,sourcetype=B::tx_id=3,url=/C,sourcetype=B" | makemv delim="::" raw | mvexpand raw | rename raw AS _raw | kv | fields - _time _raw | rename COMMENT AS "Everything above generates sample data; everything below is your solution" | search (sourcetype="A" AND event="error") OR sourcetype="B" | stats values(*) AS * BY tx_id | stats values(url) AS url BY extra
That almost did it 😉 but finally helped me to solve it – thanks!
First, I found that the "base" result can be obtained much more efficiently by doing
* [ search sourcetype="A" event="error" | fields tx_id ]
I suspect that's because I have only few "A" type errors, but lots of "B" type events and so Splunk can avoid retrieving the unneeded events in the first place?
| stats values(*) AS * BY tx_id seems like a feat to me. Am I right that this is somewhat like using
|transaction, in that it "groups" together all events with the same
tx_id, but maintains the different field values? Does that create multi-valued fields per
Last, in fact I had not only one
extra field but two (
extra2). I omitted those in the initial question for clarity.
| stats values(url) AS url BY extra did not work out, but
| stats values(extra), values(extra2) by url did.
* [ search sourcetype="A" event="error" | fields tx_id ] | stats values(*) AS * BY tx_id | stats values(extra), values(extra2) by url