Splunk Search

What is the right way to join fields from different rows into one?


Within the same index and sourcetype, I have some rows containing type=master and many more rows containing type=slave. Both row types share a common identifier field, identifier=some_string. Each master row has many corresponding slave rows.

What I want is to get all the type=slave rows, but with each row also including some of the fields that are found only in the corresponding type=master row.

Example, for rows that look like this:

type=master identifier=id1 prop_mast1=string1 prop_mast2=string1
type=master identifier=id2 prop_mast1=string2 prop_mast2=string2
type=slave identifier=id1 prop_slave1=string1 prop_slave2=string1
type=slave identifier=id1 prop_slave1=string2 prop_slave2=string2
type=slave identifier=id2 prop_slave1=string3 prop_slave2=string3

I want to get a result like this:

type=slave identifier=id1 prop_slave1=string1 prop_slave2=string1 prop_mast1=string1 prop_mast2=string1
type=slave identifier=id1 prop_slave1=string2 prop_slave2=string2 prop_mast1=string1 prop_mast2=string1
type=slave identifier=id2 prop_slave1=string3 prop_slave2=string3 prop_mast1=string2 prop_mast2=string2

Clearly I need to join the type=master and type=slave rows on the identifier field. I tried selfjoin, but the result does not look like a joined data set to me at all. What do I do?



If identifier is unique to each master/slave grouping, you can do this much more simply. Just search for all the data you want, then group it either with transaction (which is easier but may have performance implications) or with stats (which is a bit more work but scales better). Either way, the transaction approach will work just fine for testing and exploring the data:

index=<whatever> sourcetype=<whatever> (type=master OR type=slave)
| transaction maxspan=1h identifier

You'll want the maxspan=1h parameter to be at least as long as the maximum time by which two events that should be combined can be separated, but otherwise keep it as short as you can. (So if the master and slave events will always happen within 5 minutes of each other, set maxspan to 5m, or maybe 7m for a little headroom.)

That will give you output all grouped up. Now, stats will be more efficient, especially on larger datasets or with clustered indexers, but you do need to know what you want out the other end. The basic form would be

index=<whatever> sourcetype=<whatever> (type=master OR type=slave)
| stats count, <other fields here with aggregation> BY identifier

The tricky part is the "other fields here with aggregation" piece, because it's a) important and b) not always obvious. Here's one possibility -

index=<whatever> sourcetype=<whatever> (type=master OR type=slave)
| stats count, list(prop_slave1) AS slave1Props, values(prop_slave2) AS slave2Props, max(prop_mast2) AS OTHER BY identifier
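
If what you want is literally the output shown in the question - one row per slave event, with the master's fields attached - another option worth trying is eventstats, which computes an aggregation per identifier but writes the result back onto every event instead of collapsing them. A sketch, assuming (as in your sample data) that there is a single master event per identifier and that prop_mast1 and prop_mast2 are the fields you want to copy over:

index=<whatever> sourcetype=<whatever> (type=master OR type=slave)
| eventstats first(prop_mast1) AS prop_mast1, first(prop_mast2) AS prop_mast2 BY identifier
| search type=slave

Because only the master events carry prop_mast1/prop_mast2, the first() aggregation picks up the master's values and eventstats adds them to every slave event with the same identifier; the final search then keeps only the enriched slave rows.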

Feel free to ask more or to play with the other stats functions - there are many.

Happy Splunking!
