Within the same index and sourcetype, I have some rows containing type=master
and many more rows containing type=slave
. Both these row types have a common identifier field identifier=some_string
. Each master
row has corresponding many slave
rows.
What I want is get all the type=slave
rows but in each row, also add some of the fields that are only found in corresponding type=master
rows.
Example, for rows that look like this:
type=master identifier=id1 prop_mast1=string1 prop_mast2=string1
type=master identifier=id2 prop_mast1=string2 prop_mast2=string2
type=slave identifier=id1 prop_slave1=string1 prop_slave2=string1
type=slave identifier=id1 prop_slave1=string2 prop_slave2=string2
type=slave identifier=id2 prop_slave1=string3 prop_slave2=string3
I want to get a result like this:
type=slave identifier=id1 prop_slave1=string1 prop_slave2=string1 prop_mast1=string1 prop_mast2=string1
type=slave identifier=id1 prop_slave1=string2 prop_slave2=string2 prop_mast1=string1 prop_mast2=string1
type=slave identifier=id2 prop_slave1=string3 prop_slave2=string3 prop_mast1=string2 prop_mast2=string2
Clearly I need to join between type=master
and type=slave
rows on identifier
field. I tried selfjoin
but the result does not look like a joined data set to me at all. What do I do?
If identifier is unique (except when it won't be) you can do this much more simply. Simply "search" all the data you want, then group them either with transaction (which is easier but may have some performance implications) or stats (which is a bit more work but scales better). In either case the transaction way will work for testing/exploring the data just fine...
index=<whatever> sourcetype=<whatever> (type=master OR type=slave)
| transaction maxspan=1h identifier
You'll want to make the maxspan=1h
parameter be as long as two events that should be combined can be separated by, but otherwise keep it as short as you can. (So if the master and slave events will always happen within 5 minutes of each other, set maxspan to 5m, or maybe 7m or something.)
That will give you output all grouped up. Now, stats will be more efficient especially in larger datasets or with clustered indexers. But you sort of need to know what you want out the other end. The basic functionality would be
index=<whatever> sourcetype=<whatever> (type=master OR type=slave)
| stats count, <other fields here with aggregation> BY identifier
The tricky part is the "other fields here with aggregation, because it's a) important and b) not always obvious. Here's one possibility -
index=<whatever> sourcetype=<whatever> (type=master OR type=slave)
| stats count, list(prop_slave1) AS slave1Props, values(prop_slave2) AS slave2Props, max(prop_mast2) AS OTHER BY identifier
Feel free to ask more or to play with the other stats functions - there are many.
Happy Splunking!
Rich