Splunk Search

Large scale join between two sourcetypes

NDolan
Loves-to-Learn

Hello Everyone,

 

I have been working on a problem for the last few weeks and haven't had huge amounts of success and was hoping someone here could point me in the right direction.

I have two data sources. In datasource A I have multiple records per host/asset - hundreds to thousands. In datasource B I have one record per host/asset.

I need to take a field from the record in datasource B (tags in this case) and append it to every record in datasource A based on a unique key (asset_uuid in this scenario). With the goal being to do various calculations, searches and aggregations on the hundreds/thousands of events based on the tag field values.

I was first looking at transaction, but that was merging all ~500 records for each asset in datasource A which is not what I need. I then started looking at a join command which I had mostly working I think but from what I can tell the subsearch on the join has a limit where it will only affect 500000 events as far as my research tells me. In other talks I have heard some people mention appendcols which, if I am reading the documentation for it correctly, won't do this either as it is more of a 1-to-1 than a 1-to-many.

My next route is to see if stats or maybe a calculated field might be able to do this? I was hoping that those more experienced might be able point me where to start looking to get this to work while I am researching this. It seems like something that should be super easy but I, and those I have spoken to, haven't found a path yet.

Thanks everyone for any advice you may have to give.

 

Labels (2)
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @NDolan,

I usually try to use stats (when possible) because it's the fastest solution and hasn't the problem of 50,000 results in subsearch, so I'd try something like this:

(<search_datasource_A>) OR (<search_datasource_B>) 
| stats values(fieldA_1) AS fieldA_1 values(fieldA_2) AS fieldA_2 values(fieldA_N) AS fieldA_N values(tags) AS tags BY asset_uuid 

instead values you can use other options:

Ciao.

Giuseppe

0 Karma

NDolan
Loves-to-Learn

Hello Giuseppe,

Thanks for the response, I'll be trying this shortly now that I am not being pulled in the opposite direction with other surprise work.

To be sure I am reading this correctly where you have:

values(fieldA_N) AS fieldA_N

 

I have to have this code for every field that I will be wanting to run further queries on after we have everything merged properly?

Thanks,
Nathan

0 Karma

ITWhisperer
Ultra Champion
search A
| append [search B]
| eventstats values(tags) as tags by asset_uuid
| where (event from search A)
0 Karma

NDolan
Loves-to-Learn

Thank you, I will be trying this shortly.

0 Karma
.conf21 CFS Extended through 5/20!

Don't miss your chance
to share your Splunk
wisdom in-person or
virtually at .conf21!

Call for Speakers has
been extended through
Thursday, 5/20!