How to use stats to join data and avoid the use of...

ndaniel88 · ‎06-03-2019

Hello, I'm joining data from two different sources, basically I have a table with 3 fields: host, source1, source2. Currently my search looks like:

index=index1 | some crazy stuff | fields source1 host
| append [search index=index2 | some more crazy struff | fields source2 host]
| stats values(source1) as source1, values(source2) as source2 by host
| mvexpand source1
| mvexpand source2

The data after the append is looking as:

+---------+--------------+--------------+
|  host   |   source1    |   source2    |
+---------+--------------+--------------+
| device1 | somedata_1_1 |              |
| device2 | somedata_1_2 |              |
| device1 | somedata_1_3 |              |
| device3 | somedata_1_4 |              |
| device2 | somedata_1_5 |              |
| device1 |              | somedata_2_1 |
| device2 |              | somedata_2_2 |
+---------+--------------+--------------+

The data joined should be like:

+---------+--------------+--------------+
|  host   |   source1    |   source2    |
+---------+--------------+--------------+
| device1 | somedata_1_1 | somedata_2_1 |
| device1 | somedata_1_3 | somedata_2_1 |
| device2 | somedata_1_2 | somedata_2_2 |
| device2 | somedata_1_5 | somedata_2_2 |
| device3 | somedata_1_4 |              |
+---------+--------------+--------------+

The problem comes after the stats when I tried to mvexpand those fields. The amount of data is huge, and then the mvexpand is always truncated. The data coming from the first source is so huge, and I have more than 4k rows.

Any solution to join data an avoid the use of mvexpand?

VatsalJagani · ‎06-03-2019

Hi @ndaniel88,

It's very simple answer. You don't want multi-valued field move those to group by.
LIke - | stats count by host, source1, source2

Hope this helps!!

ndaniel88 · ‎06-04-2019

Does not work. Results is always 0

DavidHourani · ‎06-03-2019

Hi @ndaniel88,

The two main ways to join data in Splunk are :
1- using the join command. https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Join
2- using the stats command as you showed in your example.

Picking one or the other depends on what you are trying to achieve and which one will run faster for you. In your case if you're trying to get a table with source1 source2 host on every line then join MIGHT give you faster results than a stats followed by mvexpand so give it a shot and see.

If you're trying to run specific stats on these fields instead of just building a table then please give us more details and we can see how to optimize it.

Cheers,
David

DavidHourani · ‎06-03-2019

lol you updated the question while I was writing this. Try this, it'll run way faster for a start :

 (index=index1 somecrazystuff) OR (index=index2 somemorecrazystuff) 
|  stats values(source1) as source1, values(source2) as source2 by host
 | mvexpand source1
 | mvexpand source2

ndaniel88 · ‎06-04-2019

🙂 Thanks.

Shouldn't I avoid the use of join?

But yeah, I just need to build the table, all the stats I need was done before of the join. I'll give a try, thanks.

aromanauskas · ‎06-03-2019

There are many options for this type of operation but we need to see a bit more of the example data to know the best way to get to a solution. Can you post the original search and (sanitized) example data.

ndaniel88 · ‎06-03-2019

Thank you @aromanauskas

I have updated the OP.

aromanauskas · ‎06-03-2019

Ok so from the data set you have given there is only ever one value for source2 per host. If there is not only one value for this field do you expect to see it duplicated multiple times in the output.

Either way try starting your search with:

index=index1 OR index=index2 | some crazy struff | more crazy stuff

If you are not working on fields that collide in any way that is.

And what exactly is the use case you're trying to solve for. Usually a table with host and 2 random fields can be hard to understand so there may be a better solution.

How to use stats to join data and avoid the use of mvexpand?

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

They're back! Join the SplunkTrust and MVP at .conf24

Enterprise Security Content Update (ESCU) | New Releases