Splunk Search

How to use stats to join data and avoid the use of mvexpand?

ndaniel88
Explorer

Hello, I'm joining data from two different sources, basically I have a table with 3 fields: host, source1, source2. Currently my search looks like:

index=index1 | some crazy stuff | fields source1 host
| append [search index=index2 | some more crazy struff | fields source2 host]
| stats values(source1) as source1, values(source2) as source2 by host
| mvexpand source1
| mvexpand source2

The data after the append is looking as:

+---------+--------------+--------------+
|  host   |   source1    |   source2    |
+---------+--------------+--------------+
| device1 | somedata_1_1 |              |
| device2 | somedata_1_2 |              |
| device1 | somedata_1_3 |              |
| device3 | somedata_1_4 |              |
| device2 | somedata_1_5 |              |
| device1 |              | somedata_2_1 |
| device2 |              | somedata_2_2 |
+---------+--------------+--------------+

The data joined should be like:

+---------+--------------+--------------+
|  host   |   source1    |   source2    |
+---------+--------------+--------------+
| device1 | somedata_1_1 | somedata_2_1 |
| device1 | somedata_1_3 | somedata_2_1 |
| device2 | somedata_1_2 | somedata_2_2 |
| device2 | somedata_1_5 | somedata_2_2 |
| device3 | somedata_1_4 |              |
+---------+--------------+--------------+

The problem comes after the stats when I tried to mvexpand those fields. The amount of data is huge, and then the mvexpand is always truncated. The data coming from the first source is so huge, and I have more than 4k rows.

Any solution to join data an avoid the use of mvexpand?

0 Karma

Tartif
Loves-to-Learn

Hi,
I know the post was in 2019, but for the next one who fall on this topic, I share some tips about that.

Use double stats to avoid mvexpand :

index=index1 | some crazy stuff | fields source1 host
| append [search index=index2 | some more crazy struff | fields source2 host]
| stats values(source1) as source1, values(source2) as source2 by host
```add this next line if you want source1 or source2 are null :
|fillnull value="N/A" source1 source2 ```
|stats c by host source1 source2

 Hope this will be helpfull 🙂

Tags (1)
0 Karma

VatsalJagani
SplunkTrust
SplunkTrust

Hi @ndaniel88,

It's very simple answer. You don't want multi-valued field move those to group by.
LIke - | stats count by host, source1, source2

Hope this helps!!

0 Karma

ndaniel88
Explorer

Does not work. Results is always 0

0 Karma

DavidHourani
Super Champion

Hi @ndaniel88,

The two main ways to join data in Splunk are :
1- using the join command. https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Join
2- using the stats command as you showed in your example.

Picking one or the other depends on what you are trying to achieve and which one will run faster for you. In your case if you're trying to get a table with source1 source2 host on every line then join MIGHT give you faster results than a stats followed by mvexpand so give it a shot and see.

If you're trying to run specific stats on these fields instead of just building a table then please give us more details and we can see how to optimize it.

Cheers,
David

0 Karma

DavidHourani
Super Champion

lol you updated the question while I was writing this. Try this, it'll run way faster for a start :

 (index=index1 somecrazystuff) OR (index=index2 somemorecrazystuff) 
|  stats values(source1) as source1, values(source2) as source2 by host
 | mvexpand source1
 | mvexpand source2
0 Karma

ndaniel88
Explorer

🙂 Thanks.

Shouldn't I avoid the use of join?

But yeah, I just need to build the table, all the stats I need was done before of the join. I'll give a try, thanks.

0 Karma

aromanauskas
Path Finder

There are many options for this type of operation but we need to see a bit more of the example data to know the best way to get to a solution. Can you post the original search and (sanitized) example data.

0 Karma

ndaniel88
Explorer

Thank you @aromanauskas

I have updated the OP.

0 Karma

aromanauskas
Path Finder

Ok so from the data set you have given there is only ever one value for source2 per host. If there is not only one value for this field do you expect to see it duplicated multiple times in the output.

Either way try starting your search with:

index=index1 OR index=index2 | some crazy struff | more crazy stuff

If you are not working on fields that collide in any way that is.

And what exactly is the use case you're trying to solve for. Usually a table with host and 2 random fields can be hard to understand so there may be a better solution.

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In September, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...

New in Observability - Improvements to Custom Metrics SLOs, Log Observer Connect & ...

The latest enhancements to the Splunk observability portfolio deliver improved SLO management accuracy, better ...

Improve Data Pipelines Using Splunk Data Management

  Register Now   This Tech Talk will explore the pipeline management offerings Edge Processor and Ingest ...