Hello, I'm joining data from two different sources, basically I have a table with 3 fields: host, source1, source2. Currently my search looks like:
index=index1 | some crazy stuff | fields source1 host
| append [search index=index2 | some more crazy struff | fields source2 host]
| stats values(source1) as source1, values(source2) as source2 by host
| mvexpand source1
| mvexpand source2
The data after the append is looking as:
+---------+--------------+--------------+
| host | source1 | source2 |
+---------+--------------+--------------+
| device1 | somedata_1_1 | |
| device2 | somedata_1_2 | |
| device1 | somedata_1_3 | |
| device3 | somedata_1_4 | |
| device2 | somedata_1_5 | |
| device1 | | somedata_2_1 |
| device2 | | somedata_2_2 |
+---------+--------------+--------------+
The data joined should be like:
+---------+--------------+--------------+
| host | source1 | source2 |
+---------+--------------+--------------+
| device1 | somedata_1_1 | somedata_2_1 |
| device1 | somedata_1_3 | somedata_2_1 |
| device2 | somedata_1_2 | somedata_2_2 |
| device2 | somedata_1_5 | somedata_2_2 |
| device3 | somedata_1_4 | |
+---------+--------------+--------------+
The problem comes after the stats when I tried to mvexpand those fields. The amount of data is huge, and then the mvexpand is always truncated. The data coming from the first source is so huge, and I have more than 4k rows.
Any solution to join data an avoid the use of mvexpand?
Hi,
I know the post was in 2019, but for the next one who fall on this topic, I share some tips about that.
Use double stats to avoid mvexpand :
index=index1 | some crazy stuff | fields source1 host
| append [search index=index2 | some more crazy struff | fields source2 host]
| stats values(source1) as source1, values(source2) as source2 by host
```add this next line if you want source1 or source2 are null :
|fillnull value="N/A" source1 source2 ```
|stats c by host source1 source2
Hope this will be helpfull 🙂
Hi @ndaniel88,
It's very simple answer. You don't want multi-valued field move those to group by.
LIke - | stats count by host, source1, source2
Hope this helps!!
Does not work. Results is always 0
Hi @ndaniel88,
The two main ways to join data in Splunk are :
1- using the join
command. https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Join
2- using the stats
command as you showed in your example.
Picking one or the other depends on what you are trying to achieve and which one will run faster for you. In your case if you're trying to get a table with source1
source2
host
on every line then join
MIGHT give you faster results than a stats
followed by mvexpand
so give it a shot and see.
If you're trying to run specific stats on these fields instead of just building a table then please give us more details and we can see how to optimize it.
Cheers,
David
lol you updated the question while I was writing this. Try this, it'll run way faster for a start :
(index=index1 somecrazystuff) OR (index=index2 somemorecrazystuff)
| stats values(source1) as source1, values(source2) as source2 by host
| mvexpand source1
| mvexpand source2
🙂 Thanks.
Shouldn't I avoid the use of join?
But yeah, I just need to build the table, all the stats I need was done before of the join. I'll give a try, thanks.
There are many options for this type of operation but we need to see a bit more of the example data to know the best way to get to a solution. Can you post the original search and (sanitized) example data.
Thank you @aromanauskas
I have updated the OP.
Ok so from the data set you have given there is only ever one value for source2 per host. If there is not only one value for this field do you expect to see it duplicated multiple times in the output.
Either way try starting your search with:
index=index1 OR index=index2 | some crazy struff | more crazy stuff
If you are not working on fields that collide in any way that is.
And what exactly is the use case you're trying to solve for. Usually a table with host and 2 random fields can be hard to understand so there may be a better solution.