Hi, I have simplified my query as much as possible. Basically I am looking at two issues with this:
1: I cannot perform the joins because a subsearch can only contain 50.000 results, which is not enough to make my query join properly (potential matches are truncated). A "stats" should be able to do this instead, but I am not sure how to replace both the joins (also considering the next issue)...
2: In my example, I have the first join based on the fieldname "name". In "SourcetypeA" that "name" is a single, unique value. In "sourcetypeB" however, that "name" is part of an array called "names" which contains multiple names, one of which is the "name" value from "SourcetypeA". An mvexpand on "SourcetypeB" would resolve that issue, but that creates even more subsearch results (it breaks up event in multiple events) which makes issue number 1 above even worse. How can I get around both these issues?
index=indexA sourcetype=sourcetypeA
| join name [search sourcetype=sourcetypeB | fields name fieldB]
| join fieldB [search sourcetype=sourcetypeC | fields fieldC ]
| table name fieldB fieldC
I'm assuming the lack of fieldB in the second join fields statement is just a typo...
(index=indexA sourcetype=sourcetypeA) OR (sourcetype=sourcetypeB) OR (sourcetype=sourcetypeC)
| fillnull value="--" names
| mvexpand names
| eval name=coalesce(name,names)
| fields name fieldB fieldC
| eventstats values(fieldC) by fieldB
| stats values(*) as * by name
First step is to search all the data, then the fillnull/mvexpand/eval takes care of expanding all the sourcetypeB rows and assigning name from the expanded names, so you can then do the stats to 'join' the data.
As to the sourcetypeC, that field is joining on fieldB, so I believe the eventstats may do the trick above, but you would have to confirm.
Anyway, hope this gives you something to play with
Hi @bowesmana , you are right, the lack of fieldB was a typo.
index=indexA sourcetype=sourcetypeA
| join name [search sourcetype=sourcetypeB | fields name fieldB]
| join fieldB [search sourcetype=sourcetypeC | fields fieldB ]
| table name fieldB fieldC
Something else I described wrong as well is that "fieldB" is the one that is an array/multiline in "sourcetypeC" and a single value in "sourcetypeB". (so not the "name field" I mentioned earlier.
I think that is why your suggested query doesn't merge all results together at the end, something I tried, but haven't been able to resolve today. Any other suggestions? I feel like it's "nearly there".
Your pseudo examples are not very explicit in that your second join
| join fieldB [search sourcetype=sourcetypeC | fields fieldB ]
appears to be redundant. Maybe you could give an example of the data you have and the expected output. What is the desired output of joining on fieldB and only getting fieldB back - I am not sure I understand what you would want to see.
If sourcetype C has an MV fieldB, then as per my original example, you would be mvexpanding fieldB and then doing whatever massaging of the data you need.
Thanks @bowesmana , I was a bit short probably. (I can't use real data however)
Let me put in in the 3 index table format. I am trying to achieve the following outcome:
Index1:
FieldA | FieldB | FieldC |
ValueA1 | ValueB1 | ValueC1 |
ValueA2 | ValueB2 | ValueC2 |
Index2:
FieldD | FieldB | FieldE |
ValueD1 | ValueB1 | ValueE1 |
ValueD2 | ValueB2 | ValueE2 |
Index3:
FieldF | FieldG | FieldE |
ValueF1 | ValueG1 | ValueE1,ValueE5,ValueE8 |
ValueF2 | ValueG2 | ValueE4, ValueE6 |
My desired outcome:
A single event containing the following: FieldA, FieldC, FieldD, FieldF, FieldG
Challenges:
- Replacing the 2 joins I tried to use (too slow and running into the subsearch limitations).
- Matching single values in FieldE (index2) to the multivalue fieldE (index3)
Hey great post - getting my head around that, here's stab 1, replacing the first join with stats and then using append+stats (essentially same as join)
| makeresults
| eval _raw="FieldA FieldB FieldC
ValueA1 ValueB1 ValueC1
ValueA2 ValueB2 ValueC2"
| multikv forceheader=1
| table FieldA FieldB FieldC
| append [
| makeresults
| eval _raw="FieldD FieldB FieldE
ValueD1 ValueB1 ValueE1
ValueD2 ValueB2 ValueE2"
| multikv forceheader=1
| table FieldD FieldB FieldE
]
| stats values(*) as * by FieldB
| append [
| makeresults
| eval _raw="FieldF FieldG FieldE
ValueF1 ValueG1 ValueE1,ValueE5,ValueE8
ValueF2 ValueG2 ValueE4,ValueE6"
| multikv forceheader=1
| table FieldF FieldG FieldE
| eval FieldE=split(FieldE,",")
| mvexpand FieldE
| stats values(*) as * by FieldE
]
| stats values(*) as * by FieldE
| where !isnull(FieldA)
| table FieldA, FieldC, FieldD, FieldF, FieldG
The subsearch | stats values(*) as * by FieldE may not be needed, but is an optimisation depending on your data.
From your example, the ValueA2, which gets to ValueE2, has no F or G fields according to your data, so the above search works as driven.
This is stab 2 with the 2nd join replaced - there's a bit of fiddling around to get it and I am sure there is scope for improvement, but this gets the right results based on your example.
See how it. works for you
| makeresults
| eval _raw="FieldA FieldB FieldC
ValueA1 ValueB1 ValueC1
ValueA2 ValueB2 ValueC2"
| multikv forceheader=1
| table FieldA FieldB FieldC
| append [
| makeresults
| eval _raw="FieldD FieldB FieldE
ValueD1 ValueB1 ValueE1
ValueD2 ValueB2 ValueE2"
| multikv forceheader=1
| table FieldD FieldB FieldE
]
| append [
| makeresults
| eval _raw="FieldF FieldG FieldE
ValueF1 ValueG1 ValueE1,ValueE5,ValueE8
ValueF2 ValueG2 ValueE4,ValueE6"
| multikv forceheader=1
| table FieldF FieldG FieldE
]
| fillnull value="__TMP__" FieldB
| eval COMPOSITE=FieldE."@".FieldF."@".FieldG
| stats values(*) as * by FieldB
| fillnull value="__TMP__" COMPOSITE
| mvexpand COMPOSITE
| rex field=COMPOSITE "(?<FieldE>[^@]*)@(?<FieldF>[^@]*)@(?<FieldG>.*)"
| fields - COMPOSITE
| fillnull value="__TMP__" FieldE
| eval FieldE=split(FieldE,",")
| mvexpand FieldE
| stats values(*) as * by FieldE
| where !isnull(FieldA)
| table FieldA, FieldC, FieldD, FieldF, FieldG
If you're not aware of these example types, the makeresults/append stuff is simply setting up your example data stream - from the
| fillnull value="__TMP__" FieldB
is what is useful in example 2 - note I am using @ sign to make the composite field separator, so you will need to choose accordingly for your data.
Hi @bowesmana , thanks so much for your efforts. I cannot use the subsearches in the "append" part however, because the results will be far beyond the 50.000, leading to truncation and missing data down the track.
The appends in the second example are just data setup and only relevant to showing the example. They would not be relevant in your real data search, as mentioned in my post. Only from the fillnull statement is what you need.