Splunk Search

Trying to replace "join" with "stats" but issue with multivalue field to "join" on.

fsiemonssplunk
Explorer


Hi, I have simplified my query as much as possible. Basically I am looking at two issues with this:

1: I cannot perform the joins because a subsearch can only contain 50.000 results, which is not enough to make my query join properly (potential matches are truncated). A "stats" should be able to do this instead, but I am not sure how to replace both the joins (also considering the next issue)...

2: In my example, I have the first join based on the fieldname "name". In "SourcetypeA" that "name" is a single, unique value. In "sourcetypeB" however, that "name" is part of an array called "names" which contains multiple names, one of which is the "name" value from "SourcetypeA". An mvexpand on "SourcetypeB" would resolve that issue, but that creates even more subsearch results (it breaks up event in multiple events) which makes issue number 1 above even worse. How can I get around both these issues?


index=indexA sourcetype=sourcetypeA
| join name [search sourcetype=sourcetypeB | fields name fieldB]
| join fieldB [search sourcetype=sourcetypeC | fields fieldC ]
| table name fieldB fieldC

 

Labels (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

@fsiemonssplunk 

I'm assuming the lack of fieldB in the second join fields statement is just a typo...

(index=indexA sourcetype=sourcetypeA) OR (sourcetype=sourcetypeB) OR (sourcetype=sourcetypeC)
| fillnull value="--" names
| mvexpand names
| eval name=coalesce(name,names)
| fields name fieldB fieldC
| eventstats values(fieldC) by fieldB
| stats values(*) as * by name

First step is to search all the data, then the fillnull/mvexpand/eval takes care of expanding all the sourcetypeB rows and assigning name from the expanded names, so you can then do the stats to 'join' the data.

As to the sourcetypeC, that field is joining on fieldB, so I believe the eventstats may do the trick above, but you would have to confirm.

Anyway, hope this gives you something to play with

 

 

0 Karma

fsiemonssplunk
Explorer

Hi @bowesmana , you are right, the lack of fieldB was a typo.

index=indexA sourcetype=sourcetypeA
| join name [search sourcetype=sourcetypeB | fields name fieldB]
| join fieldB [search sourcetype=sourcetypeC | fields fieldB ]
| table name fieldB fieldC

Something else I described wrong as well is that "fieldB" is the one that is an array/multiline in "sourcetypeC" and a single value in "sourcetypeB". (so not the "name field" I mentioned earlier.

I think that is why your suggested query doesn't merge all results together at the end, something I tried, but haven't been able to resolve today. Any other suggestions? I feel like it's "nearly there".

 

 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

@fsiemonssplunk 

Your pseudo examples are not very explicit in that your second join

| join fieldB [search sourcetype=sourcetypeC | fields fieldB ]

appears to be redundant. Maybe you could give an example of the data you have and the expected output. What is the desired output of joining on fieldB and only getting fieldB back - I am not sure I understand what you would want to see.

If sourcetype C has an MV fieldB, then as per my original example, you would be mvexpanding fieldB and then doing whatever massaging of the data you need.

 

 

 

0 Karma

fsiemonssplunk
Explorer

Thanks @bowesmana , I was a bit short probably. (I can't use real data however)

Let me put in in the 3 index table format. I am trying to achieve the following outcome:

 

Index1:

FieldAFieldBFieldC
ValueA1ValueB1ValueC1
ValueA2ValueB2ValueC2

 

Index2:

FieldDFieldBFieldE
ValueD1ValueB1ValueE1
ValueD2ValueB2ValueE2

 

Index3:

FieldFFieldGFieldE
ValueF1ValueG1ValueE1,ValueE5,ValueE8
ValueF2ValueG2ValueE4, ValueE6

 

My desired outcome:

A single event containing the following: FieldA, FieldC, FieldD, FieldF, FieldG

Challenges:

- Replacing the 2 joins I tried to use (too slow and running into the subsearch limitations).

- Matching single values in FieldE (index2) to the multivalue fieldE (index3)

 

Tags (1)

bowesmana
SplunkTrust
SplunkTrust

@fsiemonssplunk 

Hey great post - getting my head around that, here's stab 1, replacing the first join with stats and then using append+stats (essentially same as join)

| makeresults 
| eval _raw="FieldA	FieldB	FieldC
ValueA1	ValueB1	ValueC1
ValueA2	ValueB2	ValueC2"
| multikv forceheader=1 
| table FieldA FieldB FieldC
| append [
  | makeresults
  | eval _raw="FieldD	FieldB	FieldE
ValueD1	ValueB1	ValueE1
ValueD2	ValueB2	ValueE2"
  | multikv forceheader=1
  | table FieldD FieldB FieldE
]
| stats values(*) as * by FieldB
| append [ 
  | makeresults
  | eval _raw="FieldF	FieldG	FieldE
ValueF1	ValueG1	ValueE1,ValueE5,ValueE8
ValueF2	ValueG2	ValueE4,ValueE6"
  | multikv forceheader=1
  | table FieldF FieldG FieldE
  | eval FieldE=split(FieldE,",")
  | mvexpand FieldE
  | stats values(*) as * by FieldE
]
| stats values(*) as * by FieldE
| where !isnull(FieldA)
| table FieldA, FieldC, FieldD, FieldF, FieldG

The subsearch  | stats values(*) as * by FieldE may not be needed, but is an optimisation depending on your data.

From your example, the ValueA2, which gets to ValueE2, has no F or G fields according to your data, so the above search works as driven.

This is stab 2 with the 2nd join replaced - there's a bit of fiddling around to get it and I am sure there is scope for improvement, but this gets the right results based on your example.

See how it. works for you

| makeresults 
| eval _raw="FieldA	FieldB	FieldC
ValueA1	ValueB1	ValueC1
ValueA2	ValueB2	ValueC2"
| multikv forceheader=1 
| table FieldA FieldB FieldC
| append [
  | makeresults
  | eval _raw="FieldD	FieldB	FieldE
ValueD1	ValueB1	ValueE1
ValueD2	ValueB2	ValueE2"
  | multikv forceheader=1
  | table FieldD FieldB FieldE
]
| append [ 
  | makeresults
  | eval _raw="FieldF	FieldG	FieldE
ValueF1	ValueG1	ValueE1,ValueE5,ValueE8
ValueF2	ValueG2	ValueE4,ValueE6"
  | multikv forceheader=1
  | table FieldF FieldG FieldE
]
| fillnull value="__TMP__" FieldB
| eval COMPOSITE=FieldE."@".FieldF."@".FieldG
| stats values(*) as * by FieldB
| fillnull value="__TMP__" COMPOSITE
| mvexpand COMPOSITE
| rex field=COMPOSITE "(?<FieldE>[^@]*)@(?<FieldF>[^@]*)@(?<FieldG>.*)"
| fields - COMPOSITE
| fillnull value="__TMP__" FieldE
| eval FieldE=split(FieldE,",") 
| mvexpand FieldE
| stats values(*) as * by FieldE
| where !isnull(FieldA)
| table FieldA, FieldC, FieldD, FieldF, FieldG

 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

If you're not aware of these example types, the makeresults/append stuff is simply setting up your example data stream - from the 

| fillnull value="__TMP__" FieldB

is what is useful in example 2 - note I am using @ sign to make the composite field separator, so you will need to choose accordingly for your data.

 

0 Karma

fsiemonssplunk
Explorer

Hi @bowesmana , thanks so much for your efforts. I cannot use the subsearches in the "append" part however, because the results will be far beyond the 50.000, leading to truncation and missing data down the track.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

@fsiemonssplunk 

The appends in the second example  are just data setup and only relevant to showing the example. They would not be relevant in your real data search, as mentioned in my post. Only from the fillnull statement is what you need.

 

0 Karma
Get Updates on the Splunk Community!

Notification Email Migration Announcement

The Notification Team is migrating our email service provider from Postmark to AWS Simple Email Service (SES) ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...