Splunk Search

Help with self join

tly22
Explorer

Hi, 

I have a single search that produces the following table where fieldA and fieldB are arbitrary strings that may be duplicate.

This is an exact representation of each event where each event may have a key "fieldA" or a key "fieldB" but not both but they always have an ID and Timestamp

TimestampIDfieldAfieldB
111151 "z"
112451"a" 
113781"b" 
117681 "d"
118791 "d"
125502"c" 
135802 "e"
157032 "f"
186903 "g"


and I need help to transform the data as follows.

IDfieldAfieldB
1"b""d"
1"b""d"
2"c""e"
2"c""f"
3 "g"


Thanks to suggestion below, I have tried `stats latest(fieldA) list(fieldB)` but I would prefer to not have any multivalued fields

For every distinct value for "fieldA", the latest record with that value would be kept and any records with that ID occuring before that record would be discard.

There is no requirement to have 2 searches. Hope that makes it more clear and easier.

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Taking on your changes, please explain your logic for excluding "z" in your expected results. Also, your example does not have any duplicates so it is unclear, from the expected results, how you want duplicates treated. Having an accurate representation of your data might help clarify this.

Assuming "z" was supposed to be in the results, then my previous solution still works - the mvexpand expands the multivalue field created by list()

| makeresults format=csv data="Timestamp,ID,fieldA,fieldB
11115,1,,z
11245,1,a,
11378,1,b,
11768,1,,d
12550,2,c,
13580,2,,e
15703,2,,f
18690,3,,g"
| stats latest(fieldA) as fieldA list(fieldB) as fieldB by ID
| mvexpand fieldB

 

ITWhisperer
SplunkTrust
SplunkTrust

Where does 2-a2 come from?

Assuming values in fieldB are unique, you could try something like this

| makeresults format=csv data="ID,fieldA,fieldB
1,1-a1,
1,1-a2,
1,,1-b1
2,2-a1,
2,,2-b1
2,,2-b2
3,,1-b1"
``` The lines above emulate the data you shared ```
| stats latest(fieldA) as fieldA list(fieldB) as fieldB by ID
| mvexpand fieldB

yuanliu
SplunkTrust
SplunkTrust

Just as you say, Splunk is not SQL.  So, forget join.  Please let us know what is the nature of the two searches, how close are they?  What are their search periods?  Most of the time, you shouldn't run two separate searches, but instead, combine the two into one search, then try to get the result you need from that one search.


Criteria being if there are duplicate values in fieldA, only the row with the latest value is kept
and each row with fieldB joined to fieldA on same ID. or if there are no values for fieldA, just join with null/blank value

Ideally, we can also throw away all rows with col fieldB that have a timestamp earlier than fieldA but not a hard requirement if that adds too much complexity to the query

Here, you talk about "latest" and "earlier".  But your mock data illustration contains no time information.  How are volunteers supposed to help?

Now, if you MUST run the two searches separately, yes, there are ways to produce right join output in SPL without using join command which most Splunkers advise against.  But let's start at the ABCs of asking answerable questions in a data analytics forum. (That's right, this is not a SQL forum.)  Here are four golden rules that I call Four Commandments:

  • Illustrate data input (in raw text, anonymize as needed), whether they are raw events or output from a search that volunteers here do not have to look at.
  • Illustrate the desired output from illustrated data.
  • Explain the logic between illustrated data and desired output without SPL.
  • If you also illustrate attempted SPL, illustrate actual output and compare with desired output, explain why they look different to you if that is not painfully obvious.

Start from here.

 

tly22
Explorer

Thank you for the feedback, I have updated the post 

0 Karma

tly22
Explorer

.

0 Karma
Get Updates on the Splunk Community!

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...

Cloud Platform & Enterprise: Classic Dashboard Export Feature Deprecation

As of Splunk Cloud Platform 9.3.2408 and Splunk Enterprise 9.4, classic dashboard export features are now ...

Explore the Latest Educational Offerings from Splunk (November Releases)

At Splunk Education, we are committed to providing a robust learning experience for all users, regardless of ...