Splunk Search

How to join search results from two indexes based on multiple conditions

ravi_lookout
Engager

I have 2 indexes - index_1 and index_2

index_1 has the following fields

index1Id

currEventId

prevEventId

index_2 has the following fields

index2Id

eventId

eventOrigin

currEventId and prevEventId  in index_1 will have the same values as that of eventId of index_2

Now, I am trying to create the table of the following format

index1IdprevEventIdprevEventOrigincurrEventIdcurrEventOrigin

 

I tried the joins with the below query, but I see that the columns 3 and 5 are mostly blank. So, I am not sure what is wrong with the query.

 

 

 

index="index_1"
| join type=left currEventId 
    [ search index="index_2" 
     | rename eventId as currEventId, eventOrigin as currEventOrigin 
     | fields currEventId, currEventOrigin]
| join type=left prevEventId 
    [ search index="index_2" 
     | rename eventId as prevEventId, eventOrigin as prevEventOrigin 
     | fields prevEventId, prevEventOrigin]
| table index1Id, prevEventOrigin, currEventOrigin, prevEventId, currEventId

 

 

 

 

And based on the online suggestions, I am trying the following approach, but couldn't complete it (works fine by populating all the columns)

 

 

 

(index="index_1") OR (index="index_2") | eval joiner=if(index="index_1", prevEventId, eventId) | stats values(*) as * by joiner 
|  where prevEventId=eventId | rename eventOrigin AS previousEventOrigin, eventId as previousEventId 
|  table index1Id, previousEventId, previousEventOrigin

 

 

Please let me know an efficient way to achieve the solution. Thanks

  

Labels (3)
0 Karma

tread_splunk
Splunk Employee
Splunk Employee

Could you provide some sample (dummy) events from both index?

0 Karma

ravi_lookout
Engager

Sample form index_1

{
 "index1Id": "Id_1",
 "currEventId": "EventId_1",
 "prevEventId": "EventId_2"
}

 

EventId_1 from index_2

{
 "eventId": "EventId_1",
 "eventOrigin": "EventOrigin_1",
}

 

EventId_2 from index_2

{
 "eventId": "EventId_2",
 "eventOrigin": "EventOrigin_2",
}

 

The final result I am looking for, after the search 

index1IdprevEventIdprevEventOrigincurrEventIdcurrEventOrigin
Id_1EventId_2EventOrigin_2EventId_1EventOrigin_1

 

Thanks @tread_splunk 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Ahhh... So you don't want to just join two indexes. You want to join an index onto itself using external "bracketing" event.

Ugh.

With small datasets you can try to stats once then append the index again and stats another time. Well, you could even try to use the cursed join command 😉

But the real question is how to do this operation effectively. I have a rough idea but have to test it first.

PickleRick
SplunkTrust
SplunkTrust

OK. Got it.

A run-anywhere search including mockup data

| makeresults format=csv data="index,index1Id,curEventId,prevEventId,eventId,eventOrigin
index1,23,11,13,,
index1,34,12,14,,
index1,35,12,16,,
index1,65,17,11,,
index1,88,15,12,,
index2,,,,11,1
index2,,,,12,2
index2,,,,13,3
index2,,,,14,4
index2,,,,15,5
index2,,,,16,6
index2,,,,17,7"
```This is just a mockup data preparation; now the fun begins```
```We make two EventId fields from our original one (we can't use rename because we don't want
to overwrite the values in the "joining" events with null values```
| eval curEventId=if(index="index1",curEventId,eventId)
| eval prevEventId=if(index="index1",prevEventId,eventId)
```And now we "copy over" the values from "single side" results into the compound "both sides" result```
```Be cautious about streamstats limitations```
| sort - index
| fields - index
| streamstats values(eventOrigin) AS curEventOrigin by curEventId
| streamstats values(eventOrigin) AS prevEventOrigin by prevEventId
```We only need the combined results, not the partial ones```
| where isnotnull(index1Id)
```clear empty fields```
| fields - eventId eventOrigin

 

0 Karma

ravi_lookout
Engager

@PickleRick , sorry I am not sure I fully understand. May I know where are we using the index_2 at all in the query?
Also, if I have to form the dummy data, would I not rather have two CSVs - one for the index_1 data and the other for index_2 data?

Btw, I tried to run the query, I am not getting the data in the tabular format. Adding this - table index1Id, curEventOrigin, curEventId, prevEventOrigin, prevEventId to the end of your query didn't help.

Thanks

Ravi

0 Karma

PickleRick
SplunkTrust
SplunkTrust

The mockup data contains events from both index1 and index2 (the first column of the dummy data).

It is assumed to be an equivalent of searching over (index=index1 OR index=index2).

Did you copy-paste my example search raw or did you modify it? And which Splunk version are you using?

0 Karma

tread_splunk
Splunk Employee
Splunk Employee

Nice work @PickleRick !  Imaginative approach!!  I tried out your solution and it appears to work if you replace streamstats with eventstats.  Feels like that should work to me and eventstats feels more efficient than streamstats.  Any thoughts?

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Eventstats should work as well (streamstats relies obviously on the order of results that's why I'm sorting on index so that the "payload" events are before the "joining" events; if your indexes are named differently, you need to adjust this sort). Both commands have their own limitations and it will probably depend on particular use case which approach is more effective.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

1. I'm not sure what you mean by "currEventId and prevEventId  in index_1 will have the same values as that of eventId of index_2". Reading it literally it would mean that currEventId and prevEventId have the same value. So you can use just one of those fields, right?

2. Your stats-based idea looks pretty sound but:

- You use values(*) as * when you only use some of them. If you have many fields it's good to list them explicitly so you don't waste memory on storing fields you'll discard without using them

- I'm not sure what the part after the stats command is supposed to do. OK the "where" command may only leave the results "matching  both sides of the join". But the rename/table? Rename just for the sake of it? Are you sure you have a field called index1Id?

0 Karma

ravi_lookout
Engager

@PickleRick , thanks for responding.


1. I just posted a sample for each of the indexes as a reply to Tred_splunk's question. Can you please check and see if that makes it clear? - https://community.splunk.com/t5/Splunk-Search/How-to-join-search-results-from-two-indexes-based-on-m...

2. stats based search is good and I will consider your suggestion of adding only the necessary fields. However, this query is incomplete (in the sense that I am able to correlate only 1 event from index_2 to index_1 but not the other event)

3. The initial thought of renaming was to provide the distinction between two events from the same index (index_2) by identifying them as "current" and "previous"

I hope I was able to clarify.

Thanks

0 Karma
Get Updates on the Splunk Community!

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...

SignalFlow: What? Why? How?

What is SignalFlow? Splunk Observability Cloud’s analytics engine, SignalFlow, opens up a world of in-depth ...

Federated Search for Amazon S3 | Key Use Cases to Streamline Compliance Workflows

Modern business operations are supported by data compliance. As regulations evolve, organizations must ...