Re: How to join search results from two indexes ba...

ravi_lookout · ‎09-26-2024

I have 2 indexes - index_1 and index_2

index_1 has the following fields

index1Id

currEventId

prevEventId

index_2 has the following fields

index2Id

eventId

eventOrigin

currEventId and prevEventId in index_1 will have the same values as that of eventId of index_2

Now, I am trying to create the table of the following format

index1Id

prevEventId

prevEventOrigin

currEventId

currEventOrigin

I tried the joins with the below query, but I see that the columns 3 and 5 are mostly blank. So, I am not sure what is wrong with the query.

index="index_1"
| join type=left currEventId 
    [ search index="index_2" 
     | rename eventId as currEventId, eventOrigin as currEventOrigin 
     | fields currEventId, currEventOrigin]
| join type=left prevEventId 
    [ search index="index_2" 
     | rename eventId as prevEventId, eventOrigin as prevEventOrigin 
     | fields prevEventId, prevEventOrigin]
| table index1Id, prevEventOrigin, currEventOrigin, prevEventId, currEventId

And based on the online suggestions, I am trying the following approach, but couldn't complete it (works fine by populating all the columns)

(index="index_1") OR (index="index_2") | eval joiner=if(index="index_1", prevEventId, eventId) | stats values(*) as * by joiner 
|  where prevEventId=eventId | rename eventOrigin AS previousEventOrigin, eventId as previousEventId 
|  table index1Id, previousEventId, previousEventOrigin

Please let me know an efficient way to achieve the solution. Thanks

tread_splunk · ‎09-26-2024

Could you provide some sample (dummy) events from both index?

ravi_lookout · ‎09-26-2024

Sample form index_1

{
 "index1Id": "Id_1",
 "currEventId": "EventId_1",
 "prevEventId": "EventId_2"
}

EventId_1 from index_2

{
 "eventId": "EventId_1",
 "eventOrigin": "EventOrigin_1",
}

EventId_2 from index_2

{
 "eventId": "EventId_2",
 "eventOrigin": "EventOrigin_2",
}

The final result I am looking for, after the search

index1Id	prevEventId	prevEventOrigin	currEventId	currEventOrigin
Id_1	EventId_2	EventOrigin_2	EventId_1	EventOrigin_1

Thanks @tread_splunk

PickleRick · ‎09-26-2024

Ahhh... So you don't want to just join two indexes. You want to join an index onto itself using external "bracketing" event.

Ugh.

With small datasets you can try to stats once then append the index again and stats another time. Well, you could even try to use the cursed join command 😉

But the real question is how to do this operation effectively. I have a rough idea but have to test it first.

PickleRick · ‎09-27-2024

OK. Got it.

A run-anywhere search including mockup data

| makeresults format=csv data="index,index1Id,curEventId,prevEventId,eventId,eventOrigin
index1,23,11,13,,
index1,34,12,14,,
index1,35,12,16,,
index1,65,17,11,,
index1,88,15,12,,
index2,,,,11,1
index2,,,,12,2
index2,,,,13,3
index2,,,,14,4
index2,,,,15,5
index2,,,,16,6
index2,,,,17,7"
```This is just a mockup data preparation; now the fun begins```
```We make two EventId fields from our original one (we can't use rename because we don't want
to overwrite the values in the "joining" events with null values```
| eval curEventId=if(index="index1",curEventId,eventId)
| eval prevEventId=if(index="index1",prevEventId,eventId)
```And now we "copy over" the values from "single side" results into the compound "both sides" result```
```Be cautious about streamstats limitations```
| sort - index
| fields - index
| streamstats values(eventOrigin) AS curEventOrigin by curEventId
| streamstats values(eventOrigin) AS prevEventOrigin by prevEventId
```We only need the combined results, not the partial ones```
| where isnotnull(index1Id)
```clear empty fields```
| fields - eventId eventOrigin

ravi_lookout · ‎09-30-2024

@PickleRick , sorry I am not sure I fully understand. May I know where are we using the index_2 at all in the query?
Also, if I have to form the dummy data, would I not rather have two CSVs - one for the index_1 data and the other for index_2 data?

Btw, I tried to run the query, I am not getting the data in the tabular format. Adding this - table index1Id, curEventOrigin, curEventId, prevEventOrigin, prevEventId to the end of your query didn't help.

Thanks

Ravi

PickleRick · ‎09-30-2024

The mockup data contains events from both index1 and index2 (the first column of the dummy data).

It is assumed to be an equivalent of searching over (index=index1 OR index=index2).

Did you copy-paste my example search raw or did you modify it? And which Splunk version are you using?

tread_splunk · ‎09-27-2024

Nice work @PickleRick ! Imaginative approach!! I tried out your solution and it appears to work if you replace streamstats with eventstats. Feels like that should work to me and eventstats feels more efficient than streamstats. Any thoughts?

PickleRick · ‎09-27-2024

Eventstats should work as well (streamstats relies obviously on the order of results that's why I'm sorting on index so that the "payload" events are before the "joining" events; if your indexes are named differently, you need to adjust this sort). Both commands have their own limitations and it will probably depend on particular use case which approach is more effective.

PickleRick · ‎09-26-2024

1. I'm not sure what you mean by "currEventId and prevEventId in index_1 will have the same values as that of eventId of index_2". Reading it literally it would mean that currEventId and prevEventId have the same value. So you can use just one of those fields, right?

2. Your stats-based idea looks pretty sound but:

- You use values(*) as * when you only use some of them. If you have many fields it's good to list them explicitly so you don't waste memory on storing fields you'll discard without using them

- I'm not sure what the part after the stats command is supposed to do. OK the "where" command may only leave the results "matching both sides of the join". But the rename/table? Rename just for the sake of it? Are you sure you have a field called index1Id?

ravi_lookout · ‎09-26-2024

@PickleRick , thanks for responding.

1. I just posted a sample for each of the indexes as a reply to Tred_splunk's question. Can you please check and see if that makes it clear? - https://community.splunk.com/t5/Splunk-Search/How-to-join-search-results-from-two-indexes-based-on-m...

2. stats based search is good and I will consider your suggestion of adding only the necessary fields. However, this query is incomplete (in the sense that I am able to correlate only 1 event from index_2 to index_1 but not the other event)

3. The initial thought of renaming was to provide the distinction between two events from the same index (index_2) by identifying them as "current" and "previous"

I hope I was able to clarify.

Thanks

How to join search results from two indexes based on multiple conditions

join

subsearch

table

Splunk Observability for AI

Splunk Enterprise Security 8.x: The Essential Upgrade for Threat Detection, ...

Splunk Observability as Code: From Zero to Dashboard

Are you a member of the Splunk Community?