Join data from two indexes

ankithreddy777 · ‎05-08-2017

I have some data on index1 and some other data on Index2.
Let us say that I have fields a,b &c to be derived from index1 and fields c & d to be derived from index2.

I have scenario that for some c in index1 there is a respective value d in index 2. How to combine both indexes to get final result as
a,b,c &d with top performance.

On index1 I will search for 30 search results every time. so I will get 30 results for field C that need to be searched in Index2 and combine.

chimell · ‎05-09-2017

Hi ankithreddy777
Try this search

|set union[search index=index1|fields a b c][search index=index2|fields c d]

ankithreddy777 · ‎05-09-2017

I just need the d values only where c matches

briancronrath · ‎05-08-2017

is it the same 30 second time range for both indexes you need to query? If so I don't think any subsearch is necessary, just search both indexes in your base search, something like:
(index=A OR index=B) |
Then can do fill some null values and enable a stats call where you group by field c. If the field is named differently in each index you'll also need to add in an eval that creates a singular field to use for the grouping. So something like this:

 (index=A OR index=B) | fillnull value="" a,b,c,d | stats sum(a) sum(b) sum(d) by c

Can change the sum to be whatever function you need for your particular stats call.

DalJeanis · ‎05-08-2017

The most efficient answer is going to depend on the characteristics of your two data sources. If the data from the left part of the search returns a small number of values that can then be looked up on the right, then a map might be the right answer. On the other hand, if the right side contains a limited number of categorical variables-- say zip codes, or roles -- then maybe a lookup is the right answer. A join is usually efficient when both sides are relatively large, and most of the records in at least one side are going to be used.

Here's a couple of places with extensive answers on this. Read the section on "splunk stew" - that's usually the most efficient way, if it matches your data and the above cases are not clearly in evidence.

https://answers.splunk.com/answers/524250/how-to-search-for-matches-in-two-different-searche.html
https://answers.splunk.com/answers/509697/how-to-combine-my-2-reports.html

ankithreddy777 · ‎05-08-2017

Hi, I will have only 30 results on index 1 , I need to map field C in index 1 with field C in index 2 (which contains large set of data) . Only few C matches in index B. I should display all results in index1 and matching results from index B as |table a,b,c,d.

Could you please give me a sample query for this scenario

DalJeanis · ‎05-08-2017

This might also work efficiently for your needs...

index=index2 other search terms 
    [search index=index1 ...other search terms... 
    | table a b c 
    | outputcsv myvalues.csv 
    | table c]
| table c d
| append [ | inputcsv myvalues.csv]
| stats values(*) as * by c

DalJeanis · ‎05-08-2017

I can't be absolutely sure that this is the most efficient, without more details, but here goes an example of the map command.

index=index1 ...other search terms... 
| table a b c 
| rename a as a1 b as b1 c as c1
| map search="search index=index2 c=$c1$ ... other search terms... | table c d | eval a=$a1$ | eval b=$b1$" maxsearches=30

...or, if the value of a variable, for example a, can contain spaces in the middle of it...

| map search="search index=index2 c=$c1$ ... other search terms... | table c d | eval a=\"$a1$\" | eval b=$b1$" maxsearches=30

The renames in my example are not strictly necessary, but I wanted you to be able to see where each field value was coming from, so that you knew that $c1$ was the value of the c variable that got returned from the first search.

ankithreddy777 · ‎05-08-2017

Thank you so much DalJeanis. Great Answers. I just have one question, what if my C value is not an extracted field. i.e I need to use rex to extract it. How to use this using map command.

ankithreddy777 · ‎05-09-2017

Hi DalJeanis, If I implement the above criteria, My results set is just limited to number of matches of c.

If there are 3 matches, my final result set has only three results. If one c from index 1 doesn't match with c from index=2 that particular result is getting eliminated. But a need that particular result of a,b,c where the d value is not found.

DalJeanis · ‎05-09-2017

This adds an appendpipe subsearch into the mapped search to create a record if there are no results.

| map search="search index=index2 c=$c1$ ... other search terms... | appendpipe [| stats count as reccount | where reccount=0 | eval c=$c1$ ] | table c d | eval a=$a1$ | eval b=$b1$"

DalJeanis · ‎05-09-2017

 index=index1 ...other search terms... 
| rex field=whatfieldtoextractfrom "theRexToExtractC"
| table a b c

... wait.... if c needs to be extracted from BOTH sides with rexes, then map is not the way to go.

... crud... okay,.. I need you to post some non-confidential sample data into the question, showing what your C values look like, and what the original records look like on each side, and what regular expression you are using to extract c.

The efficiency of any solution will be determined almost entirely by that.

I suspect that the best answer is going to look a lot like the other comment on this post, but
with some extra language to extract the C value and to format the returned values.

 index=index2 other search terms 
     [search index=index1 ...other search terms...
     | rex to extract c 
     | table a b c 
     | outputcsv myvalues.csv 
     | table c
     | format ...some formatting commands...]
 | rex to extract c 
 | table c d
 | join c [ | inputcsv myvalues.csv   | table a b c ]
 | stats values(*) as * by c

extracting the c value from the index2 side in this example is because the subsearch in brackets is not returning (C=value1 OR c=value2 OR...) but instead returning just ("value1" OR "value2" OR...) and so on, so there is more massaging to be done.

ankithreddy777 · ‎05-09-2017

If I use append pipe I am getting
"[map]: Search auto-finalized after time limit (60 seconds) reached." error.

muebel · ‎05-08-2017

Hi ankithreddy777, splunk has a join command, with documentation available here : http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Join

Join may have performance issues, and has some innate limitations. To get around this (if needed), you might also find some good ideas in Nick Mealy's virtual conf from March here : http://wiki.splunk.com/Virtual_.conf

Best practices around grouping and aggregating data from different search results

Please let me know if this answers your question! 😄

Join data from two indexes

Upcoming Webinar: Unmasking Insider Threats with Slunk Enterprise Security’s UEBA

.conf25 technical session recap of Observability for Gen AI: Monitoring LLM ...

A Season of Skills: New Splunk Courses to Light Up Your Learning Journey

Join the Conversation

Join data from two indexes

Upcoming Webinar: Unmasking Insider Threats with Slunk Enterprise Security’s UEBA

.conf25 technical session recap of Observability for Gen AI: Monitoring LLM ...

A Season of Skills: New Splunk Courses to Light Up Your Learning Journey