All Apps and Add-ons

Join data from two indexes

ankithreddy777
Contributor

I have some data on index1 and some other data on Index2.
Let us say that I have fields a,b &c to be derived from index1 and fields c & d to be derived from index2.

I have scenario that for some c in index1 there is a respective value d in index 2. How to combine both indexes to get final result as
a,b,c &d with top performance.

On index1 I will search for 30 search results every time. so I will get 30 results for field C that need to be searched in Index2 and combine.

Tags (1)

chimell
Motivator

Hi ankithreddy777
Try this search

|set union[search index=index1|fields a b c][search index=index2|fields c d]
0 Karma

ankithreddy777
Contributor

I just need the d values only where c matches

0 Karma

briancronrath
Contributor

is it the same 30 second time range for both indexes you need to query? If so I don't think any subsearch is necessary, just search both indexes in your base search, something like:
(index=A OR index=B) |
Then can do fill some null values and enable a stats call where you group by field c. If the field is named differently in each index you'll also need to add in an eval that creates a singular field to use for the grouping. So something like this:

 (index=A OR index=B) | fillnull value="" a,b,c,d | stats sum(a) sum(b) sum(d) by c

Can change the sum to be whatever function you need for your particular stats call.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

The most efficient answer is going to depend on the characteristics of your two data sources. If the data from the left part of the search returns a small number of values that can then be looked up on the right, then a map might be the right answer. On the other hand, if the right side contains a limited number of categorical variables-- say zip codes, or roles -- then maybe a lookup is the right answer. A join is usually efficient when both sides are relatively large, and most of the records in at least one side are going to be used.

Here's a couple of places with extensive answers on this. Read the section on "splunk stew" - that's usually the most efficient way, if it matches your data and the above cases are not clearly in evidence.

https://answers.splunk.com/answers/524250/how-to-search-for-matches-in-two-different-searche.html
https://answers.splunk.com/answers/509697/how-to-combine-my-2-reports.html

ankithreddy777
Contributor

Hi, I will have only 30 results on index 1 , I need to map field C in index 1 with field C in index 2 (which contains large set of data) . Only few C matches in index B. I should display all results in index1 and matching results from index B as |table a,b,c,d.

Could you please give me a sample query for this scenario

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

This might also work efficiently for your needs...

index=index2 other search terms 
    [search index=index1 ...other search terms... 
    | table a b c 
    | outputcsv myvalues.csv 
    | table c]
| table c d
| append [ | inputcsv myvalues.csv]
| stats values(*) as * by c 
0 Karma

DalJeanis
SplunkTrust
SplunkTrust

I can't be absolutely sure that this is the most efficient, without more details, but here goes an example of the map command.

index=index1 ...other search terms... 
| table a b c 
| rename a as a1 b as b1 c as c1
| map search="search index=index2 c=$c1$ ... other search terms... | table c d | eval a=$a1$ | eval b=$b1$" maxsearches=30   

...or, if the value of a variable, for example a, can contain spaces in the middle of it...

| map search="search index=index2 c=$c1$ ... other search terms... | table c d | eval a=\"$a1$\" | eval b=$b1$" maxsearches=30 

The renames in my example are not strictly necessary, but I wanted you to be able to see where each field value was coming from, so that you knew that $c1$ was the value of the c variable that got returned from the first search.

0 Karma

ankithreddy777
Contributor

Thank you so much DalJeanis. Great Answers. I just have one question, what if my C value is not an extracted field. i.e I need to use rex to extract it. How to use this using map command.

0 Karma

ankithreddy777
Contributor

Hi DalJeanis, If I implement the above criteria, My results set is just limited to number of matches of c.

I have used
index=index1 ...other search terms...
| table a b c
| rename a as a1 b as b1 c as c1
| map search="search index=index2 c=$c1$ ... other search terms... | table c d | eval a=$a1$ | eval b=$b1$" maxsearches=30 |table a , b, c, d

If there are 3 matches, my final result set has only three results. If one c from index 1 doesn't match with c from index=2 that particular result is getting eliminated. But a need that particular result of a,b,c where the d value is not found.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

This adds an appendpipe subsearch into the mapped search to create a record if there are no results.

| map search="search index=index2 c=$c1$ ... other search terms... | appendpipe [| stats count as reccount | where reccount=0 | eval c=$c1$ ] | table c d | eval a=$a1$ | eval b=$b1$" 
0 Karma

DalJeanis
SplunkTrust
SplunkTrust
 index=index1 ...other search terms... 
| rex field=whatfieldtoextractfrom "theRexToExtractC"
| table a b c

... wait.... if c needs to be extracted from BOTH sides with rexes, then map is not the way to go.

... crud... okay,.. I need you to post some non-confidential sample data into the question, showing what your C values look like, and what the original records look like on each side, and what regular expression you are using to extract c.

The efficiency of any solution will be determined almost entirely by that.

I suspect that the best answer is going to look a lot like the other comment on this post, but
with some extra language to extract the C value and to format the returned values.

 index=index2 other search terms 
     [search index=index1 ...other search terms...
     | rex to extract c 
     | table a b c 
     | outputcsv myvalues.csv 
     | table c
     | format ...some formatting commands...]
 | rex to extract c 
 | table c d
 | join c [ | inputcsv myvalues.csv   | table a b c ]
 | stats values(*) as * by c 

extracting the c value from the index2 side in this example is because the subsearch in brackets is not returning (C=value1 OR c=value2 OR...) but instead returning just ("value1" OR "value2" OR...) and so on, so there is more massaging to be done.

0 Karma

ankithreddy777
Contributor

If I use append pipe I am getting
"[map]: Search auto-finalized after time limit (60 seconds) reached." error.

0 Karma

muebel
SplunkTrust
SplunkTrust

Hi ankithreddy777, splunk has a join command, with documentation available here : http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Join

Join may have performance issues, and has some innate limitations. To get around this (if needed), you might also find some good ideas in Nick Mealy's virtual conf from March here : http://wiki.splunk.com/Virtual_.conf

Best practices around grouping and aggregating data from different search results

Please let me know if this answers your question! 😄

Get Updates on the Splunk Community!

The Splunk Success Framework: Your Guide to Successful Splunk Implementations

Splunk Lantern is a customer success center that provides advice from Splunk experts on valuable data ...

Splunk Training for All: Meet Aspiring Cybersecurity Analyst, Marc Alicea

Splunk Education believes in the value of training and certification in today’s rapidly-changing data-driven ...

Investigate Security and Threat Detection with VirusTotal and Splunk Integration

As security threats and their complexities surge, security analysts deal with increased challenges and ...