Splunk Search

Join two indexes based on substring match

ved08514
Explorer

Hi,

I am struggling with joining two indexes based on substring match.

I have following indexes :

index1 :
having following fields

PROTOCOL,DIRECTION,FILENAME,DIRECTORYNAME

 

index2:

having following fields
APPID,CUSTOMERID,FILEPATTERN,DIRECTORYNAME

I want to join above indexes based on following condition 
1. FILEPATTERN is substring of FILENAME

2. DIRECTORYNAME in index1 = DIRECTORYNAME in index 2.

and display output with following fields
PROTOCOL,DIRECTION,APPID,CUSTOMERID,FILEPATTERN,DIRECTORYNAME

Thanks in anticipation

Regards

Nikhil 

Labels (2)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

@ved08514 

Here is a run anywhere example which hopefully you can adapt for your needs:

| makeresults | eval events="app=123,cust=abc,pattern=xyz,dir=abc
app=456,cust=abc,pattern=rst,dir=abc
app=123,cust=def,pattern=xyz,dir=def
app=456,cust=def,pattern=rst,dir=def
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc"
| rex field=events max_match=0 "(?<event>.*)[\r\n]*" 
| mvexpand event
| eval _raw=event
| fields - _time events
| extract
| rename COMMENT as "made some run anywhere data"

| rename COMMENT as "concatenate fields from different indexes"
| eval nip=name.",".io.",".prot
| eval acp=app.",".cust.",".pattern
| rename COMMENT as "list data from both indexes by directory"
| stats list(acp) as acp list(nip) as nip by dir
| rename COMMENT as "expand data from second index"
| mvexpand acp
| rename COMMENT as "extract original fields"
| rex field=acp "(?<app>[^,]+),(?<cust>[^,]+),(?<pattern>[^,]+)"
| fields - acp
| rename COMMENT as "copy data from first index"
| eval match=nip
| rename COMMENT as "determine if name matches pattern"
| eval match=mvmap(match,if(match(mvindex(split(match,","),0),pattern),1,0))
| rename COMMENT as "zip match result with first index data"
| eval nipmatch=mvzip(match,nip)
| rename COMMENT as "filter matches"
| eval nip=mvmap(nipmatch,if(mvindex(split(nipmatch,","),0)="1",nipmatch,null))
| fields - nipmatch match
| rename COMMENT as "expand matching data from first index"
| mvexpand nip
| rename COMMENT as "extract original data (ignoring match flag)"
| rex field=nip ",(?<name>[^,]+),(?<io>[^,]+),(?<prot>.+)"
| fields - nip
| rename COMMENT as "table data"
| table prot io app cust pattern dir

Assumptions include you search both indexes at once, e.g. index=idx1 OR index=idx2; using comma delimiters doesn't present a problem; and, you are using at least 8.0.0 as mvmap isn't supported in earlier versions. There may be a more efficient way to do this, but I will leave that to others to come up with.

View solution in original post

ITWhisperer
SplunkTrust
SplunkTrust

Are there any fields or combination of fields which are unique within each index and/or across both?

0 Karma

ved08514
Explorer

Hi @ITWhisperer 

FILEPATTERN  and DIRECTORYNAME  combination in index2 is unique (i.e. its like composite key for index2). 

Whereas for index1 there are no unique combination.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

@ved08514 

Here is a run anywhere example which hopefully you can adapt for your needs:

| makeresults | eval events="app=123,cust=abc,pattern=xyz,dir=abc
app=456,cust=abc,pattern=rst,dir=abc
app=123,cust=def,pattern=xyz,dir=def
app=456,cust=def,pattern=rst,dir=def
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc"
| rex field=events max_match=0 "(?<event>.*)[\r\n]*" 
| mvexpand event
| eval _raw=event
| fields - _time events
| extract
| rename COMMENT as "made some run anywhere data"

| rename COMMENT as "concatenate fields from different indexes"
| eval nip=name.",".io.",".prot
| eval acp=app.",".cust.",".pattern
| rename COMMENT as "list data from both indexes by directory"
| stats list(acp) as acp list(nip) as nip by dir
| rename COMMENT as "expand data from second index"
| mvexpand acp
| rename COMMENT as "extract original fields"
| rex field=acp "(?<app>[^,]+),(?<cust>[^,]+),(?<pattern>[^,]+)"
| fields - acp
| rename COMMENT as "copy data from first index"
| eval match=nip
| rename COMMENT as "determine if name matches pattern"
| eval match=mvmap(match,if(match(mvindex(split(match,","),0),pattern),1,0))
| rename COMMENT as "zip match result with first index data"
| eval nipmatch=mvzip(match,nip)
| rename COMMENT as "filter matches"
| eval nip=mvmap(nipmatch,if(mvindex(split(nipmatch,","),0)="1",nipmatch,null))
| fields - nipmatch match
| rename COMMENT as "expand matching data from first index"
| mvexpand nip
| rename COMMENT as "extract original data (ignoring match flag)"
| rex field=nip ",(?<name>[^,]+),(?<io>[^,]+),(?<prot>.+)"
| fields - nip
| rename COMMENT as "table data"
| table prot io app cust pattern dir

Assumptions include you search both indexes at once, e.g. index=idx1 OR index=idx2; using comma delimiters doesn't present a problem; and, you are using at least 8.0.0 as mvmap isn't supported in earlier versions. There may be a more efficient way to do this, but I will leave that to others to come up with.

ved08514
Explorer

Hi @ITWhisperer

Thanks a lot for solution.

Actually field directory name in index2 is "dir2". I tried to rename it to "dir" but its overwriting all the values in index1.
It works fine if both the index has "dir" as field name.


 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Insert a coalesce before the stats

| rename COMMENT as "list data from both indexes by directory"
| eval dir=coalesce(dir,dir2)
| stats list(acp) as acp list(nip) as nip by dir

ved08514
Explorer

@ITWhisperer 

I am facing one issue here. even if my index is having 600+ records i am getting only 200 records in output. I tried to increase number of records in index but still it showing 200 records.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

@ved08514 Do you get any useful information about what might be causing this from the job inspection? Is your time period correct? Can you try modifying the query to target the missing records to see if you can retrieve them in other ways?

0 Karma

ved08514
Explorer

@ITWhisperer 

I tried to modify the runanywhere data in original query you provided. There also I am seeing the same issue.
I added more records in index2 like 400 but I am seeing less data.

Attached full query.

0 Karma

ved08514
Explorer

@ITWhisperer 
I just saw below error in Job button dropdown
stats' command: limit for values of field 'nip' reached. Some values may have been truncated or ignored.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Update your limits.conf configuration to increase the size of the list stats can return

https://docs.splunk.com/Documentation/Splunk/latest/Admin/limitsconf#.5Bstats.7Csistats.5D

 

ved08514
Explorer

@ITWhisperer 
Thanks. That worked.

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Community Content Calendar, September edition

Welcome to another insightful post from our Community Content Calendar! We're thrilled to continue bringing ...

Splunkbase Unveils New App Listing Management Public Preview

Splunkbase Unveils New App Listing Management Public PreviewWe're thrilled to announce the public preview of ...

Leveraging Automated Threat Analysis Across the Splunk Ecosystem

Are you leveraging automation to its fullest potential in your threat detection strategy?Our upcoming Security ...