Solved: Join two indexes based on substring match

ved08514 · ‎10-27-2020

Hi,

I am struggling with joining two indexes based on substring match.

I have following indexes :

index1 :
having following fields

PROTOCOL,DIRECTION,FILENAME,DIRECTORYNAME

index2:

having following fields
APPID,CUSTOMERID,FILEPATTERN,DIRECTORYNAME

I want to join above indexes based on following condition
1. FILEPATTERN is substring of FILENAME

2. DIRECTORYNAME in index1 = DIRECTORYNAME in index 2.

and display output with following fields
PROTOCOL,DIRECTION,APPID,CUSTOMERID,FILEPATTERN,DIRECTORYNAME

Thanks in anticipation

Regards

Nikhil

ITWhisperer · ‎10-27-2020

@ved08514

Here is a run anywhere example which hopefully you can adapt for your needs:

| makeresults | eval events="app=123,cust=abc,pattern=xyz,dir=abc
app=456,cust=abc,pattern=rst,dir=abc
app=123,cust=def,pattern=xyz,dir=def
app=456,cust=def,pattern=rst,dir=def
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc"
| rex field=events max_match=0 "(?<event>.*)[\r\n]*" 
| mvexpand event
| eval _raw=event
| fields - _time events
| extract
| rename COMMENT as "made some run anywhere data"

| rename COMMENT as "concatenate fields from different indexes"
| eval nip=name.",".io.",".prot
| eval acp=app.",".cust.",".pattern
| rename COMMENT as "list data from both indexes by directory"
| stats list(acp) as acp list(nip) as nip by dir
| rename COMMENT as "expand data from second index"
| mvexpand acp
| rename COMMENT as "extract original fields"
| rex field=acp "(?<app>[^,]+),(?<cust>[^,]+),(?<pattern>[^,]+)"
| fields - acp
| rename COMMENT as "copy data from first index"
| eval match=nip
| rename COMMENT as "determine if name matches pattern"
| eval match=mvmap(match,if(match(mvindex(split(match,","),0),pattern),1,0))
| rename COMMENT as "zip match result with first index data"
| eval nipmatch=mvzip(match,nip)
| rename COMMENT as "filter matches"
| eval nip=mvmap(nipmatch,if(mvindex(split(nipmatch,","),0)="1",nipmatch,null))
| fields - nipmatch match
| rename COMMENT as "expand matching data from first index"
| mvexpand nip
| rename COMMENT as "extract original data (ignoring match flag)"
| rex field=nip ",(?<name>[^,]+),(?<io>[^,]+),(?<prot>.+)"
| fields - nip
| rename COMMENT as "table data"
| table prot io app cust pattern dir

Assumptions include you search both indexes at once, e.g. index=idx1 OR index=idx2; using comma delimiters doesn't present a problem; and, you are using at least 8.0.0 as mvmap isn't supported in earlier versions. There may be a more efficient way to do this, but I will leave that to others to come up with.

View solution in original post

ITWhisperer · ‎10-27-2020

Are there any fields or combination of fields which are unique within each index and/or across both?

ved08514 · ‎10-27-2020

Hi @ITWhisperer

FILEPATTERN and DIRECTORYNAME combination in index2 is unique (i.e. its like composite key for index2).

Whereas for index1 there are no unique combination.

ITWhisperer · ‎10-27-2020

@ved08514

Here is a run anywhere example which hopefully you can adapt for your needs:

| makeresults | eval events="app=123,cust=abc,pattern=xyz,dir=abc
app=456,cust=abc,pattern=rst,dir=abc
app=123,cust=def,pattern=xyz,dir=def
app=456,cust=def,pattern=rst,dir=def
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc"
| rex field=events max_match=0 "(?<event>.*)[\r\n]*" 
| mvexpand event
| eval _raw=event
| fields - _time events
| extract
| rename COMMENT as "made some run anywhere data"

| rename COMMENT as "concatenate fields from different indexes"
| eval nip=name.",".io.",".prot
| eval acp=app.",".cust.",".pattern
| rename COMMENT as "list data from both indexes by directory"
| stats list(acp) as acp list(nip) as nip by dir
| rename COMMENT as "expand data from second index"
| mvexpand acp
| rename COMMENT as "extract original fields"
| rex field=acp "(?<app>[^,]+),(?<cust>[^,]+),(?<pattern>[^,]+)"
| fields - acp
| rename COMMENT as "copy data from first index"
| eval match=nip
| rename COMMENT as "determine if name matches pattern"
| eval match=mvmap(match,if(match(mvindex(split(match,","),0),pattern),1,0))
| rename COMMENT as "zip match result with first index data"
| eval nipmatch=mvzip(match,nip)
| rename COMMENT as "filter matches"
| eval nip=mvmap(nipmatch,if(mvindex(split(nipmatch,","),0)="1",nipmatch,null))
| fields - nipmatch match
| rename COMMENT as "expand matching data from first index"
| mvexpand nip
| rename COMMENT as "extract original data (ignoring match flag)"
| rex field=nip ",(?<name>[^,]+),(?<io>[^,]+),(?<prot>.+)"
| fields - nip
| rename COMMENT as "table data"
| table prot io app cust pattern dir

Assumptions include you search both indexes at once, e.g. index=idx1 OR index=idx2; using comma delimiters doesn't present a problem; and, you are using at least 8.0.0 as mvmap isn't supported in earlier versions. There may be a more efficient way to do this, but I will leave that to others to come up with.

ved08514 · ‎10-28-2020

Hi @ITWhisperer

Thanks a lot for solution.

Actually field directory name in index2 is "dir2". I tried to rename it to "dir" but its overwriting all the values in index1.
It works fine if both the index has "dir" as field name.

ITWhisperer · ‎10-28-2020

Insert a coalesce before the stats

| rename COMMENT as "list data from both indexes by directory"
| eval dir=coalesce(dir,dir2)
| stats list(acp) as acp list(nip) as nip by dir

ved08514 · ‎10-29-2020

@ITWhisperer

I am facing one issue here. even if my index is having 600+ records i am getting only 200 records in output. I tried to increase number of records in index but still it showing 200 records.

ITWhisperer · ‎10-29-2020

@ved08514 Do you get any useful information about what might be causing this from the job inspection? Is your time period correct? Can you try modifying the query to target the missing records to see if you can retrieve them in other ways?

ved08514 · ‎10-29-2020

@ITWhisperer

I tried to modify the runanywhere data in original query you provided. There also I am seeing the same issue.
I added more records in index2 like 400 but I am seeing less data.

Attached full query.

ved08514 · ‎10-29-2020

@ITWhisperer
I just saw below error in Job button dropdown
stats' command: limit for values of field 'nip' reached. Some values may have been truncated or ignored.

ITWhisperer · ‎10-29-2020

Update your limits.conf configuration to increase the size of the list stats can return

https://docs.splunk.com/Documentation/Splunk/latest/Admin/limitsconf#.5Bstats.7Csistats.5D

ved08514 · ‎10-29-2020

@ITWhisperer
Thanks. That worked.

Join two indexes based on substring match

join

lookup

Splunk Mobile: Your Brand-New Home Screen

Introducing Value Insights (Beta): Understand the Business Impact your organization ...

Enterprise Security (ES) Essentials 8.3 is Now GA — Smarter Detections, Faster ...

Are you a member of the Splunk Community?