Hi,
I am struggling with joining two indexes based on substring match.
I have following indexes :
index1 :
having following fields
PROTOCOL,DIRECTION,FILENAME,DIRECTORYNAME
index2:
having following fields
APPID,CUSTOMERID,FILEPATTERN,DIRECTORYNAME
I want to join above indexes based on following condition
1. FILEPATTERN is substring of FILENAME
2. DIRECTORYNAME in index1 = DIRECTORYNAME in index 2.
and display output with following fields
PROTOCOL,DIRECTION,APPID,CUSTOMERID,FILEPATTERN,DIRECTORYNAME
Thanks in anticipation
Regards
Nikhil
Here is a run anywhere example which hopefully you can adapt for your needs:
| makeresults | eval events="app=123,cust=abc,pattern=xyz,dir=abc
app=456,cust=abc,pattern=rst,dir=abc
app=123,cust=def,pattern=xyz,dir=def
app=456,cust=def,pattern=rst,dir=def
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc"
| rex field=events max_match=0 "(?<event>.*)[\r\n]*"
| mvexpand event
| eval _raw=event
| fields - _time events
| extract
| rename COMMENT as "made some run anywhere data"
| rename COMMENT as "concatenate fields from different indexes"
| eval nip=name.",".io.",".prot
| eval acp=app.",".cust.",".pattern
| rename COMMENT as "list data from both indexes by directory"
| stats list(acp) as acp list(nip) as nip by dir
| rename COMMENT as "expand data from second index"
| mvexpand acp
| rename COMMENT as "extract original fields"
| rex field=acp "(?<app>[^,]+),(?<cust>[^,]+),(?<pattern>[^,]+)"
| fields - acp
| rename COMMENT as "copy data from first index"
| eval match=nip
| rename COMMENT as "determine if name matches pattern"
| eval match=mvmap(match,if(match(mvindex(split(match,","),0),pattern),1,0))
| rename COMMENT as "zip match result with first index data"
| eval nipmatch=mvzip(match,nip)
| rename COMMENT as "filter matches"
| eval nip=mvmap(nipmatch,if(mvindex(split(nipmatch,","),0)="1",nipmatch,null))
| fields - nipmatch match
| rename COMMENT as "expand matching data from first index"
| mvexpand nip
| rename COMMENT as "extract original data (ignoring match flag)"
| rex field=nip ",(?<name>[^,]+),(?<io>[^,]+),(?<prot>.+)"
| fields - nip
| rename COMMENT as "table data"
| table prot io app cust pattern dir
Assumptions include you search both indexes at once, e.g. index=idx1 OR index=idx2; using comma delimiters doesn't present a problem; and, you are using at least 8.0.0 as mvmap isn't supported in earlier versions. There may be a more efficient way to do this, but I will leave that to others to come up with.
Are there any fields or combination of fields which are unique within each index and/or across both?
Hi @ITWhisperer
FILEPATTERN and DIRECTORYNAME combination in index2 is unique (i.e. its like composite key for index2).
Whereas for index1 there are no unique combination.
Here is a run anywhere example which hopefully you can adapt for your needs:
| makeresults | eval events="app=123,cust=abc,pattern=xyz,dir=abc
app=456,cust=abc,pattern=rst,dir=abc
app=123,cust=def,pattern=xyz,dir=def
app=456,cust=def,pattern=rst,dir=def
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=in,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=ghi,io=out,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=in,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=jkl,io=out,name=wxyz,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=in,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=ghi,io=out,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=in,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc
prot=jkl,io=out,name=rstu,dir=abc"
| rex field=events max_match=0 "(?<event>.*)[\r\n]*"
| mvexpand event
| eval _raw=event
| fields - _time events
| extract
| rename COMMENT as "made some run anywhere data"
| rename COMMENT as "concatenate fields from different indexes"
| eval nip=name.",".io.",".prot
| eval acp=app.",".cust.",".pattern
| rename COMMENT as "list data from both indexes by directory"
| stats list(acp) as acp list(nip) as nip by dir
| rename COMMENT as "expand data from second index"
| mvexpand acp
| rename COMMENT as "extract original fields"
| rex field=acp "(?<app>[^,]+),(?<cust>[^,]+),(?<pattern>[^,]+)"
| fields - acp
| rename COMMENT as "copy data from first index"
| eval match=nip
| rename COMMENT as "determine if name matches pattern"
| eval match=mvmap(match,if(match(mvindex(split(match,","),0),pattern),1,0))
| rename COMMENT as "zip match result with first index data"
| eval nipmatch=mvzip(match,nip)
| rename COMMENT as "filter matches"
| eval nip=mvmap(nipmatch,if(mvindex(split(nipmatch,","),0)="1",nipmatch,null))
| fields - nipmatch match
| rename COMMENT as "expand matching data from first index"
| mvexpand nip
| rename COMMENT as "extract original data (ignoring match flag)"
| rex field=nip ",(?<name>[^,]+),(?<io>[^,]+),(?<prot>.+)"
| fields - nip
| rename COMMENT as "table data"
| table prot io app cust pattern dir
Assumptions include you search both indexes at once, e.g. index=idx1 OR index=idx2; using comma delimiters doesn't present a problem; and, you are using at least 8.0.0 as mvmap isn't supported in earlier versions. There may be a more efficient way to do this, but I will leave that to others to come up with.
Hi @ITWhisperer
Thanks a lot for solution.
Actually field directory name in index2 is "dir2". I tried to rename it to "dir" but its overwriting all the values in index1.
It works fine if both the index has "dir" as field name.
Insert a coalesce before the stats
| rename COMMENT as "list data from both indexes by directory"
| eval dir=coalesce(dir,dir2)
| stats list(acp) as acp list(nip) as nip by dir
@ITWhisperer
I am facing one issue here. even if my index is having 600+ records i am getting only 200 records in output. I tried to increase number of records in index but still it showing 200 records.
@ved08514 Do you get any useful information about what might be causing this from the job inspection? Is your time period correct? Can you try modifying the query to target the missing records to see if you can retrieve them in other ways?
@ITWhisperer
I tried to modify the runanywhere data in original query you provided. There also I am seeing the same issue.
I added more records in index2 like 400 but I am seeing less data.
Attached full query.
@ITWhisperer
I just saw below error in Job button dropdown
stats' command: limit for values of field 'nip' reached. Some values may have been truncated or ignored.
Update your limits.conf configuration to increase the size of the list stats can return
https://docs.splunk.com/Documentation/Splunk/latest/Admin/limitsconf#.5Bstats.7Csistats.5D
@ITWhisperer
Thanks. That worked.