Splunk Search

How to replace join in this below query?

priyastalin
Explorer

@dmarling 

Hi, 

I've replaced join in the below query and posted that query as well but I'm not getting proper output correct me if I made any mistake

Query with join:

"index=168347-np
[| `last_np_sourcetype("index=168347-np", "hw_eox")`] currentEoxMilestone=EoSCR OR currentEoxMilestone=LDoS (physicalType=*)
| fields deviceId, deviceName, physicalElementId, hwEoxId
| join deviceId [ search index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681 | fields deviceId ]
| fields deviceName, physicalElementId, hwEoxId
| dedup physicalElementId
| table deviceName, physicalElementId, hwEoxId
| stats count as Devices by hwEoxId
| stats sum(Devices) as Devices "

Replaced join from the above query:

"index=168347-np
([| `last_np_sourcetype("index=168347-np", "hw_eox")`] currentEoxMilestone=EoSCR OR currentEoxMilestone=LDoS (physicalType=*)) OR
([| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681)
| fields deviceId deviceName physicalElementId hwEoxId sourcetype
| stats values(sourcetype) as sourcetype values(deviceName) as deviceName values(physicalElementId) as physicalElementId values(hwEoxId) as hwEoxId by deviceId
| search sourcetype=hw_eox sourcetype=group_members
| table deviceName, physicalElementId, hwEoxId
| stats count as Devices by hwEoxId
| stats sum(Devices) as Devices"

 

Thanks

Labels (1)
0 Karma
1 Solution

dmarling
Builder

Hi @priyastalin,

I am not going to be able to go through every possible use case you run into.  I will provide you with a concept that should carry through on any use case with the data you've been working with. In general you want to wrap your different base search criteria with parenthesis and with OR between the groups of parenthesis's.  You need to identify the correlating field, which in all of of your use cases thus far have been deviceId.  You then need to create a stats that will pass through all of the fields you require using "latest(fieldname) as fieldname" and that passes in the values of the sourcetypes with "values(sourcetype) as sourcetype" to ensure you are getting data from both data sets.  You then add a search line after the states that ensures you are getting both sourcetypes.  Now you can do anything you would like to the resulting data as you normally would have with your join. 

Walking through these concepts with your latest example would be making your first line the two different data sets wrapped in parenthesis separated by OR:

index=167515-np (sourcetype=hardware physicalType=Chassis) OR (sourcetype=group_members groupId=288348)

You then define the fields that will be passed through:

| fields deviceId productType productId physicalType sourcetype

You then create a stats that provides the latest values for all fields of interest by the key field used for joining , which is deviceId, and that provides all values of the field that will let us know what source it's from, which is sourcetype:

| stats latest(productType) as productType latest(productId) as productId latest(physicalType) as physicalType values(sourcetype) as sourcetype by deviceId

Then you add a filter that ensures only data that appears in all of the selected data sources , which in this use case is using sourcetype, are being returned:

| search sourcetype=hardware sourcetype=group_members

You now have the fields you need from the two data sets and only data that exists in both data sets.

If this comment/answer was helpful, please up vote it. Thank you.

View solution in original post

priyastalin
Explorer

@bowesmana 

Hi Bowesmana,

Can you also help me in solving the query ?

 

Thanks.

0 Karma

dmarling
Builder

Hello,

Without knowing your underlying data it will be hard for me to know what is wrong.  The execution of the query is what I would recommend to remove a join.  I did notice something odd on the final two lines of your non-join query.  You are counting the rows by hwEoxId then summing that count.  It's possible you could be double counting rows when doing it by hwEoxId if there are multiple hwEoxId  per deviceId.  If that is the case you can correct that by doing this query instead:

index=168347-np 
    ([| `last_np_sourcetype("index=168347-np", "hw_eox")`] currentEoxMilestone=EoSCR OR currentEoxMilestone=LDoS (physicalType=*)) OR
    ([| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681)
| fields deviceId deviceName physicalElementId hwEoxId sourcetype
| stats values(deviceName) as deviceName values(physicalElementId) as physicalElementId values(sourcetype) as sourcetype values(hwEoxId) as hwEoxId by deviceId
| search sourcetype=hw_eox sourcetype=group_members hwEoxId=*
| stats count as Devices
If this comment/answer was helpful, please up vote it. Thank you.
0 Karma

priyastalin
Explorer

Hi @dmarling 

Thanks for replying the below query which you shared it not working exactly.

I would like to give some inputs,

This is query with join:

" index=168347-np
[| `last_np_sourcetype("index=168347-np", "hw_eox")`] currentEoxMilestone=EoSCR OR currentEoxMilestone=LDoS (physicalType=*)
| fields deviceId, deviceName, physicalElementId, hwEoxId
| join deviceId [ search index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=309425 | fields deviceId ]
| fields deviceName, physicalElementId, hwEoxId
| dedup physicalElementId
| table deviceName, physicalElementId, hwEoxId
| stats count as Devices by hwEoxId
| stats sum(Devices) as Devices"

sample ouput of the  1stquery " index=168347-np
[| `last_np_sourcetype("index=168347-np", "hw_eox")`] currentEoxMilestone=EoSCR OR currentEoxMilestone=LDoS (physicalType=*)
| fields deviceId, deviceName, physicalElementId, hwEoxId"

cpyKey168347
   currentEoxMilestoneLDoS
   currentEoxMilestoneDate2018-01-31T00:00:00
   deviceId18468220
   deviceName10.24.13.240
   hwEoxId286609
   nextEoxMilestonenull
   nextEoxMilestoneDatenull
   physicalElementId362026445
   physicalTypePower Supply
   productIdC3K-PWR-265WAC
   timeStamp2020-12-15T12:57:03

Output of second query"index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=309425 | fields deviceId"

"deviceId:16642716
groupId:309425
groupName:Cisco 3900 ISR Routers
timeStamp:
2020-12-15T12:56:41"

Output of the query till dedup of physicalElementId  "index=168347-np
[| `last_np_sourcetype("index=168347-np", "hw_eox")`] currentEoxMilestone=EoSCR OR currentEoxMilestone=LDoS (physicalType=*)
| fields deviceId, deviceName, physicalElementId, hwEoxId
| join deviceId [ search index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681 | fields deviceId ]
| fields deviceName, physicalElementId, hwEoxId
| dedup physicalElementId"

'' ''cpyKey168347
currentEoxMilestone:LDoS
currentEoxMilestoneDate:2018-03-31T00:00:00
deviceId:15459204
deviceName:10.82.112.71
hwEoxId:266079
nextEoxMilestone:null
nextEoxMilestoneDate:null
physicalElementId:341757347
physicalType:Module
productId:SFP-GE-L
timeStamp:2020-12-15T12:57:02""

Also I have attached the table values as pdf  for ""| table deviceName, physicalElementId, hwEoxId""

 

Please help me in removing the join in the query at the top.

Thanks a lot

0 Karma

dmarling
Builder

Thank you for the details on the expected output of your data with examples.  Without seeing what the output that the query I proposed is doing it's difficult to know what is going on but I am going to assume that this data changes over time due to you using dedup and using values on the stats was causing multiple events to show up on each deviceId row.  This can be solved by simply using latest instead of values on those fields:

 

index=168347-np 
    ([| `last_np_sourcetype("index=168347-np", "hw_eox")`] currentEoxMilestone=EoSCR OR currentEoxMilestone=LDoS (physicalType=*)) OR
    ([| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681)
| fields deviceId deviceName physicalElementId hwEoxId sourcetype
| stats latest(deviceName) as deviceName latest(physicalElementId) as physicalElementId values(sourcetype) as sourcetype latest(hwEoxId) as hwEoxId latest(groupName) as groupName latest(groupId) as groupId by deviceId
| search sourcetype=hw_eox sourcetype=group_members
| stats count as Devices

 

 

If you ever need more fields you just add them to the stats line using latest so you always get the most current results for the deviceId.  Here's a run anywhere example demonstrating this using fake data:

 

| makeresults count=1
| eval data="cpyKey: 168347, currentEoxMilestone: LDoS, currentEoxMilestoneDate: 2018-01-31T00:00:00, deviceId: 18468220, deviceName: 10.24.13.240, hwEoxId: 286609, nextEoxMilestone: null, nextEoxMilestoneDate: null, physicalElementId: 362026445, physicalType: Power Supply, productId: C3K-PWR-265WAC, timeStamp: 2020-12-15T12:57:03
cpyKey: 168347, currentEoxMilestone: LDoS, currentEoxMilestoneDate: 2018-01-31T00:00:00, deviceId: 18468220, deviceName: 10.82.112.71, hwEoxId: 266079, nextEoxMilestone: null, nextEoxMilestoneDate: null, physicalElementId: 341757347, physicalType: Power Supply, productId: C3K-PWR-265WAC, timeStamp: 2020-12-15T13:57:03
deviceId:18468220, groupId:309425, groupName:Cisco 3900 ISR Routers, timeStamp:2020-12-15T12:56:41"
| makemv data tokenizer="(?<data>[^\n]+)"
| mvexpand data
| rex mode=sed field=data "s/, /\n/g"
| rex field=data "timeStamp: ?(?<timeStamp>[^\s]+)"
| eval _time=strptime(timeStamp, "%FT%T"), sourcetype=if(match(data, "cpyKey"), "hw_eox", "group_members")
| fields - timeStamp
| rex field=data "deviceId: ?(?<deviceId>\d+)"
| rex field=data "deviceName: (?<deviceName>[^\n]+)"
| rex field=data "physicalElementId: (?<physicalElementId>[^\n]+)"
| rex field=data "hwEoxId: (?<hwEoxId>[^\n]+)"
| rex field=data "groupId:(?<groupId>[^\n]+)"
| rex field=data "groupName:(?<groupName>[^\n]+)"
| rex field=data "cpyKey: (?<cpyKey>[^\n]+)"
| rex field=data "currentEoxMilestone: (?<currentEoxMilestone>[^\n]+)"
| rex field=data "currentEoxMilestoneDate: (?<currentEoxMilestoneDate>[^\n]+)"
| rex field=data "nextEoxMilestone: (?<nextEoxMilestone>[^\n]+)"
| rex field=data "nextEoxMilestoneDate: (?<nextEoxMilestoneDate>[^\n]+)"
| rex field=data "physicalType: (?<physicalType>[^\n]+)"
| rex field=data "productId: (?<productId>[^\n]+)"
| stats latest(deviceName) as deviceName latest(physicalElementId) as physicalElementId values(sourcetype) as sourcetype latest(hwEoxId) as hwEoxId latest(groupName) as groupName latest(groupId) as groupId latest(cpyKey) as cpyKey latest(currentEoxMilestone) as currentEoxMilestone latest(currentEoxMilestoneDate) as currentEoxMilestoneDate latest(nextEoxMilestone) as nextEoxMilestone latest(nextEoxMilestoneDate) as nextEoxMilestoneDate latest(physicalType) as physicalType latest(productId) as productId by deviceId
| search sourcetype=hw_eox sourcetype=group_members
| fields - sourcetype deviceId

 

 

You'll see in the above run anywhere example only the most current values are showing for each field except the sourcetypes because we want to ensure both sourcetypes are being represented in this data.

If this comment/answer was helpful, please up vote it. Thank you.
0 Karma

priyastalin
Explorer

@dmarling 

Hi, 

Thanks for the help!!

There is another query with the join. I tried replacing with "OR" but I'm not getting exact output

"index=167515-np sourcetype=hardware
| fields deviceId, productType, productId, physicalType
| search physicalType=Chassis
| dedup deviceId
| join deviceId [ search index=167515-np [| `last_np_sourcetype( "index=167515-np", "group_members")` ] groupId=288348 | fields deviceId ]
| stats dc(productId) as PIDs by productType
| search productType=Routers
| table PIDs"

Could you please help me in solving this??

Thanks

0 Karma

dmarling
Builder

Hi @priyastalin,

I am not going to be able to go through every possible use case you run into.  I will provide you with a concept that should carry through on any use case with the data you've been working with. In general you want to wrap your different base search criteria with parenthesis and with OR between the groups of parenthesis's.  You need to identify the correlating field, which in all of of your use cases thus far have been deviceId.  You then need to create a stats that will pass through all of the fields you require using "latest(fieldname) as fieldname" and that passes in the values of the sourcetypes with "values(sourcetype) as sourcetype" to ensure you are getting data from both data sets.  You then add a search line after the states that ensures you are getting both sourcetypes.  Now you can do anything you would like to the resulting data as you normally would have with your join. 

Walking through these concepts with your latest example would be making your first line the two different data sets wrapped in parenthesis separated by OR:

index=167515-np (sourcetype=hardware physicalType=Chassis) OR (sourcetype=group_members groupId=288348)

You then define the fields that will be passed through:

| fields deviceId productType productId physicalType sourcetype

You then create a stats that provides the latest values for all fields of interest by the key field used for joining , which is deviceId, and that provides all values of the field that will let us know what source it's from, which is sourcetype:

| stats latest(productType) as productType latest(productId) as productId latest(physicalType) as physicalType values(sourcetype) as sourcetype by deviceId

Then you add a filter that ensures only data that appears in all of the selected data sources , which in this use case is using sourcetype, are being returned:

| search sourcetype=hardware sourcetype=group_members

You now have the fields you need from the two data sets and only data that exists in both data sets.

If this comment/answer was helpful, please up vote it. Thank you.

View solution in original post

priyastalin
Explorer

@dmarling 

Hi, Any update on the above query?

Tags (1)
0 Karma
Take the 2021 Splunk Career Survey

Help us learn about how Splunk has
impacted your career by taking the 2021 Splunk Career Survey.

Earn $50 in Amazon cash!