Solved: Re: How to replace join in below query?

priyastalin · ‎12-15-2020

Hi,

could you please help me in replacing the join in below query?

index=168347-np [ | `last_np_sourcetype("index=168347-np","hardware")`] (physicalType=*) | fields physicalElementId deviceId
| join deviceId [ search index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681 | fields deviceId ]
| stats dc(physicalElementId) as Devices

Thanks

bowesmana · ‎12-15-2020

@priyastalin

The general idea behind removing a join from a search is to combine the two search fragments to a single search separated by OR, e.g.

(index=168347-np [ | `last_np_sourcetype("index=168347-np","hardware")`] (physicalType=*)) OR
(index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681 )
...
then do stuff with the two data sets using eval/stats/...

however, in your example, if you are just using the join to filter on groupId as @nickhills suggests, there are other options, but to achieve a similar outcome to the filtering join you could do

(index=168347-np [ | `last_np_sourcetype("index=168347-np","hardware")`] (physicalType=*)) OR
(index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681 )
| stats values(groupId) as groupIds values(physicalElementId) as physicalElementIds by deviceId
| where mvcount(groupIds)=1
| stats dc(physicalElementIds) as Devices

which would search both data sets, then aggregate the two data sets retaining the groupIds by deviceId and the subsequent where clause performs the filter, as all deviceIds that have no corresponding groupId will be removed (effective inner join).

Then your final stats will just get the unique physicalElementId count.

Remember in Splunk that subsearches often come with limitations and performance implications and it is often useful to consider how a subsearch can be replaced with an alternative construct, like others have suggested here. It will generally come down to your data.

Hope this helps

View solution in original post

bowesmana · ‎12-15-2020

@priyastalin

The general idea behind removing a join from a search is to combine the two search fragments to a single search separated by OR, e.g.

(index=168347-np [ | `last_np_sourcetype("index=168347-np","hardware")`] (physicalType=*)) OR
(index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681 )
...
then do stuff with the two data sets using eval/stats/...

however, in your example, if you are just using the join to filter on groupId as @nickhills suggests, there are other options, but to achieve a similar outcome to the filtering join you could do

(index=168347-np [ | `last_np_sourcetype("index=168347-np","hardware")`] (physicalType=*)) OR
(index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681 )
| stats values(groupId) as groupIds values(physicalElementId) as physicalElementIds by deviceId
| where mvcount(groupIds)=1
| stats dc(physicalElementIds) as Devices

which would search both data sets, then aggregate the two data sets retaining the groupIds by deviceId and the subsequent where clause performs the filter, as all deviceIds that have no corresponding groupId will be removed (effective inner join).

Then your final stats will just get the unique physicalElementId count.

Remember in Splunk that subsearches often come with limitations and performance implications and it is often useful to consider how a subsearch can be replaced with an alternative construct, like others have suggested here. It will generally come down to your data.

Hope this helps

richgalloway · ‎12-15-2020

What's the point of this query? It's joining a single field to events that already have that field so what value is added?

---
If this reply helps you, Karma would be appreciated.

nickhills · ‎12-15-2020

I think ( @priyastalin correct me if wrong) it looks like this is just so you can filter on the groupId?

I'm not sure if seeing the contents of the macro would help to understand if its doing a simple filter or if there is something more complex going on to get the set of results.

If this is something you run frequently, I think a lookup might be a far simpler (and faster) approach.

If my comment helps, please give it a thumbs up!

priyastalin · ‎12-15-2020

Content of first query

"index=168347-np [ | `last_np_sourcetype("index=168347-np","hardware")`] (physicalType=*) | fields physicalElementId deviceId "

This query has 39718 events similarly like below

"chassisName: 10.241.241.22
   cpyKey: 168347
   deviceId: 19206245
   deviceName: 10.241.241.22
   hwRev: null
   installedFlash: null
   installedMemory: null
   pcb: null
   pcbRev: null
   physicalElementId: 388725541
   physicalSubType: null
   physicalType: Power Supply
   productFamily: Catalyst 2K/3K Series Power Supplies
   productId: PWR-C1-1100WAC
   productType: Power Supplies
   serialNumber: DTN2246V1S4
   serialNumberStatus: UNKNOWN
   slot: PWRSPPLY2
   swVersion: null
   tan: null
   tanRev: null
   timeStamp: 2020-12-14T13:35:46"

Content of second query:

"index=168347-np [| `last_np_sourcetype( "index=168347-np", "group_members")` ] groupId=290681 | fields deviceId"

This query has 25 events similarly like below

{ [-]
   deviceId: 16681845
   groupId: 290681
   groupName: ASR 1K Routers
   timeStamp: 2020-12-14T13:35:27
}

Final output is:

210

nickhills · ‎12-15-2020

It might help if you can also post the contents of the macro last_np_sourcetype()

If my comment helps, please give it a thumbs up!

How to replace join in below query?

join

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Agentic with Splunk Lantern: Connect to Cisco Cloud Control, Transform ...

July Community Events: Master ITSI 5.0 & Automate Splunk

New Release of Federated Search: Bringing Splunk Analytics to More of Your Data

Join the Conversation