Splunk Search

How to get full results from 2 queries joined together?

nkhanna
Engager

I am not getting full data in output when combining 2 queries using join.  When I run first query individually, I get 143 results after dedup but upon joining, I am getting 71 results only. Whereas I know that for remaining records, data is available when running 2nd query individually.

How can I fix this?

I am searching for records where pods got claimed and then searching for connected time using subsearch and need output of all columns in tabular format.

 

index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true"
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=strftime(_time,"%m/%d/%Y %H:%M:%S")
| table winClaimTime, accountNumber, routerMac, node_id, locationId, status, techMobile
| dedup routerMac, node_id sortby winClaimTime
| join type=inner node_id
[ search index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=strftime(_time,"%m/%d/%Y %H:%M:%S")
| table nodeFirstConnectedTime, node_id
| dedup node_id sortby nodeFirstConnectedTime]
| table winClaimTime, accountNumber, routerMac, node_id, locationId, status, techMobile, nodeFirstConnectedTime
Labels (2)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this

(index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true") OR (index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X)
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=if(index="aws-cpe-scl",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=if(index="aws-cpe-osc",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| stats first(winClaimTime) as claimTime first(routerMac) as routerMac first(nodeFirstConnectedTime) as connectionTime by node_id

View solution in original post

ITWhisperer
SplunkTrust
SplunkTrust

Ideally, don't use join!

You could try searching both indexes in your initial search and then "join" the data from the events using stats.

nkhanna
Engager

Can you please show an example how to do it?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Please provide (anonymised) examples of your events in code blocks </> to preserve formatting information and a description of what you are trying to get out of them, for example, why are you using dedup? Also, what do your expected results look like (for the example events you have provided)?

0 Karma

nkhanna
Engager

First log is:
[2024-07-17T17:45:41.644Z] [INFO] [txnId=cz81j0kykru05f66jgsa9-1721238338661] [methodPath=POST:/scl/v1/equipment/router/88de7cd25f27/claim/pods] [routerMac=88de7cd25f27] [accountNumber=8337130011080779] [userName=p2197160] [status=200] [techMobile=true] [x-trace-id=1720705801769194081009825476769596142f411b7] [responseJson={"claimed":[{"bluetoothMac":"6c:ff:ce:cb:65:29","boxSerialNumber":"SA8A5001E0CHTRP","certificates":{"clientAfter":"1970-01-01T00:00:00.000Z","clientBefore":"2028-01-22T23:23:39.000Z","rootAfter":"2018-01-22T23:23:39.000Z","rootBefore":"2031-11-29T17:15:43.000Z"},"claimKeyRequired":false,"customerId":"6697fb785f7345000a00acea","deployment":"tau-prod","ethernet1Mac":"6c:ff:ce:cb:65:25","ethernetMac":"6c:ff:ce:cb:65:24","firmwareVersion":"2.4.3-72-g65b961c-prod","groupIds":["5d273b4a7f65315f752896b5"],"kvConfigs":[],"id":"SA8A5001E0CHTRP","locationId":"6697fb785f7345000a00aceb","model":"MAPV1S","packId":"f0d9323a-45db-48b8-aef3-acc8345b2913","partNumber":"253865327","partnerId":"5d273b4a7f65315f752896b5","purchaseOrderNumber":"PONA","radioMac24":"6c:ff:ce:cb:65:26","radioMac50U":"6c:ff:ce:cb:65:28","radioMac50L":"6c:ff:ce:cb:65:27","residentialGateway":false,"serialNumber":"SA8A5001E0CHTRP","shardNumber":"fakeShard","shipDate":"04/05/2022","subscriptionRequired":false,"subscriptionRequiredTerm":null,"unclaimable":false,"vendor":{"factory":"I","manufacturer":"SGCPTSN","mfgDate":"2022/13","name":"Plume","partNumber":"PP203X"},"createdAt":"2022-04-21T19:37:14.089Z","updatedAt":"2024-07-17T17:45:41.214Z","_id":"SA8A5001E0","_version":"1.1.0","backhaulDhcpPoolIdx":2}]}]

Second log is:
[2024-07-17 17:46:38.345UTC] [INFO ] [ConnectionAgent] [sessionId=51f9a46c-6e41-40f2-8406-e3bfbe97f420] - Node SA8A5001E0 connected: model=PP203X, firmware=4.2.0-91-g5cf406-prod, ovs=???, opensync=3.2.6.0, ip=100.101.206.57, vendorName=Plume


So what I am doing is to fetch list of boxSerialNumber from log 1 and then check if log 2 is generated for same boxSerialNumber.
I am using dedup sortby command as I need the first event timestamp.

claimTimenode_idrouterMac
07/17/2024 11:44:16SA8A5001E0 88de7cd25f27

 

connectionTimenode_id
07/17/2024 11:44:54 SA8A5001E0

 

Final Result should be merge of these 2 tables based on node_id.

claimTimenode_idrouterMacconnectionTime
07/17/2024 11:44:16SA8A5001E0 88de7cd25f2707/17/2024 11:44:54

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this

(index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true") OR (index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X)
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=if(index="aws-cpe-scl",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=if(index="aws-cpe-osc",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| stats first(winClaimTime) as claimTime first(routerMac) as routerMac first(nodeFirstConnectedTime) as connectionTime by node_id

nkhanna
Engager

Thanks a lot!! It worked. Great help.

0 Karma

fredclown
Builder

Yes, if it can be done with stats that would be best. It is the Splunky way to do it.

nkhanna
Engager

Can you please show an example how to do it?

0 Karma

fredclown
Builder

Have you tried using a left (also called outer) join vs and inner join. An inner join will only give you data where the node_id appears in both sets of data. A left join will give you all the results from the base search joined where it matches in the sub search.

0 Karma

nkhanna
Engager

Yes, I have tried join with type Inner, outer, and left. Nothing gives me accurate results.

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Thanks for the Memories! Splunk University, .conf25, and our Community

Thank you to everyone in the Splunk Community who joined us for .conf25, which kicked off with our iconic ...

Data Persistence in the OpenTelemetry Collector

This blog post is part of an ongoing series on OpenTelemetry. What happens if the OpenTelemetry collector ...

Introducing Splunk 10.0: Smarter, Faster, and More Powerful Than Ever

Now On Demand Whether you're managing complex deployments or looking to future-proof your data ...