Splunk Search

How to get full results from 2 queries joined together?

nkhanna
Engager

I am not getting full data in output when combining 2 queries using join.  When I run first query individually, I get 143 results after dedup but upon joining, I am getting 71 results only. Whereas I know that for remaining records, data is available when running 2nd query individually.

How can I fix this?

I am searching for records where pods got claimed and then searching for connected time using subsearch and need output of all columns in tabular format.

 

index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true"
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=strftime(_time,"%m/%d/%Y %H:%M:%S")
| table winClaimTime, accountNumber, routerMac, node_id, locationId, status, techMobile
| dedup routerMac, node_id sortby winClaimTime
| join type=inner node_id
[ search index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=strftime(_time,"%m/%d/%Y %H:%M:%S")
| table nodeFirstConnectedTime, node_id
| dedup node_id sortby nodeFirstConnectedTime]
| table winClaimTime, accountNumber, routerMac, node_id, locationId, status, techMobile, nodeFirstConnectedTime
Labels (2)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this

(index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true") OR (index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X)
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=if(index="aws-cpe-scl",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=if(index="aws-cpe-osc",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| stats first(winClaimTime) as claimTime first(routerMac) as routerMac first(nodeFirstConnectedTime) as connectionTime by node_id

View solution in original post

ITWhisperer
SplunkTrust
SplunkTrust

Ideally, don't use join!

You could try searching both indexes in your initial search and then "join" the data from the events using stats.

nkhanna
Engager

Can you please show an example how to do it?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Please provide (anonymised) examples of your events in code blocks </> to preserve formatting information and a description of what you are trying to get out of them, for example, why are you using dedup? Also, what do your expected results look like (for the example events you have provided)?

0 Karma

nkhanna
Engager

First log is:
[2024-07-17T17:45:41.644Z] [INFO] [txnId=cz81j0kykru05f66jgsa9-1721238338661] [methodPath=POST:/scl/v1/equipment/router/88de7cd25f27/claim/pods] [routerMac=88de7cd25f27] [accountNumber=8337130011080779] [userName=p2197160] [status=200] [techMobile=true] [x-trace-id=1720705801769194081009825476769596142f411b7] [responseJson={"claimed":[{"bluetoothMac":"6c:ff:ce:cb:65:29","boxSerialNumber":"SA8A5001E0CHTRP","certificates":{"clientAfter":"1970-01-01T00:00:00.000Z","clientBefore":"2028-01-22T23:23:39.000Z","rootAfter":"2018-01-22T23:23:39.000Z","rootBefore":"2031-11-29T17:15:43.000Z"},"claimKeyRequired":false,"customerId":"6697fb785f7345000a00acea","deployment":"tau-prod","ethernet1Mac":"6c:ff:ce:cb:65:25","ethernetMac":"6c:ff:ce:cb:65:24","firmwareVersion":"2.4.3-72-g65b961c-prod","groupIds":["5d273b4a7f65315f752896b5"],"kvConfigs":[],"id":"SA8A5001E0CHTRP","locationId":"6697fb785f7345000a00aceb","model":"MAPV1S","packId":"f0d9323a-45db-48b8-aef3-acc8345b2913","partNumber":"253865327","partnerId":"5d273b4a7f65315f752896b5","purchaseOrderNumber":"PONA","radioMac24":"6c:ff:ce:cb:65:26","radioMac50U":"6c:ff:ce:cb:65:28","radioMac50L":"6c:ff:ce:cb:65:27","residentialGateway":false,"serialNumber":"SA8A5001E0CHTRP","shardNumber":"fakeShard","shipDate":"04/05/2022","subscriptionRequired":false,"subscriptionRequiredTerm":null,"unclaimable":false,"vendor":{"factory":"I","manufacturer":"SGCPTSN","mfgDate":"2022/13","name":"Plume","partNumber":"PP203X"},"createdAt":"2022-04-21T19:37:14.089Z","updatedAt":"2024-07-17T17:45:41.214Z","_id":"SA8A5001E0","_version":"1.1.0","backhaulDhcpPoolIdx":2}]}]

Second log is:
[2024-07-17 17:46:38.345UTC] [INFO ] [ConnectionAgent] [sessionId=51f9a46c-6e41-40f2-8406-e3bfbe97f420] - Node SA8A5001E0 connected: model=PP203X, firmware=4.2.0-91-g5cf406-prod, ovs=???, opensync=3.2.6.0, ip=100.101.206.57, vendorName=Plume


So what I am doing is to fetch list of boxSerialNumber from log 1 and then check if log 2 is generated for same boxSerialNumber.
I am using dedup sortby command as I need the first event timestamp.

claimTimenode_idrouterMac
07/17/2024 11:44:16SA8A5001E0 88de7cd25f27

 

connectionTimenode_id
07/17/2024 11:44:54 SA8A5001E0

 

Final Result should be merge of these 2 tables based on node_id.

claimTimenode_idrouterMacconnectionTime
07/17/2024 11:44:16SA8A5001E0 88de7cd25f2707/17/2024 11:44:54

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this

(index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true") OR (index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X)
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=if(index="aws-cpe-scl",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=if(index="aws-cpe-osc",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| stats first(winClaimTime) as claimTime first(routerMac) as routerMac first(nodeFirstConnectedTime) as connectionTime by node_id

nkhanna
Engager

Thanks a lot!! It worked. Great help.

0 Karma

fredclown
Builder

Yes, if it can be done with stats that would be best. It is the Splunky way to do it.

nkhanna
Engager

Can you please show an example how to do it?

0 Karma

fredclown
Builder

Have you tried using a left (also called outer) join vs and inner join. An inner join will only give you data where the node_id appears in both sets of data. A left join will give you all the results from the base search joined where it matches in the sub search.

0 Karma

nkhanna
Engager

Yes, I have tried join with type Inner, outer, and left. Nothing gives me accurate results.

0 Karma
Get Updates on the Splunk Community!

Splunk Mobile: Your Brand-New Home Screen

Meet Your New Mobile Hub  Hello Splunk Community!  Staying connected to your data—no matter where you are—is ...

Introducing Value Insights (Beta): Understand the Business Impact your organization ...

Real progress on your strategic priorities starts with knowing the business outcomes your teams are delivering ...

Enterprise Security (ES) Essentials 8.3 is Now GA — Smarter Detections, Faster ...

As of today, Enterprise Security (ES) Essentials 8.3 is now generally available, helping SOC teams simplify ...