I am not getting full data in output when combining 2 queries using join. When I run first query individually, I get 143 results after dedup but upon joining, I am getting 71 results only. Whereas I know that for remaining records, data is available when running 2nd query individually.
How can I fix this?
I am searching for records where pods got claimed and then searching for connected time using subsearch and need output of all columns in tabular format.
index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true"
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=strftime(_time,"%m/%d/%Y %H:%M:%S")
| table winClaimTime, accountNumber, routerMac, node_id, locationId, status, techMobile
| dedup routerMac, node_id sortby winClaimTime
| join type=inner node_id
[ search index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=strftime(_time,"%m/%d/%Y %H:%M:%S")
| table nodeFirstConnectedTime, node_id
| dedup node_id sortby nodeFirstConnectedTime]
| table winClaimTime, accountNumber, routerMac, node_id, locationId, status, techMobile, nodeFirstConnectedTime
Try something like this
(index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true") OR (index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X)
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=if(index="aws-cpe-scl",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=if(index="aws-cpe-osc",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| stats first(winClaimTime) as claimTime first(routerMac) as routerMac first(nodeFirstConnectedTime) as connectionTime by node_id
Ideally, don't use join!
You could try searching both indexes in your initial search and then "join" the data from the events using stats.
Can you please show an example how to do it?
Please provide (anonymised) examples of your events in code blocks </> to preserve formatting information and a description of what you are trying to get out of them, for example, why are you using dedup? Also, what do your expected results look like (for the example events you have provided)?
First log is:
[2024-07-17T17:45:41.644Z] [INFO] [txnId=cz81j0kykru05f66jgsa9-1721238338661] [methodPath=POST:/scl/v1/equipment/router/88de7cd25f27/claim/pods] [routerMac=88de7cd25f27] [accountNumber=8337130011080779] [userName=p2197160] [status=200] [techMobile=true] [x-trace-id=1720705801769194081009825476769596142f411b7] [responseJson={"claimed":[{"bluetoothMac":"6c:ff:ce:cb:65:29","boxSerialNumber":"SA8A5001E0CHTRP","certificates":{"clientAfter":"1970-01-01T00:00:00.000Z","clientBefore":"2028-01-22T23:23:39.000Z","rootAfter":"2018-01-22T23:23:39.000Z","rootBefore":"2031-11-29T17:15:43.000Z"},"claimKeyRequired":false,"customerId":"6697fb785f7345000a00acea","deployment":"tau-prod","ethernet1Mac":"6c:ff:ce:cb:65:25","ethernetMac":"6c:ff:ce:cb:65:24","firmwareVersion":"2.4.3-72-g65b961c-prod","groupIds":["5d273b4a7f65315f752896b5"],"kvConfigs":[],"id":"SA8A5001E0CHTRP","locationId":"6697fb785f7345000a00aceb","model":"MAPV1S","packId":"f0d9323a-45db-48b8-aef3-acc8345b2913","partNumber":"253865327","partnerId":"5d273b4a7f65315f752896b5","purchaseOrderNumber":"PONA","radioMac24":"6c:ff:ce:cb:65:26","radioMac50U":"6c:ff:ce:cb:65:28","radioMac50L":"6c:ff:ce:cb:65:27","residentialGateway":false,"serialNumber":"SA8A5001E0CHTRP","shardNumber":"fakeShard","shipDate":"04/05/2022","subscriptionRequired":false,"subscriptionRequiredTerm":null,"unclaimable":false,"vendor":{"factory":"I","manufacturer":"SGCPTSN","mfgDate":"2022/13","name":"Plume","partNumber":"PP203X"},"createdAt":"2022-04-21T19:37:14.089Z","updatedAt":"2024-07-17T17:45:41.214Z","_id":"SA8A5001E0","_version":"1.1.0","backhaulDhcpPoolIdx":2}]}]
Second log is:
[2024-07-17 17:46:38.345UTC] [INFO ] [ConnectionAgent] [sessionId=51f9a46c-6e41-40f2-8406-e3bfbe97f420] - Node SA8A5001E0 connected: model=PP203X, firmware=4.2.0-91-g5cf406-prod, ovs=???, opensync=3.2.6.0, ip=100.101.206.57, vendorName=Plume
So what I am doing is to fetch list of boxSerialNumber from log 1 and then check if log 2 is generated for same boxSerialNumber.
I am using dedup sortby command as I need the first event timestamp.
claimTime | node_id | routerMac |
07/17/2024 11:44:16 | SA8A5001E0 | 88de7cd25f27 |
connectionTime | node_id |
07/17/2024 11:44:54 | SA8A5001E0 |
Final Result should be merge of these 2 tables based on node_id.
claimTime | node_id | routerMac | connectionTime |
07/17/2024 11:44:16 | SA8A5001E0 | 88de7cd25f27 | 07/17/2024 11:44:54 |
Try something like this
(index=aws-cpe-scl source=*winserver* "methodPath=POST:/scl/v1/equipment/router/*/claim/pods" responseJson "techMobile=true") OR (index=aws-cpe-osc ConnectionAgent "Node * connected:" model=PP203X)
| rex "responseJson=(?<json>.*)"
| eval routerMac = routerMac
| eval techMobile = techMobile
| eval status = status
| spath input=json path=claimed{}.boxSerialNumber output=podSerialNumber
| spath input=json path=claimed{}.locationId output=locationId
| eval node_id = substr(podSerialNumber, 0, 10)
| eval winClaimTime=if(index="aws-cpe-scl",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| rex field=_raw "Node\s(?<node_id>\w+)\sconnected"
| eval nodeFirstConnectedTime=if(index="aws-cpe-osc",strftime(_time,"%m/%d/%Y %H:%M:%S"),null())
| stats first(winClaimTime) as claimTime first(routerMac) as routerMac first(nodeFirstConnectedTime) as connectionTime by node_id
Thanks a lot!! It worked. Great help.
Yes, if it can be done with stats that would be best. It is the Splunky way to do it.
Can you please show an example how to do it?
Have you tried using a left (also called outer) join vs and inner join. An inner join will only give you data where the node_id appears in both sets of data. A left join will give you all the results from the base search joined where it matches in the sub search.
Yes, I have tried join with type Inner, outer, and left. Nothing gives me accurate results.