I have an OpenCanary which is using a webhook to deliver data into my Splunk instance.
It works really well, but my regex is a bit rubbish and the field extraction is not going well. The wizard gets me a reasonable way, but OpenCanary moves the log items around within each row, and this foxes the wizard: it seems to latch onto the repetition and resists my attempts to take the text after certain labels (namely Port, which works because it is in the same location on every line, plus Username, Password and src_host).
Here are two sample lines that should help illustrate the problem.
message="{\"dst_host\": \"10.0.0.117\", \"dst_port\": 23, \"local_time\": \"2023-02-08 16:20:12.113362\", \"local_time_adjusted\": \"2023-02-08 17:20:12.113390\", \"logdata\": {\"PASSWORD\": \"admin\", \"USERNAME\": \"Administrator\"}, \"logtype\": 6001, \"node_id\": \"hostname.domain\", \"src_host\": \"114.216.162.49\", \"src_port\": 47106, \"utc_time\": \"2023-02-08 16:20:12.113383\"}" path=/opencanary/APIKEY_SECRET full_path=/opencanary/APIKEY_SECRET query="" command=POST client_address=100.86.224.114 client_port=54770
message="{\"dst_host\": \"10.0.0.117\", \"dst_port\": 22, \"local_time\": \"2023-02-08 16:20:11.922514\", \"local_time_adjusted\": \"2023-02-08 17:20:11.922544\", \"logdata\": {\"LOCALVERSION\": \"SSH-2.0-OpenSSH_5.1p1 Debian-4\", \"PASSWORD\": \"abc123!\", \"REMOTEVERSION\": \"SSH-2.0-PUTTY\", \"USERNAME\": \"root\"}, \"logtype\": 4002, \"node_id\": \"hostname.domain\", \"src_host\": \"61.177.172.124\", \"src_port\": 17802, \"utc_time\": \"2023-02-08 16:20:11.922536\"}" path=/opencanary/APIKEY_SECRET full_path=/opencanary/APIKEY_SECRET query="" command=POST client_address=100.86.224.114 client_port=54768
Any help from regex experts would let me build out pivots and reporting for my OpenCanary, which gets around 200,000 connection attempts every 7 days 🙂
As usual, do not use rex to treat structured data such as JSON as text. The order of keys in conformant JSON is not defined, so expect any key to appear anywhere. In your case, Splunk cannot extract the field "message" automatically because the key-value delimiter is a space, and the escaped JSON text itself contains spaces. No sweat. Use rex only to pull out the whole JSON string, unescape the quotes, then use spath to flatten the JSON.
| rex "message=\"(?<message>{.+})\" +path="
| eval message = replace(message, ".\"", "\"")
| spath input=message
Your sample data then should give you
dst_host | dst_port | local_time | local_time_adjusted | logdata.LOCALVERSION | logdata.PASSWORD | logdata.REMOTEVERSION | logdata.USERNAME | logtype | node_id | src_host | src_port | utc_time |
10.0.0.117 | 23 | 2023-02-08 16:20:12.113362 | 2023-02-08 17:20:12.113390 | | admin | | Administrator | 6001 | hostname.domain | 114.216.162.49 | 47106 | 2023-02-08 16:20:12.113383 |
10.0.0.117 | 22 | 2023-02-08 16:20:11.922514 | 2023-02-08 17:20:11.922544 | SSH-2.0-OpenSSH_5.1p1 Debian-4 | abc123! | SSH-2.0-PUTTY | root | 4002 | hostname.domain | 61.177.172.124 | 17802 | 2023-02-08 16:20:11.922536 |
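If you want to sanity-check the same three steps outside Splunk, here is a minimal Python sketch of the idea (extract, unescape, parse). It is only an illustration, not what Splunk does internally: `RAW_EVENT` is an abbreviated copy of the first sample line, and `parse_event` and `flatten` are hypothetical helper names made up for this example.

```python
import json
import re

# Abbreviated copy of the first sample event from the question
# (some keys dropped for brevity; this is an assumption of the sketch).
RAW_EVENT = (
    r'message="{\"dst_host\": \"10.0.0.117\", \"dst_port\": 23, '
    r'\"logdata\": {\"PASSWORD\": \"admin\", \"USERNAME\": \"Administrator\"}, '
    r'\"logtype\": 6001, \"src_host\": \"114.216.162.49\", \"src_port\": 47106}" '
    r'path=/opencanary/APIKEY_SECRET query="" command=POST'
)


def flatten(node: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted names, e.g. logdata.USERNAME."""
    flat = {}
    for key, value in node.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def parse_event(raw: str) -> dict:
    # Step 1 (like rex): capture the escaped JSON between message=" and " path=
    match = re.search(r'message="(\{.+\})" +path=', raw)
    if match is None:
        raise ValueError("no message field found")
    # Step 2 (like replace): turn each \" back into a plain "
    unescaped = match.group(1).replace('\\"', '"')
    # Step 3 (like spath): parse the JSON, so key order no longer matters
    return flatten(json.loads(unescaped))


fields = parse_event(RAW_EVENT)
print(fields["logdata.USERNAME"], fields["src_host"], fields["dst_port"])
```

Because the JSON is parsed rather than pattern-matched, it does not matter where USERNAME or PASSWORD sit inside logdata. Note that this naive unescape assumes the values themselves contain no quotes or backslashes.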
Hope this helps
My friend, thanks to you I have some very nice dashboards. Long live my project 🙂
@yuanliu I salute you. I understand your solution which is eternally graceful and works. Yes, I have 40% of records that have no username or password but that's about the normal volume. I expect you have more than the 2 days of experience with Splunk I have, this is a hobby implementation 😊
🍺or 🥞 is on me, many thanks again.