How to extract fields whose position differs from row to row but which still have structure?

LeeMoe
Path Finder

I have an OpenCanary honeypot that uses a webhook to deliver data into my Splunk instance.

It works really well, but my regex is a bit rubbish and the field extraction is not going well. The wizard gets me a reasonable way, but OpenCanary moves the log items around within the rows. This foxes the wizard, which seems to latch onto the repetition and resists my attempts to take the text after certain labels: Port works because it is in the same location on every line, but Username, Password and src_host do not.

Here are two sample lines that should help explain my challenge:

message="{\"dst_host\": \"10.0.0.117\", \"dst_port\": 23, \"local_time\": \"2023-02-08 16:20:12.113362\", \"local_time_adjusted\": \"2023-02-08 17:20:12.113390\", \"logdata\": {\"PASSWORD\": \"admin\", \"USERNAME\": \"Administrator\"}, \"logtype\": 6001, \"node_id\": \"hostname.domain\", \"src_host\": \"114.216.162.49\", \"src_port\": 47106, \"utc_time\": \"2023-02-08 16:20:12.113383\"}" path=/opencanary/APIKEY_SECRET full_path=/opencanary/APIKEY_SECRET query="" command=POST client_address=100.86.224.114 client_port=54770

message="{\"dst_host\": \"10.0.0.117\", \"dst_port\": 22, \"local_time\": \"2023-02-08 16:20:11.922514\", \"local_time_adjusted\": \"2023-02-08 17:20:11.922544\", \"logdata\": {\"LOCALVERSION\": \"SSH-2.0-OpenSSH_5.1p1 Debian-4\", \"PASSWORD\": \"abc123!\", \"REMOTEVERSION\": \"SSH-2.0-PUTTY\", \"USERNAME\": \"root\"}, \"logtype\": 4002, \"node_id\": \"hostname.domain\", \"src_host\": \"61.177.172.124\", \"src_port\": 17802, \"utc_time\": \"2023-02-08 16:20:11.922536\"}" path=/opencanary/APIKEY_SECRET full_path=/opencanary/APIKEY_SECRET query="" command=POST client_address=100.86.224.114 client_port=54768

Any help from regex experts would let me build out pivots and reporting for my OpenCanary, which gets around 200,000 connection attempts every 7 days 🙂

yuanliu
SplunkTrust

As usual, do not use rex to treat structured data such as JSON as plain text. The order of keys in a JSON object is not significant, so you should expect any key to appear anywhere. In your case, Splunk cannot extract the field "message" effectively because the key-value delimiter is a space and the escaped JSON text itself contains spaces. No sweat: just use rex to extract the JSON, then use spath to flatten it.

| rex "message=\"(?<message>{.+})\" +path="
| eval message = replace(message, ".\"", "\"")
| spath input=message
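
If you want to check the extraction outside Splunk, here is a minimal Python sketch of the same three steps (rex, unescape, spath) run against the two sample events. It is an offline approximation under my own assumptions, not how Splunk implements these commands, and the names `EVENTS`, `MESSAGE_RE` and `parse_event` are hypothetical. One deliberate difference: the `eval` above replaces any character followed by a quote (`.\"`), while the sketch replaces only a literal backslash-quote pair. Both give the same result on these samples because every quote inside the message is escaped.

```python
import json
import re

# The two sample events from the question. Raw strings keep the \" escapes
# exactly as they appear in the log; adjacent literals are concatenated.
EVENTS = [
    r'message="{\"dst_host\": \"10.0.0.117\", \"dst_port\": 23, '
    r'\"local_time\": \"2023-02-08 16:20:12.113362\", '
    r'\"local_time_adjusted\": \"2023-02-08 17:20:12.113390\", '
    r'\"logdata\": {\"PASSWORD\": \"admin\", \"USERNAME\": \"Administrator\"}, '
    r'\"logtype\": 6001, \"node_id\": \"hostname.domain\", '
    r'\"src_host\": \"114.216.162.49\", \"src_port\": 47106, '
    r'\"utc_time\": \"2023-02-08 16:20:12.113383\"}" '
    r'path=/opencanary/APIKEY_SECRET full_path=/opencanary/APIKEY_SECRET '
    r'query="" command=POST client_address=100.86.224.114 client_port=54770',
    r'message="{\"dst_host\": \"10.0.0.117\", \"dst_port\": 22, '
    r'\"local_time\": \"2023-02-08 16:20:11.922514\", '
    r'\"local_time_adjusted\": \"2023-02-08 17:20:11.922544\", '
    r'\"logdata\": {\"LOCALVERSION\": \"SSH-2.0-OpenSSH_5.1p1 Debian-4\", '
    r'\"PASSWORD\": \"abc123!\", \"REMOTEVERSION\": \"SSH-2.0-PUTTY\", '
    r'\"USERNAME\": \"root\"}, '
    r'\"logtype\": 4002, \"node_id\": \"hostname.domain\", '
    r'\"src_host\": \"61.177.172.124\", \"src_port\": 17802, '
    r'\"utc_time\": \"2023-02-08 16:20:11.922536\"}" '
    r'path=/opencanary/APIKEY_SECRET full_path=/opencanary/APIKEY_SECRET '
    r'query="" command=POST client_address=100.86.224.114 client_port=54768',
]

# Step 1 (rex): capture everything from { to } between message=" and " path=
MESSAGE_RE = re.compile(r'message="(?P<message>\{.+\})" +path=')


def parse_event(line):
    """Return the message JSON as a dict, or None if the line has no message."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    # Step 2 (eval replace): turn each backslash-quote pair into a plain quote.
    raw_json = match.group("message").replace('\\"', '"')
    # Step 3 (spath): parse the now-valid JSON text.
    return json.loads(raw_json)


for line in EVENTS:
    event = parse_event(line)
    # Fields are looked up by key, so their position in the row does not matter.
    print(event["logdata"]["USERNAME"], event["logdata"]["PASSWORD"], event["src_host"])
# Administrator admin 114.216.162.49
# root abc123! 61.177.172.124
```

Because the fields are looked up by key, PASSWORD is found whether it is the first key in `logdata` (first event) or the second (second event), which is exactly the situation that defeats a position-based regex.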

With your sample data, this should give you:

dst_host | dst_port | local_time | local_time_adjusted | logdata.LOCALVERSION | logdata.PASSWORD | logdata.REMOTEVERSION | logdata.USERNAME | logtype | node_id | src_host | src_port | utc_time
10.0.0.117 | 23 | 2023-02-08 16:20:12.113362 | 2023-02-08 17:20:12.113390 | | admin | | Administrator | 6001 | hostname.domain | 114.216.162.49 | 47106 | 2023-02-08 16:20:12.113383
10.0.0.117 | 22 | 2023-02-08 16:20:11.922514 | 2023-02-08 17:20:11.922544 | SSH-2.0-OpenSSH_5.1p1 Debian-4 | abc123! | SSH-2.0-PUTTY | root | 4002 | hostname.domain | 61.177.172.124 | 17802 | 2023-02-08 16:20:11.922536

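spath names nested keys with a dot, which is where columns such as `logdata.PASSWORD` come from. The following Python sketch mimics that naming on the first sample message, with its escapes already removed. `flatten` is a hypothetical helper, and arrays, which spath handles with its own notation, are not covered. Keys missing from an event, such as `logdata.LOCALVERSION` here, simply do not appear, which corresponds to the empty cells in the first row of the table above.

```python
import json

# The first sample message after the \" escapes have been removed.
SAMPLE = (
    '{"dst_host": "10.0.0.117", "dst_port": 23, '
    '"local_time": "2023-02-08 16:20:12.113362", '
    '"local_time_adjusted": "2023-02-08 17:20:12.113390", '
    '"logdata": {"PASSWORD": "admin", "USERNAME": "Administrator"}, '
    '"logtype": 6001, "node_id": "hostname.domain", '
    '"src_host": "114.216.162.49", "src_port": 47106, '
    '"utc_time": "2023-02-08 16:20:12.113383"}'
)


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into dotted names, e.g. logdata.PASSWORD.

    Only nested objects are handled; lists are kept as plain values.
    """
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the parent name as the prefix.
            fields.update(flatten(value, name))
        else:
            fields[name] = value
    return fields


row = flatten(json.loads(SAMPLE))
# Alphabetical order, matching the column order of the table above.
for name in sorted(row):
    print(f"{name}={row[name]}")
```
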
Hope this helps

LeeMoe
Path Finder

My friend, thanks to you I have some very nice dashboards.  Long live my project 🙂

[Screenshot attached: SplunkDash.png]

LeeMoe
Path Finder

@yuanliu I salute you. I understand your solution, which is eternally graceful and works. Yes, about 40% of my records have no username or password, but that's roughly the normal volume. I expect you have more than the two days of Splunk experience I have; this is a hobby implementation 😊

🍺 or 🥞 is on me, many thanks again.
