mv extraction in props.conf

Dworsnop
Path Finder

I need to extract, at search time, a multivalue field from some JSON data in a way that lets me run several additional regexes against the resulting field, also at search time. I can do this inline easily using:

| spath output=trigger_name path=triggeredComponents{}.triggeredFilters{}.filterType
| spath output=trigger_value path=triggeredComponents{}.triggeredFilters{}.trigger.value
| eval new_trig=mvzip(trigger_name,trigger_value,":")
| mvexpand new_trig

| rex field=new_trig "^Internal destination device name:(?<dest>.*)$"
| rex field=new_trig "^Destination IP:(?<dest_ip>(?:(?:\d{1,3}\.){3}(?:\d{1,3}))|(?:(?:::)?(?:[\dA-Fa-f]{1,4}:{1,2}){1,7}(?:[\d\%A-Fa-z\.]+)?(?:::)?)|(?:::[\dA-Fa-f\.]{1,15})|(?:::))$"

 

Thanks in advance  🙂

Edit: I already have KV_MODE=JSON in my props.conf


richgalloway
SplunkTrust

I usually recommend multiple rex commands over large, complex ones.  It's practically a requirement if the fields aren't guaranteed to be in the same order every time.  With a complex regex, it's too easy for a single character change to make the regex fail.
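As a rough illustration of that advice, the combined pattern posted elsewhere in this thread could be split into one rex per field. An alternation matches only one branch per match, and rex keeps only the first match by default (max_match=1), which is the likely reason the combined version returned a single field per event. This is a minimal sketch, not tested against the full events. It reuses the field names and sub-patterns from that post and drops the trailing `\}\}\,\{\"cfid\"` anchor, on the assumption that the closing quote is enough to end each value. The dest_ip and src_new extractions would follow the same shape.

```spl
| rex field=_raw "\"filterType\":\"Connection hostname\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<conn_host>[a-z0-9_\.-]+)\""
| rex field=_raw "\"filterType\":\"Message\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<msg>[a-z0-9_\.-]+)\""
| rex field=_raw "\"filterType\":\"Internal destination device name\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<dest_new>[a-z0-9_\.-]+)\""
| rex field=_raw "\"filterType\":\"Destination port\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<dest_port>\d+)\""
```

Each rex succeeds or fails on its own, so a filter that is missing or appears in a different position in a given event leaves the other fields unaffected.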

---
If this reply helps you, Karma would be appreciated.

Dworsnop
Path Finder

I think that's what I'll be doing. Thanks very much for the help @richgalloway .


richgalloway
SplunkTrust

I think you have some competing requirements/desires.  To get CIM compliance, you just need to map the existing fields to their CIM equivalents.  Do that with field aliases: go to Settings->Fields->Field aliases to create them.  Alternatively, have your rex commands extract the fields using the CIM-compliant names.  Getting the fields into an index, however, requires that the extractions be done at index time or through data model acceleration.
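For illustration, the closest inline equivalent of a field alias is a rename. This is a minimal sketch, assuming earlier rex commands produced src_new and dest_new as in the post elsewhere in this thread; src and dest are the CIM names for source and destination.

```spl
| rename src_new AS src
| rename dest_new AS dest
```

Unlike rename, a field alias keeps the original field alongside the new name. A persistent alias defined in props.conf takes the form FIELDALIAS-<class> = <orig_field> AS <new_field> under the sourcetype's stanza.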


Dworsnop
Path Finder

Hi @richgalloway , based on your reply I've had a stab at extracting the values from _raw via rex using the below (ignore the non-CIM-compliant names for now)...

| rex field=_raw "(\"filterType\":\"Connection hostname\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<conn_host>[a-z0-9_\.-]+)\"\}\}\,\{\"cfid\"|\"filterType\":\"Message\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<msg>[a-z0-9_\.-]+)\"\}\}\,\{\"cfid\"|\"filterType\":\"Internal source device name\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<src_new>[a-z0-9_\.-]+)\"\}\}\,\{\"cfid\"|\"filterType\":\"Internal destination device name\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<dest_new>[a-z0-9_\.-]+)\"\}\}\,\{\"cfid\"|\"filterType\":\"Destination IP\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<dest_ip>(?:(?:\d{1,3}\.){3}(?:\d{1,3}))|(?:(?:::)?(?:[\dA-Fa-f]{1,4}:{1,2}){1,7}(?:[\d\%A-Fa-z\.]+)?(?:::)?)|(?:::[\dA-Fa-f\.]{1,15})|(?:::))|\"filterType\":\"Destination port\"\,\"arguments\":\{\}\,\"comparatorType\":\"display\"\,\"trigger\":\{\"value\":\"(?<dest_port>\d+))"

Unfortunately, this extracts only one of the above fields per event. I had hoped to avoid running multiple individual rex commands against _raw by combining the expressions, each with its own capture group, as alternatives inside a single group: (...|...|...). Am I missing some clever syntax here, or is it just not possible?

richgalloway
SplunkTrust

Please share some sample events.  Also, tell us what's wrong with the existing search.


Dworsnop
Path Finder

Thanks for replying @richgalloway , the events come from an anomaly detection system and look like this (raw) - I've had to redact it quite heavily, hopefully it will still be of some use: 

{"creationTime":...,"breachUrl":"https://.../.../...","commentCount":0,"pbid":1315000,"time":...,"model":{"name":"Compromise::Watched Domain","pid":11,"phid":8116,"uuid":"...","logic":{"data":[{"cid":14400,"weight":1},{"cid":14401,"weight":1},{"cid":14402,"weight":1},{"cid":14403,"weight":1}],"targetScore":1,"type":"weightedComponentList","version":1},"throttle":3600,"sharedEndpoints":false,"actions":{"alert":true,"...":{},"breach":true,"model":true,"setPriority":false,"setTag":false,"setType":false},"tags":["..."],"interval":3600,"sequenced":false,"active":true,"modified":"...","activeTimes":{"devices":{"93657":[{}],"2830":[{}],"636":[{}],"2957":[{}],"52344":[{}],"4329":[{}],"913":[{}],"44":[{}]},"tags":{},"type":"exclusions","version":2},"priority":5,"autoUpdatable":true,"autoUpdate":true,"autoSuppress":true,"description":"...","behaviour":"decreasing","defeats":[],"created":{"by":"..."},"edited":{"by":"...","userID"...},"version":24},"triggeredComponents":[{"time":...,"cbid":...,"cid":14400,"chid":25028,"size":1,"threshold":0,"interval":3600,"logic":{...}}}}},"version":"v..."},"metric":{"mlid":220,"name":"...","label":"Watched Domain"},"triggeredFilters":[{"cfid":111671,"id":"A","filterType":"Watched endpoint source","arguments":{"value":".+"},"comparatorType":"does not match regular expression","trigger":{"value":""}},{"cfid":111673,"id":"C","filterType":"Direction","arguments":{"value":"out"},"comparatorType":"is","trigger":{"value":"out"}},{"cfid":111675,"id":"E","filterType":"Internal source device type","arguments":{"value":"12"},"comparatorType":"is not","trigger":{"value":"Server"}},{"cfid":111676,"id":"d1","filterType":"Internal source device type","arguments":{},"comparatorType":"display","trigger":{"value":"Server"}},{"cfid":111677,"id":"d2","filterType":"Connection hostname","arguments":{},"comparatorType":"display","trigger":{"value":""}},{"cfid":111678,"id":"d3","filterType":"Destination 
IP","arguments":{},"comparatorType":"display","trigger":{"value":"1.2.3.4"}},{"cfid":111679,"id":"d4","filterType":"ASN","arguments":{},"comparatorType":"display","trigger":{"value":""}},{"cfid":111680,"id":"d5","filterType":"Country","arguments":{},"comparatorType":"display","trigger":{"value":""}},{"cfid":111681,"id":"d6","filterType":"Message","arguments":{},"comparatorType":"display","trigger":{"value":"politicweekend.com"}},{"cfid":111682,"id":"d7","filterType":"Watched endpoint","arguments":{},"comparatorType":"display","trigger":{"value":"true"}},{"cfid":111683,"id":"d8","filterType":"Watched endpoint source","arguments":{},"comparatorType":"display","trigger":{"value":""}}]}],"score":0.161,"device":{"did":74376,"ip":"5.6.7.8","ips":[{"ip":"5.6.7.8","timems":...,"time":"...","sid":512731}],"sid":512731,"firstSeen":...,"lastSeen":...,"devicelabel":"...","typename":"server","typelabel":"Server"}}

 

There's nothing wrong with my inline search. I'm trying to make the events CIM-compliant, and fields like dest are contained in the new_trig multivalue field that I build with spath and then pull apart with regexes. I want the CIM fields to be present in the index so that things like the Intrusion Detection data model can be populated for use by Enterprise Security. Also, the regexes will differ according to the detection model that has been breached, i.e. the actual destination for an event might be called something different in 'trigger_name' depending on the model being breached.
