Regex working on search but not props/transforms

oliverja
Path Finder

I am trying to extract a single section from within some JSON (the original event is wrapped in even more JSON). I have built a regex and tested it at search time, and everything seems to work:

index=* sourcetype=suricata | rex field=_raw "\"original\":(?<originalMsg>.+?})},"

BUT once I put it into the config files, nothing happens.

Props:

[source::http:kafka_iap-suricata-log]
LINE_BREAKER = (`~!\^<)
SHOULD_LINEMERGE = false
TRANSFORMS-also = extractMessage

Transforms:

[extractMessage]
REGEX = "original":(.+?})},
DEST_KEY= _raw
FORMAT = $1

Inputs:

[http://kafka_iap-suricata-log]
disabled = 0
index = ids-suricata-ext
token = tokenyNumbersGoHere
sourcetype = suricata

Sample Event (copied from _raw):

{"destination":{"ip":"192.168.0.1","port":80,"address":"192.168.0.1"},"ecs":{"version":"1.12.0"},"host":{"name":"rsm"},"fileset":{"name":"eve"},"input":{"type":"log"},"suricata":{"eve":{"http":{"http_method":"\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0GET","hostname":"7.tlup.microsoft.com","url":"/filestreamingservice/files/eb3d","length":0,"protocol":"HTTP/1.1","http_user_agent":"Microsoft-Delivery-Optimization/10.0"},"event_type":"http","flow_id":"841906347931855","tx_id":4,"in_iface":"ens3f0"}},"service":{"type":"suricata"},"source":{"ip":"192.168.0.1","port":57576,"address":"192.168.0.1"},"network.direction":"external","log":{"offset":1363677358,"file":{"path":"/data/suricata/eve.json"}},"@timestamp":"2022-05-05T09:29:05.976Z","agent":{"hostname":"xxx","ephemeral_id":"5a1cb090","id":"bd4004192","name":"ram-nsm","type":"filebeat","version":"7.16.2"},"tags":["iap","suricata"],"@version":"1","event":{"created":"2022-05-05T09:29:06.819Z","module":"suricata","dataset":"suricata.eve","original":{"http":{"http_method":"\\0\\0\\0\\0\\0\\0\\0\\00\\0\\0GET","hostname":"7.t.microsoft.com","url":"/filestreamingservice/files/eb3d","length":0,"protocol":"HTTP/1.1","http_user_agent":"Microsoft-Delivery-Optimization/10.0"},"dest_port":80,"flow_id":845,"in_iface":"ens3f0","proto":"TCP","src_port":57576,"dest_ip":"192.168.0.1","event_type":"http","timestamp":"2022-05-05T09:29:05.976989+0000","tx_id":4,"src_ip":"192.168.0.1"}},"network":{"transport":"TCP","community_id":"1:uE="}}

 

0 Karma
1 Solution

oliverja
Path Finder

After all of this stupid fighting with my regexes, it turns out that some events were working, and some were not. This was getting lost in the noise of the other events.

Long stupid story short, the examples I gave were working fine, because I was trimming them before posting, or the one I shared was already working.

Instead, several of my events were running into the default 4096-character limit of the LOOKAHEAD setting in transforms.conf.

I bumped this up to 20k and everything regexes just fine.
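
For anyone landing here later, a minimal sketch of what that change might look like, assuming the same [extractMessage] stanza as in the question. LOOKAHEAD controls how many characters into the event the REGEX is allowed to search, and its default is 4096. The value 20000 is only an assumed reading of "20k"; size it to your own longest events.

```
# transforms.conf (sketch; 20000 is an assumed value for "20k")
[extractMessage]
REGEX = "original":(.+?})},
# How many characters into the event the REGEX may search (default: 4096)
LOOKAHEAD = 20000
DEST_KEY = _raw
FORMAT = $1
```

With the default in place, any event whose "original" section ends past character 4096 never matches, which would explain why only some events were rewritten.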

Sorry for the wild goose chase.

 

https://community.splunk.com/t5/Getting-Data-In/Index-Time-Extractions-Regex-meeting-character-limit...

isoutamo
SplunkTrust
You are not the only one who has been hit by this 😉 There is also TRUNCATE, which must be set long enough for some of the other processing in the conf files to work.
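
As a hedged illustration of the TRUNCATE point: TRUNCATE is a props.conf setting that caps the length of a line in bytes, with a default of 10000. The sketch below assumes the source stanza from the question, and the value shown is purely illustrative, not a recommendation.

```
# props.conf (sketch; the TRUNCATE value is illustrative)
[source::http:kafka_iap-suricata-log]
LINE_BREAKER = (`~!\^<)
SHOULD_LINEMERGE = false
# Maximum line length in bytes before Splunk cuts the event off (default: 10000)
TRUNCATE = 20000
TRANSFORMS-also = extractMessage
```

If an event is cut off before the end of the section a transform is meant to capture, the regex has nothing to match against, however large the transform's own limits are.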
0 Karma

oliverja
Path Finder

I tested with transform: 

REGEX = ("original")

and every event became one word, "original". 

So I know my data is being manipulated by the transform.

This leaves me at "My regex is bad!", which doesn't make sense, because it works fine in Splunk searches and on regex101.com against the _raw. I don't know how to debug something that has no apparent bug.

0 Karma

isoutamo
SplunkTrust

Hi

Are you using HEC or a UF to get this in?

Can you post your original event, not the one that is already in Splunk (_raw)?

r. Ismo

0 Karma

oliverja
Path Finder

Source: HEC, RAW

iap-suricata-dev:
{ 
  "name": "iap-suricata-dev", 
  "splunk.hec.ssl.validate.certs": "false",
  "splunk.hec.raw.line.breaker": "`~!^<", 
  "splunk.hec.uri": "https://x.x.x.x:8088",
  "topics": "iap-suricata-log",
  "splunk.hec.raw": "true", 
  "splunk.hec.token": "xxxx",
  "tasks.max": "7",  
  "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
  "splunk.indexes": "menlo2", 
  "splunk.hec.ack.enabled": "false"
}

 

Per my admins on the "other side", this is what is being sent from Kafka to my HEC:

{"destination":{"ip":"192.168.0.1","port":1235,"address":"192.168.0.1"},"ecs":{"version":"1.12.0"},"host":{"name":"ptm-nsm"},"fileset":{"name":"eve"},"input":{"type":"log"},"suricata":{"eve":{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"event_type":"http","flow_id":"1550457178752986","tx_id":0}},"service":{"type":"suricata"},"log":{"offset":1125537802,"file":{"path":"/opt/suricata/eve.json"}},"network.direction":"external","source":{"ip":"192.168.0.1","port":38394,"address":"192.168.0.1"},"@timestamp":"2022-05-06T09:59:09.246Z","agent":{"hostname":"ptm-nsm","ephemeral_id":"dd64db01","id":"422ff9","name":"ptm-nsm","type":"filebeat","version":"7.16.2"},"tags":["iap","suricata"],"@version":"1","event":{"created":"2022-05-06T09:59:09.632Z","module":"suricata","dataset":"suricata.eve","original":{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"dest_port":1235,"proto":"TCP","src_port":38394,"dest_ip":"192.168.0.1","event_type":"http","timestamp":"2022-05-06T09:59:09.246372+0000","flow_id":1550457178752986,"src_ip":"192.168.0.1","tx_id":0}},"network":{"transport":"TCP","community_id":"1:Mbl3VcTAk="}}

 

0 Karma

isoutamo
SplunkTrust

When I test this with the following configurations:

transforms.conf

[extractMessage]
REGEX = "original":(.+?})},
DEST_KEY= _raw
FORMAT = $1
WRITE_META = true

props.conf

[source::http:iap-suricata-dev]
LINE_BREAKER = (`~!\^<)
SHOULD_LINEMERGE = false
TRANSFORMS-also = extractMessage

 inputs.conf

[http://iap-suricata-dev]
disabled = 0
host = xxxxx
index = splunk_test
indexes = splunk_test
token = eae66351-a931-4be1-83fa-2787781f501f

with cURL

curl -vkH "Authorization: Splunk eae66351-a931-4be1-83fa-2787781f501f" https://localhost:8088/services/collector/raw?channel=1-2-3-4 -d '{"host":"myhost", "sourcetype":"my_st", "event":{"destination":{"ip":"192.168.0.1","port":1235,"address":"192.168.0.1"},"ecs":{"version":"1.12.0"},"host":{"name":"ptm-nsm"},"fileset":{"name":"eve"},"input":{"type":"log"},"suricata":{"eve":{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"event_type":"http","flow_id":"1550457178752986","tx_id":0}},"service":{"type":"suricata"},"log":{"offset":1125537802,"file":{"path":"/opt/suricata/eve.json"}},"network.direction":"external","source":{"ip":"192.168.0.1","port":38394,"address":"192.168.0.1"},"@timestamp":"2022-05-06T09:59:09.246Z","agent":{"hostname":"ptm-nsm","ephemeral_id":"dd64db01","id":"422ff9","name":"ptm-nsm","type":"filebeat","version":"7.16.2"},"tags":["iap","suricata"],"@version":"1","event":{"created":"2022-05-06T09:59:09.632Z","module":"suricata","dataset":"suricata.eve","original":{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"dest_port":1235,"proto":"TCP","src_port":38394,"dest_ip":"192.168.0.1","event_type":"http","timestamp":"2022-05-06T09:59:09.246372+0000","flow_id":1550457178752986,"src_ip":"192.168.0.1","tx_id":0}},"network":{"transport":"TCP","community_id":"1:Mbl3VcTAk="}}}'

It works correctly, and the event contains only:

{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"dest_port":1235,"proto":"TCP","src_port":38394,"dest_ip":"192.168.0.1","event_type":"http","timestamp":"2022-05-06T09:59:09.246372+0000","flow_id":1550457178752986,"src_ip":"192.168.0.1","tx_id":0}

All of this was done on a single-node instance.

Could it be that you have a distributed environment and haven't deployed those configurations on all HEC nodes?

0 Karma

oliverja
Path Finder

It is a single node deployment for me as well, and your curl example parses perfectly.

So something in the original message, which does not make it into _raw and which I cannot see on the source side, must be breaking the regex.

 

More research:

I moved my regex up to the beginning of the message, trying to filter out anything that shows up mid-message that might break it. 

"ecs":(.+?})}

...and it still doesn't match. But I did notice something: it looks like a lot of messages are indeed matching, but not all of them. When I copy the failed messages into regex101 or send them in via curl, they work fine. So, the plot thickens.

1) Regex101 and splunk search time extractions process 100% of my logs

2) Splunk transforms processes 80 - 90% of my logs

 

0 Karma

oliverja
Path Finder

It also looks like 

|  spath output=actualEvent path=event.original

does exactly what I need, but at search time. All I need is to discard all data but event.original, and then index that.

0 Karma

VatsalJagani
SplunkTrust

@oliverja - Do you want to keep just that part (original) as your _raw event and remove everything else at index time?

1. I'm making this assumption because you used TRANSFORMS in props.conf and you used DEST_KEY=_raw in the transforms.conf stanza.

# Add WRITE_META parameter in your transforms.conf stanza

[extractMessage]
REGEX = "original":(.+?})},
DEST_KEY= _raw
FORMAT = $1
WRITE_META = true

 

2. Or your goal is to extract a new index-time field?

3. Or do you want to just extract a new field, not necessarily index-time or search-time?

 

I hope this helps!!!

0 Karma

oliverja
Path Finder

Updated transforms with WRITE_META, still no change.

[extractMessage]
REGEX = "original":(.+?})},
DEST_KEY= _raw
FORMAT = $1
WRITE_META = true

 

And in answer to your question, I want to discard all data except the "original" section and make that my whole message.

"original" is the actual original message; the rest of the info is just a JSON wrapper from another tool.

0 Karma

VatsalJagani
SplunkTrust

@oliverja - Everything else looks okay.

Just to make sure: you need to deploy this configuration on the first full Splunk instance the data reaches, that is, a Heavy Forwarder or the Indexers. A UF will not process TRANSFORMS.

If you are confused about where to deploy the configuration, you can put the configuration everywhere.

0 Karma

oliverja
Path Finder

Single instance of Splunk, so all configs are "everywhere".

I tested with a basic regex (outlined above) and it worked, so I have to assume it is my regex, not my config.

0 Karma

VatsalJagani
SplunkTrust

@oliverja - Just to be sure, is this the source you are receiving the data from?

[source::http:kafka_iap-suricata-log]
0 Karma

oliverja
Path Finder

For sure. Inputs:

[http://kafka_iap-suricata-log]
disabled = 0
index = ids-suricata-ext
token = tokenyNumbersGoHere
sourcetype = suricata

 

0 Karma

richgalloway
SplunkTrust

Try adding WRITE_META = true to the transforms.conf stanza (corrected from props.conf).

---
If this reply helps you, Karma would be appreciated.
0 Karma

VatsalJagani
SplunkTrust

@richgalloway - you mean transforms.conf stanza.

0 Karma

richgalloway
SplunkTrust

Yes, I do.  Thanks.

---
If this reply helps you, Karma would be appreciated.
0 Karma