I am trying to extract a single section from within some JSON (the original event is wrapped in even more JSON). I have built a regex and tested it, and everything seems to work.
index=* sourcetype=suricata | rex field=_raw "\"original\":(?<originalMsg>.+?})},"
BUT once I put it into the config files, nothing happens.
Props:
[source::http:kafka_iap-suricata-log]
LINE_BREAKER = (`~!\^<)
SHOULD_LINEMERGE = false
TRANSFORMS-also = extractMessage
Transforms:
[extractMessage]
REGEX = "original":(.+?})},
DEST_KEY= _raw
FORMAT = $1
Inputs:
[http://kafka_iap-suricata-log]
disabled = 0
index = ids-suricata-ext
token = tokenyNumbersGoHere
sourcetype = suricata
Sample Event (copied from _raw):
{"destination":{"ip":"192.168.0.1","port":80,"address":"192.168.0.1"},"ecs":{"version":"1.12.0"},"host":{"name":"rsm"},"fileset":{"name":"eve"},"input":{"type":"log"},"suricata":{"eve":{"http":{"http_method":"\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0GET","hostname":"7.tlup.microsoft.com","url":"/filestreamingservice/files/eb3d","length":0,"protocol":"HTTP/1.1","http_user_agent":"Microsoft-Delivery-Optimization/10.0"},"event_type":"http","flow_id":"841906347931855","tx_id":4,"in_iface":"ens3f0"}},"service":{"type":"suricata"},"source":{"ip":"192.168.0.1","port":57576,"address":"192.168.0.1"},"network.direction":"external","log":{"offset":1363677358,"file":{"path":"/data/suricata/eve.json"}},"@timestamp":"2022-05-05T09:29:05.976Z","agent":{"hostname":"xxx","ephemeral_id":"5a1cb090","id":"bd4004192","name":"ram-nsm","type":"filebeat","version":"7.16.2"},"tags":["iap","suricata"],"@version":"1","event":{"created":"2022-05-05T09:29:06.819Z","module":"suricata","dataset":"suricata.eve","original":{"http":{"http_method":"\\0\\0\\0\\0\\0\\0\\0\\00\\0\\0GET","hostname":"7.t.microsoft.com","url":"/filestreamingservice/files/eb3d","length":0,"protocol":"HTTP/1.1","http_user_agent":"Microsoft-Delivery-Optimization/10.0"},"dest_port":80,"flow_id":845,"in_iface":"ens3f0","proto":"TCP","src_port":57576,"dest_ip":"192.168.0.1","event_type":"http","timestamp":"2022-05-05T09:29:05.976989+0000","tx_id":4,"src_ip":"192.168.0.1"}},"network":{"transport":"TCP","community_id":"1:uE="}}
After all of this stupid fighting with my regexes, it turns out that some events were working, and some were not. This was getting lost in the noise of the other events.
Long stupid story short, the examples I gave were working fine, because I was trimming them before posting, or the one I shared was already working.
Instead, several of my logs were running into the default 4096-character limit of the LOOKAHEAD setting in transforms.conf, so the regex never saw the end of the longer events.
I bumped this up to 20k and now everything matches just fine.
Sorry for the wild goose chase.
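For anyone landing here with the same symptom, the fix described above can be sketched as one extra line in the transform. This is a minimal sketch that assumes the stanza name and regex from the original post; the value 20000 is an assumption standing in for the "20k" mentioned above and should be sized to the longest events you expect.

```
# transforms.conf
[extractMessage]
REGEX = "original":(.+?})},
# LOOKAHEAD sets how many characters into the event the regex is applied (default 4096)
LOOKAHEAD = 20000
DEST_KEY = _raw
FORMAT = $1
```

Index-time transforms only apply to newly indexed data, so events that were indexed before the change keep their full wrapper.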
I tested with transform:
REGEX = ("original")
and every event became one word, "original".
So I know my data is being manipulated by the transform.
This leaves me at "my regex is bad!", which doesn't make sense, because it works fine in Splunk searches and on regex101.com against the _raw. I don't know how to debug something that has no apparent bug.
Hi
Are you using HEC to get this in or UF?
Can you post your original event, not the one that is already in Splunk (_raw)?
r. Ismo
Source: HEC, RAW
iap-suricata-dev:
{
"name": "iap-suricata-dev",
"splunk.hec.ssl.validate.certs": "false",
"splunk.hec.raw.line.breaker": "`~!^<",
"splunk.hec.uri": "https://x.x.x.x:8088",
"topics": "iap-suricata-log",
"splunk.hec.raw": "true",
"splunk.hec.token": "xxxx",
"tasks.max": "7",
"connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
"splunk.indexes": "menlo2",
"splunk.hec.ack.enabled": "false"
}
Per my admins on the "other side", this is what is being sent from Kafka to my HEC:
{"destination":{"ip":"192.168.0.1","port":1235,"address":"192.168.0.1"},"ecs":{"version":"1.12.0"},"host":{"name":"ptm-nsm"},"fileset":{"name":"eve"},"input":{"type":"log"},"suricata":{"eve":{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"event_type":"http","flow_id":"1550457178752986","tx_id":0}},"service":{"type":"suricata"},"log":{"offset":1125537802,"file":{"path":"/opt/suricata/eve.json"}},"network.direction":"external","source":{"ip":"192.168.0.1","port":38394,"address":"192.168.0.1"},"@timestamp":"2022-05-06T09:59:09.246Z","agent":{"hostname":"ptm-nsm","ephemeral_id":"dd64db01","id":"422ff9","name":"ptm-nsm","type":"filebeat","version":"7.16.2"},"tags":["iap","suricata"],"@version":"1","event":{"created":"2022-05-06T09:59:09.632Z","module":"suricata","dataset":"suricata.eve","original":{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"dest_port":1235,"proto":"TCP","src_port":38394,"dest_ip":"192.168.0.1","event_type":"http","timestamp":"2022-05-06T09:59:09.246372+0000","flow_id":1550457178752986,"src_ip":"192.168.0.1","tx_id":0}},"network":{"transport":"TCP","community_id":"1:Mbl3VcTAk="}}
When I test this with the following configuration:
transforms.conf
[extractMessage]
REGEX = "original":(.+?})},
DEST_KEY= _raw
FORMAT = $1
WRITE_META = true
props.conf
[source::http:iap-suricata-dev]
LINE_BREAKER = (`~!\^<)
SHOULD_LINEMERGE = false
TRANSFORMS-also = extractMessage
inputs.conf
[http://iap-suricata-dev]
disabled = 0
host = xxxxx
index = splunk_test
indexes = splunk_test
token = eae66351-a931-4be1-83fa-2787781f501f
with cURL
curl -vkH "Authorization: Splunk eae66351-a931-4be1-83fa-2787781f501f" https://localhost:8088/services/collector/raw?channel=1-2-3-4 -d '{"host":"myhost", "sourcetype":"my_st", "event":{"destination":{"ip":"192.168.0.1","port":1235,"address":"192.168.0.1"},"ecs":{"version":"1.12.0"},"host":{"name":"ptm-nsm"},"fileset":{"name":"eve"},"input":{"type":"log"},"suricata":{"eve":{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"event_type":"http","flow_id":"1550457178752986","tx_id":0}},"service":{"type":"suricata"},"log":{"offset":1125537802,"file":{"path":"/opt/suricata/eve.json"}},"network.direction":"external","source":{"ip":"192.168.0.1","port":38394,"address":"192.168.0.1"},"@timestamp":"2022-05-06T09:59:09.246Z","agent":{"hostname":"ptm-nsm","ephemeral_id":"dd64db01","id":"422ff9","name":"ptm-nsm","type":"filebeat","version":"7.16.2"},"tags":["iap","suricata"],"@version":"1","event":{"created":"2022-05-06T09:59:09.632Z","module":"suricata","dataset":"suricata.eve","original":{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"dest_port":1235,"proto":"TCP","src_port":38394,"dest_ip":"192.168.0.1","event_type":"http","timestamp":"2022-05-06T09:59:09.246372+0000","flow_id":1550457178752986,"src_ip":"192.168.0.1","tx_id":0}},"network":{"transport":"TCP","community_id":"1:Mbl3VcTAk="}}}'
It works correctly and the event contains only:
{"http":{"http_method":"OPTIONS","url":"/","length":0,"protocol":"HTTP/1.0"},"dest_port":1235,"proto":"TCP","src_port":38394,"dest_ip":"192.168.0.1","event_type":"http","timestamp":"2022-05-06T09:59:09.246372+0000","flow_id":1550457178752986,"src_ip":"192.168.0.1","tx_id":0}
All of this was done on a single-node instance.
Could it be that you have a distributed environment and you haven't deployed those configurations on all HEC nodes?
It is a single node deployment for me as well, and your curl example parses perfectly.
So something inside the original message does not make it into _raw. I cannot see it on the source side, and it is breaking the regex.
More research:
I moved my regex up to the beginning of the message, trying to filter out anything that shows up mid-message that might break it.
"ecs":(.+?})}
...and it still doesn't match. BUT I did notice something. It looks like a lot of messages are indeed matching, but not all of them. When I copy the failed messages into regex101 or send them in via curl, they work fine. So, the plot thickens.
1) Regex101 and splunk search time extractions process 100% of my logs
2) Splunk transforms processes 80 - 90% of my logs
It also looks like
| spath output=actualEvent path=event.original
does exactly what I need, but at search time. All I need is to discard all data except event.original, and then index that.
@oliverja - Do you want to just keep that part (original) as your _raw event and remove everything at index time?
1. I'm making this assumption because you used TRANSFORMS in props.conf and you used DEST_KEY=_raw in the transforms.conf stanza.
# Add WRITE_META parameter in your transforms.conf stanza
[extractMessage]
REGEX = "original":(.+?})},
DEST_KEY= _raw
FORMAT = $1
WRITE_META = true
2. Or your goal is to extract a new index-time field?
3. Or do you want to just extract a new field, not necessarily index-time or search-time?
I hope this helps!!!
Updated transforms with WRITE_META, still no change.
[extractMessage]
REGEX = "original":(.+?})},
DEST_KEY= _raw
FORMAT = $1
WRITE_META = true
And in answer to your question, I want to discard all data except the "original" section, and make that my whole message.
"original" is the actual original message, the rest of the info is just a json wrapper from another tool.
@oliverja - Everything else looks okay.
Just to make sure: you need to deploy this configuration on the first full Splunk instance the data reaches, either a Heavy Forwarder or the Indexers. A UF will not process TRANSFORMS.
If you are confused about where to deploy the configuration, you can put the configuration everywhere.
Single instance of Splunk, so all configs are "everywhere".
I tested with a basic regex (outlined above) and it worked, so I have to assume the problem is my regex, not my config.
@oliverja - Just to be sure, is this the source you are receiving the data on?
[source::http:kafka_iap-suricata-log]
For sure. Inputs:
[http://kafka_iap-suricata-log]
disabled = 0
index = ids-suricata-ext
token = tokenyNumbersGoHere
sourcetype = suricata
Try adding WRITE_META = true to the props.conf transforms.conf stanza.
@richgalloway - you mean transforms.conf stanza.
Yes, I do. Thanks.