Hello clever people,
Would anyone be able to help me build a regex that works at the SPL level, e.g. something like:
| rex mode=sed field=_raw "s/regex_example/replacement/g"
I want to test the result first, before I add it to props.conf on the indexers.
Below is the raw log. I would like to keep just the parts in bold (action, origin, dst, layer_name and src); all the rest should be dropped.
-----------------------------------------------------
[meta sequenceId="-2077347367"]10000 - [action:"Accept"; conn_direction:"Internal"; flags:"dd06212"; ifdir:"inbound"; ifname:"bond3.32"; logid:"0"; loguid:"{ 000.000.000.000}"; origin:"000.000.000.000"; originsicname:"CN=XXXXXXXX,O= XXXXXXXX. XXXXXXXX.q7vvv"; sequencenum:"1457"; time:"1686217674"; version:"5"; __policy_id_tag:"product=cccccccc-1[db_tag={ XXXXXXXX-8ED31 XXXXXXXX };mgmt= XXXXXXXX xxx1;date=168XXXXXXXX;policy_name=XXXXXXXX-1\]"; dst:"000.000.000.000"; log_delay:"168XXXXXXXX "; layer_name:" XXXXXXXX "; layer_name:" XXXXXXXX "; layer_uuid:" XXXXXXXX -49d7-a207-a90ea5dd66fb"; layer_uuid:"cdc569c2-d869- XXXXXXXX "; match_id:"14x"; match_id:"50331649"; parent_rule:"0"; parent_rule:"0"; rule_action:"Accept"; rule_action:"Accept"; rule_name:" XXXXXXXX Heartbeat -> Platfxxxx"; rule_name:" XXXXXXXX "; rule_uid:"211567a0-d33a- XXXXXXXX "; rule_uid:" XXXXXXXX -4bde-a9c0-3cbaefd188b6"; product:" XXXXXXXX "; proto:"6"; s_port:" XXXXXXXX "; service:"3002"; service_id:"xxxx-Control"; src:"000.000.000.000"]
-----------------------------------------------
Thank you all in advance!
Hi @ITWhisperer,
Hope you are doing well.
You were able to help me once before, so I wanted to ask whether you could help me with my new challenge, please.
My original post is "Re: Help with SEDCMD raw event size reduction" on Splunk Community.
Thank you in advance.
Thanks for your reply @ITWhisperer
Unfortunately, I get the error "Unknown search command 'db'" when I run the following:
| makeresults
| eval _raw = "[meta sequenceId="-2077347367"]10000 - [action:"Accept"; conn_direction:"Internal"; flags:"dd06212"; ifdir:"inbound"; ifname:"bond3.32"; logid:"0"; loguid:"{ 000.000.000.000}"; origin:"000.000.000.000"; originsicname:"CN=XXXXXXXX,O= XXXXXXXX. XXXXXXXX.q7vvv"; sequencenum:"1457"; time:"1686217674"; version:"5"; __policy_id_tag:"product=cccccccc-1[db_tag={ XXXXXXXX-8ED31 XXXXXXXX };mgmt= XXXXXXXX xxx1;date=168XXXXXXXX;policy_name=XXXXXXXX-1\]"; dst:"000.000.000.000"; log_delay:"168XXXXXXXX "; layer_name:" XXXXXXXX "; layer_name:" XXXXXXXX "; layer_uuid:" XXXXXXXX -49d7-a207-a90ea5dd66fb"; layer_uuid:"cdc569c2-d869- XXXXXXXX "; match_id:"14x"; match_id:"50331649"; parent_rule:"0"; parent_rule:"0"; rule_action:"Accept"; rule_action:"Accept"; rule_name:" XXXXXXXX Heartbeat -> Platfxxxx"; rule_name:" XXXXXXXX "; rule_uid:"211567a0-d33a- XXXXXXXX "; rule_uid:" XXXXXXXX -4bde-a9c0-3cbaefd188b6"; product:" XXXXXXXX "; proto:"6"; s_port:" XXXXXXXX "; service:"3002"; service_id:"xxxx-Control"; src:"000.000.000.000"]"
| rex mode=sed "s/.*\[(?<action>action:\"[^\"]+\").+(?<origin>origin:\"[^\"]+\").+(?<dst>dst:\"[^\"]+\").+(?<layer_name>layer_name:\"[^\"]+\").+(?<src>src:\"[^\"]+\").*/\1 \2 \3 \4 \5/g"
Could you please share the full search you used to test it?
Thank you!
You need to escape the double quotes inside the string you assign to _raw:
| makeresults
| eval _raw="[meta sequenceId=\"-2077347367\"]10000 - [action:\"Accept\"; conn_direction:\"Internal\"; flags:\"dd06212\"; ifdir:\"inbound\"; ifname:\"bond3.32\"; logid:\"0\"; loguid:\"{ 000.000.000.000}\"; origin:\"000.000.000.000\"; originsicname:\"CN=XXXXXXXX,O= XXXXXXXX. XXXXXXXX.q7vvv\"; sequencenum:\"1457\"; time:\"1686217674\"; version:\"5\"; __policy_id_tag:\"product=cccccccc-1[db_tag={ XXXXXXXX-8ED31 XXXXXXXX };mgmt= XXXXXXXX xxx1;date=168XXXXXXXX;policy_name=XXXXXXXX-1\]\"; dst:\"000.000.000.000\"; log_delay:\"168XXXXXXXX \"; layer_name:\" XXXXXXXX \"; layer_name:\" XXXXXXXX \"; layer_uuid:\" XXXXXXXX -49d7-a207-a90ea5dd66fb\"; layer_uuid:\"cdc569c2-d869- XXXXXXXX \"; match_id:\"14x\"; match_id:\"50331649\"; parent_rule:\"0\"; parent_rule:\"0\"; rule_action:\"Accept\"; rule_action:\"Accept\"; rule_name:\" XXXXXXXX Heartbeat -> Platfxxxx\"; rule_name:\" XXXXXXXX \"; rule_uid:\"211567a0-d33a- XXXXXXXX \"; rule_uid:\" XXXXXXXX -4bde-a9c0-3cbaefd188b6\"; product:\" XXXXXXXX \"; proto:\"6\"; s_port:\" XXXXXXXX \"; service:\"3002\"; service_id:\"xxxx-Control\"; src:\"000.000.000.000\"]"
| rex mode=sed "s/.*\[(?<action>action:\"[^\"]+\").+(?<origin>origin:\"[^\"]+\").+(?<dst>dst:\"[^\"]+\").+(?<layer_name>layer_name:\"[^\"]+\").+(?<src>src:\"[^\"]+\").*/\1 \2 \3 \4 \5/g"
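As an extra offline cross-check (not a replacement for testing in Splunk), here is a minimal Python sketch that applies the same substitution with re.sub to an abridged copy of the sample event. Assumptions: Python's re module stands in for Splunk's regex engine (which I believe is PCRE-based), and should behave the same for this pattern since it only uses basic constructs; the SPL-only \" escapes are dropped; named groups are spelled (?P<name>...), which is how Python's re writes them. The names reduce_event, PATTERN and SAMPLE_EVENT are illustrative only.

```python
import re

# Abridged copy of the sample event from the question: some fields were removed
# for brevity, but the fields the regex looks for stay in their original order.
SAMPLE_EVENT = (
    r'[meta sequenceId="-2077347367"]10000 - [action:"Accept"; conn_direction:"Internal"; '
    r'origin:"000.000.000.000"; originsicname:"CN=XXXXXXXX,O= XXXXXXXX. XXXXXXXX.q7vvv"; '
    r'__policy_id_tag:"product=cccccccc-1[db_tag={ XXXXXXXX-8ED31 XXXXXXXX };mgmt= XXXXXXXX xxx1;date=168XXXXXXXX;policy_name=XXXXXXXX-1\]"; '
    r'dst:"000.000.000.000"; log_delay:"168XXXXXXXX "; '
    r'layer_name:" XXXXXXXX "; layer_name:" XXXXXXXX "; '
    r'proto:"6"; service:"3002"; service_id:"xxxx-Control"; src:"000.000.000.000"]'
)

# Same pattern as the sed expression, without the SPL quote escaping.
# The replacement refers to the five groups by number, so the group names
# themselves do not affect the result.
PATTERN = re.compile(
    r'.*\[(?P<action>action:"[^"]+").+(?P<origin>origin:"[^"]+")'
    r'.+(?P<dst>dst:"[^"]+").+(?P<layer_name>layer_name:"[^"]+")'
    r'.+(?P<src>src:"[^"]+").*'
)
REPLACEMENT = r'\1 \2 \3 \4 \5'


def reduce_event(raw: str) -> str:
    """Replace the whole event with the five captured key:"value" pairs."""
    return PATTERN.sub(REPLACEMENT, raw)


if __name__ == "__main__":
    print(reduce_event(SAMPLE_EVENT))
```

Run against the abridged event, this prints action:"Accept" origin:"000.000.000.000" dst:"000.000.000.000" layer_name:" XXXXXXXX " src:"000.000.000.000". Everything outside the five groups is discarded, because the leading .* and trailing .* are part of the match but not of any captured group.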
Thanks again @ITWhisperer
I just want to confirm that this will be the result after applying the following:
SEDCMD-remove_unwanted_parts_from_raw_event=s/.*\[(?<action>action:\"[^\"]+\").+(?<origin>origin:\"[^\"]+\").+(?<dst>dst:\"[^\"]+\").+(?<layer_name>layer_name:\"[^\"]+\").+(?<src>src:\"[^\"]+\").*/\1 \2 \3 \4 \5/g
action:"Accept" origin:"000.000.000.000" dst:"000.000.000.000" layer_name:" XXXXXXXX " src:"000.000.000.000"
I want to make sure the regex keeps the values above and drops everything else before the event is indexed, and not the other way around.
Thank you!
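One thing worth keeping in mind for the "keep vs drop" question: a sed substitution only rewrites events the pattern actually matches. Assuming SEDCMD follows standard sed semantics (my understanding, but worth confirming in a test environment), an event that lacks one of the five fields, or has them in a different order, would be indexed unchanged rather than reduced. The sketch below checks that behaviour with Python's re.subn; the mock events are hypothetical, shortened strings, not real log lines.

```python
import re
from typing import Tuple

# Same substitution as the SEDCMD, written without the SPL quote escaping.
PATTERN = re.compile(
    r'.*\[(action:"[^"]+").+(origin:"[^"]+").+(dst:"[^"]+")'
    r'.+(layer_name:"[^"]+").+(src:"[^"]+").*'
)
REPLACEMENT = r'\1 \2 \3 \4 \5'


def reduce_event(raw: str) -> Tuple[str, bool]:
    """Return the event after substitution and whether the pattern matched."""
    reduced, count = PATTERN.subn(REPLACEMENT, raw)
    return reduced, count > 0


# Hypothetical, shortened mock events (not real log lines).
MATCHING = '[x]1 - [action:"Accept"; origin:"1.1.1.1"; dst:"2.2.2.2"; layer_name:"L"; src:"3.3.3.3"]'
MISSING_SRC = '[x]1 - [action:"Accept"; origin:"1.1.1.1"; dst:"2.2.2.2"; layer_name:"L"]'
SRC_BEFORE_DST = '[x]1 - [action:"Accept"; origin:"1.1.1.1"; src:"3.3.3.3"; layer_name:"L"; dst:"2.2.2.2"]'

if __name__ == "__main__":
    for event in (MATCHING, MISSING_SRC, SRC_BEFORE_DST):
        reduced, changed = reduce_event(event)
        print("reduced:  " if changed else "unchanged:", reduced)
```

Only the first mock event is reduced; the other two come back exactly as they went in. If some of your real events can arrive without one of these fields, they would keep their full size.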
That's why I suggested you check it in a test environment. At first glance, it seems that you're trying to escape too much in your SEDCMD. While some characters work the same way even if unnecessarily escaped, others may not.
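The point about unnecessary escapes can be illustrated with Python's re module (PCRE behaves the same way for these cases, as far as I know). A backslash in front of punctuation such as " just yields a literal ", so \" and " match the same thing. A backslash in front of a letter turns it into a different token: \s means whitespace, not the letter s. The sketch compares the pattern as posted for SEDCMD (with \") against the same pattern with those escapes removed. This only shows regex-engine behaviour; it says nothing about how Splunk itself parses the extra backslashes in props.conf, which is exactly what testing would confirm. The mock event is hypothetical.

```python
import re

# The pattern as posted for SEDCMD (quotes escaped as \") and the same pattern
# with those escapes removed. Named groups use Python's (?P<name>...) spelling.
ESCAPED = (
    r'.*\[(?P<action>action:\"[^\"]+\").+(?P<origin>origin:\"[^\"]+\")'
    r'.+(?P<dst>dst:\"[^\"]+\").+(?P<layer_name>layer_name:\"[^\"]+\")'
    r'.+(?P<src>src:\"[^\"]+\").*'
)
UNESCAPED = ESCAPED.replace('\\"', '"')  # turn every backslash-quote into a plain quote

EVENT = '[x]1 - [action:"Accept"; origin:"1.1.1.1"; dst:"2.2.2.2"; layer_name:"L"; src:"3.3.3.3"]'
REPLACEMENT = r'\1 \2 \3 \4 \5'

# A backslash before punctuation such as " is just a literal " ...
same_result = re.sub(ESCAPED, REPLACEMENT, EVENT) == re.sub(UNESCAPED, REPLACEMENT, EVENT)

# ... but a backslash before a letter changes the meaning: \s is whitespace.
escaped_letter_matches_letter = re.fullmatch(r'\s', 's') is not None

if __name__ == "__main__":
    print(same_result, escaped_letter_matches_letter)
```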
To be honest, I don't know for certain, but I think it should work. I don't usually get involved with the ingestion side of things. As @PickleRick suggests, you should test it before rolling it out to your production environment.
1. Use regex101.com for testing your regexes.
2. Test in a pre-prod environment: use mock-up data and send it to a temporary index.
3. Testing using regex SPL commands might lead to confusion sometimes since you have to escape your regex to "fit" into a string.
Thanks for the reply @PickleRick
"3. Testing using regex SPL commands might lead to confusion sometimes since you have to escape your regex to "fit" into a string."
Could you provide some practical examples, please? Sorry, I don't fully understand.
Thank you!
Normally, if you want to perform, for example,
s/"/|/g
you type it literally in the SEDCMD definition.
But if you want to use SPL, you have to escape the quotation mark so that it doesn't end the string containing the regex. So it becomes:
"s/\"/|/g"
And that's the simplest example. If your regex has multiple quotes and some backslashes, it can get messy, and "disarming" all those escapes to get a proper regex definition for SEDCMD can introduce additional mistakes.
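To make this conversion less error-prone, it can be scripted. Below is a pair of hypothetical helper functions (not Splunk features). The first wraps a regex written as it would appear in SEDCMD in double quotes and backslash-escapes every double quote, which gives the form an SPL rex command needs. The second reverses the process. The sketch assumes, as in the examples in this thread, that only double quotes need escaping for SPL; a regex that already contains \" or literal backslashes may need extra care, which this sketch deliberately does not handle.

```python
def sedcmd_to_spl(sed_expr: str) -> str:
    """Hypothetical helper: wrap a SEDCMD-style expression in double quotes for
    use in an SPL rex command, backslash-escaping every embedded double quote."""
    return '"' + sed_expr.replace('"', '\\"') + '"'


def spl_to_sedcmd(spl_string: str) -> str:
    """Hypothetical helper: strip the outer double quotes from an SPL string and
    turn each backslash-quote pair back into a plain double quote."""
    if len(spl_string) < 2 or not (spl_string.startswith('"') and spl_string.endswith('"')):
        raise ValueError("expected a double-quoted SPL string")
    return spl_string[1:-1].replace('\\"', '"')


if __name__ == "__main__":
    simple = 's/"/|/g'
    print(sedcmd_to_spl(simple))                 # "s/\"/|/g"
    print(spl_to_sedcmd(sedcmd_to_spl(simple)))  # s/"/|/g
```

Writing the regex once in its SEDCMD form and generating the SPL version from it (or the reverse) avoids hand-editing every quote, which is where the mistakes described above tend to creep in.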
I understand now. Thank you, @PickleRick.
Maybe I should create separate SEDCMD entries to avoid confusion and keep it simple?