[Regex/Extraction] Need help finding the correct method of parsing a specific log type in Splunk Search

rbechtold — Thu, 18 Apr 2019 23:02:53 GMT

Instead of trying to explain, it would be easier to show you the problem I am having. The Splunk search below generates two anonymized example logs that I am trying to parse correctly and completely:

| makeresults count=2 
| streamstats count 
| eval _raw = if(count=1,"f=?q?<bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com@email.eb-notifications.com>: t=<random_user@idwgdzfctcbgmzk.com> Rule=?q?Globally_Allowed_Senders type=Providencia b=ok action=deliver scot=242 PROBLEM-FIELD=HELP extract this field entirely(1) don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu=2Rzoipxoantxhnonw.com@email.eb-notifications.com;q2.email.eb-notifications.com p=0.025 S=?q?COY_REPORTS_Has_Created_a_New_Item_in_HvsulQjoc fur=255.255.255.255 r=255.255.255.255 pz=4.20 a=a/art", "t=<random_user@rigjgaxwiaizady.com> PROBLEM-FIELD=HELP extract this field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: fur=255.255.255.255") 
| fields - _time count

I am using the following regex to try to extract the fields:

| rex field=_raw max_match=0 "\s?(?P<test>[A-Za-z\-\_]+\=[^\s]+)"

The problem I am having is specifically with the "PROBLEM-FIELD" in both logs. Extracted fully, the PROBLEM-FIELD/value pair should be:

PROBLEM-FIELD=HELP extract this field entirely(1)

but it is showing up as:

PROBLEM-FIELD=HELP

This happens because, unlike the other fields in the data, the PROBLEM-FIELD value contains spaces, and [^\s]+ stops matching at the first whitespace character.
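Splunk's rex uses PCRE. For the constructs in this pattern (the (?P<name>...) group, the character classes, \s), Python's re module should behave the same way, so the truncation can be reproduced outside Splunk. The sketch below relies on that assumption. The helper name extract is hypothetical and only mimics rex max_match=0 by collecting every capture of the test group.

```python
import re

# The original pattern from the question. [^\s]+ stops at the first
# whitespace character, so a value that contains spaces is truncated.
ORIGINAL = r"\s?(?P<test>[A-Za-z\-\_]+\=[^\s]+)"

# Second sample event from the makeresults search above.
LOG2 = ("t=<random_user@rigjgaxwiaizady.com> PROBLEM-FIELD=HELP extract this "
        "field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: "
        "fur=255.255.255.255")


def extract(pattern, raw):
    """Return every capture of the 'test' group, similar to rex max_match=0."""
    return [m.group("test") for m in re.finditer(pattern, raw)]


if __name__ == "__main__":
    for pair in extract(ORIGINAL, LOG2):
        print(pair)
```

Run as-is, it lists PROBLEM-FIELD=HELP. The words extract, this, field and entirely(2) are never captured because none of them is immediately followed by =.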

Originally I tried to use the extract command with kvdelim="=" pairdelim=" ", but it doesn't work because some of the fields' values contain equals signs (=) themselves. If anyone has an idea for parsing this log by any method without losing any data, please help!
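To make that failure concrete, here is a deliberately naive Python stand-in for pairdelim=" " followed by kvdelim="=". It is an assumption made for illustration, not Splunk's actual extract algorithm, and naive_extract is a hypothetical name. Any token that does not split into exactly one key and one value is set aside. That is where both the words of a spaced value and the values containing = end up.

```python
# Deliberately naive stand-in for extract pairdelim=" " kvdelim="=".
# ASSUMPTION: an approximation for illustration only, not Splunk's
# actual extract algorithm.
def naive_extract(raw):
    fields, leftovers = {}, []
    for token in raw.split(" "):      # split into pairs on pairdelim=" "
        parts = token.split("=")      # split each pair on kvdelim="="
        if len(parts) == 2:
            fields[parts[0]] = parts[1]
        else:
            # No "=" (a word from a value with spaces) or several "="
            # (a value that itself contains "="): no clean key/value split.
            leftovers.append(token)
    return fields, leftovers


# Contiguous prefix of the first sample event.
LOG1_PREFIX = ("f=?q?<bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com"
               "@email.eb-notifications.com>: t=<random_user@idwgdzfctcbgmzk.com>")

# Second sample event.
LOG2 = ("t=<random_user@rigjgaxwiaizady.com> PROBLEM-FIELD=HELP extract this "
        "field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: "
        "fur=255.255.255.255")

if __name__ == "__main__":
    for raw in (LOG1_PREFIX, LOG2):
        fields, leftovers = naive_extract(raw)
        print("fields:   ", fields)
        print("leftovers:", leftovers)
```

On the second sample, PROBLEM-FIELD comes back as only HELP and the rest of its value is left over. On the prefix of the first sample, the f= token contains three = signs and cannot be split at all.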

Non-essential bonus question: is there a way to use the extract command with this data without using mvexpand? The method below works once a regex is found that extracts the PROBLEM-FIELD correctly, but I lose all the other fields I'm working with when I have to use stats to join the fields back together (not to mention it is terribly inefficient and ugly):

| makeresults count=2 
| streamstats count 
| eval _raw = if(count=1,"f=?q?<bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com@email.eb-notifications.com>: t=<random_user@idwgdzfctcbgmzk.com> Rule=?q?Globally_Allowed_Senders type=Providencia b=ok action=deliver scot=242 PROBLEM-FIELD=HELP extract this field entirely(1) don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu=2Rzoipxoantxhnonw.com@email.eb-notifications.com;q2.email.eb-notifications.com p=0.025 S=?q?COY_REPORTS_Has_Created_a_New_Item_in_HvsulQjoc fur=255.255.255.255 r=255.255.255.255 pz=4.20 a=a/art", "t=<random_user@rigjgaxwiaizady.com> PROBLEM-FIELD=HELP extract this field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: fur=255.255.255.255") 
| fields - _time count 
| streamstats count AS log_recompiler 
| rex field=_raw max_match=0 "\s?(?P<test>[A-Za-z\-\_]+\=[^\s]+)" 
| mvexpand test 
| rex field=test "(?P<field>[^\=]+\=)(?P<value>.*)" 
| rex mode=sed field=field "s/=/~/g" 
| eval newfield = mvzip(field,value) 
| stats list(newfield) AS _raw by log_recompiler 
| eval _raw = toString(_raw) 
| rex field=_raw mode=sed "s/=/|||/g" 
| extract kvdelim="~," pairdelim=" " 
| foreach * 
    [ rex field=<<FIELD>> mode=sed "s/\|\|\|/=/g"] 
| fields - _raw

Re: [Regex/Extraction] Need help finding the correct method of parsing a specific log type

woodcock — Tue, 23 Apr 2019 05:18:42 GMT

Like this:

... | rex field=_raw max_match=0 "\s?(?P<test>[A-Za-z\-\_]+\=.*?)(?=\s+[^\s=]+=|$)"
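The same kind of Python harness can check this pattern against both sample events. This again assumes Python's re matches PCRE here for lazy quantifiers and lookaheads. The lazy .*? takes as few characters as possible. The lookahead (?=\s+[^\s=]+=|$) only lets it stop right before whitespace followed by something shaped like the next key= (a run of characters that are neither whitespace nor =, then =), or at the end of the event. LOG1, LOG2 and extract are names introduced for this sketch.

```python
import re

# woodcock's pattern, unchanged. The lazy .*? stops as early as it can, but
# only where the lookahead sees whitespace plus the next "key=" (characters
# that are neither whitespace nor "=", then "="), or the end of the event.
FIXED = r"\s?(?P<test>[A-Za-z\-\_]+\=.*?)(?=\s+[^\s=]+=|$)"

# The two sample events from the makeresults search in the question.
LOG1 = (
    "f=?q?<bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com"
    "@email.eb-notifications.com>: t=<random_user@idwgdzfctcbgmzk.com> "
    "Rule=?q?Globally_Allowed_Senders type=Providencia b=ok action=deliver "
    "scot=242 PROBLEM-FIELD=HELP extract this field entirely(1) "
    "don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu="
    "2Rzoipxoantxhnonw.com@email.eb-notifications.com;"
    "q2.email.eb-notifications.com p=0.025 "
    "S=?q?COY_REPORTS_Has_Created_a_New_Item_in_HvsulQjoc "
    "fur=255.255.255.255 r=255.255.255.255 pz=4.20 a=a/art"
)
LOG2 = ("t=<random_user@rigjgaxwiaizady.com> PROBLEM-FIELD=HELP extract this "
        "field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: "
        "fur=255.255.255.255")


def extract(pattern, raw):
    """Return every capture of the 'test' group, similar to rex max_match=0."""
    return [m.group("test") for m in re.finditer(pattern, raw)]


if __name__ == "__main__":
    for raw in (LOG1, LOG2):
        for pair in extract(FIXED, raw):
            print(pair)
        print()
```

For these two samples, joining the captures back together with single spaces reproduces each original event exactly. That is a quick way to confirm nothing is dropped. The trade-off is that a value containing a space followed by a word and an equals sign (for example "a b=c") would still be cut at that point.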

Re: [Regex/Extraction] Need help finding the correct method of parsing a specific log type

rbechtold — Thu, 25 Apr 2019 17:56:21 GMT

You're incredible! It took me a few minutes to wrap my mind around how the extraction works, but you translated my problem into regex perfectly. I have a lot to learn when it comes to lookarounds (lookaheads and lookbehinds). Thank you so much!
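The lookahead is what makes the answer work. It is zero-width: it checks that the next key= follows without consuming it, so the next match can begin there. A hypothetical variant that matches the same delimiter with a non-capturing group (?:...) instead of a lookahead shows the difference. This is again sketched in Python, assuming it behaves like PCRE for this pattern:

```python
import re

# woodcock's pattern: the lookahead (?=...) checks for the next key= without
# consuming it.
LOOKAHEAD = r"\s?(?P<test>[A-Za-z\-\_]+\=.*?)(?=\s+[^\s=]+=|$)"

# HYPOTHETICAL variant for comparison only: the same delimiter, but matched by
# a non-capturing group (?:...), so the next key= is consumed by this match.
CONSUMING = r"\s?(?P<test>[A-Za-z\-\_]+\=.*?)(?:\s+[^\s=]+=|$)"

# Second sample event from the question.
LOG2 = ("t=<random_user@rigjgaxwiaizady.com> PROBLEM-FIELD=HELP extract this "
        "field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: "
        "fur=255.255.255.255")


def extract(pattern, raw):
    """Return every capture of the 'test' group, similar to rex max_match=0."""
    return [m.group("test") for m in re.finditer(pattern, raw)]


if __name__ == "__main__":
    print("lookahead:", extract(LOOKAHEAD, LOG2))
    print("consuming:", extract(CONSUMING, LOG2))
```

The consuming variant swallows " PROBLEM-FIELD=" and " S=" as delimiters. Those keys are never available to start a match, so every other pair disappears.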

In the event anyone runs across this in the future and is curious about the second part of my question, I've been able to figure it out using this method:

| makeresults count=2 
| streamstats count 
| eval _raw = if(count=1,"f=?q?<bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com@email.eb-notifications.com>: t=<random_user@idwgdzfctcbgmzk.com> Rule=?q?Globally_Allowed_Senders type=Providencia b=ok action=deliver scot=242 PROBLEM-FIELD=HELP extract this field entirely(1) don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu=2Rzoipxoantxhnonw.com@email.eb-notifications.com;q2.email.eb-notifications.com p=0.025 S=?q?COY_REPORTS_Has_Created_a_New_Item_in_HvsulQjoc fur=255.255.255.255 r=255.255.255.255 pz=4.20 a=a/art", "t=<random_user@rigjgaxwiaizady.com> PROBLEM-FIELD=HELP extract this field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: fur=255.255.255.255")
| fields - _time count 
| rex field=_raw max_match=0 "\s?(?P<test>[A-Za-z\-\_]+\=.*?)(?=\s+[^\s=]+=|$)" 
| rex field=test max_match=0 "(?P<field1>[^\=]+)\=(?P<field2>.*)" 
| eval field1 = mvjoin(field1,","), field2 = mvjoin(field2,"~,") 
| eval field1 = split(field1, ","), field2 = split(field2, ",") 
| rename _raw AS tempraw 
| eval _raw = mvzip(field1, field2) 
| rex field=_raw mode=sed "s/=/|||/g" 
| extract kvdelim="," pairdelim="~" mv_add=t 
| foreach * 
    [ rename <<FIELD>> AS <<FIELD>>_temp 
    | rex field=<<FIELD>>_temp mode=sed "s/\|\|\|/=/g" 
    | rename <<FIELD>>_temp AS <<FIELD>>] 
| fields - field1 field2 test 
| rename tempraw AS _raw
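For comparison, the end state this search is after can be sketched in Python: one field per key, values that contain = left intact, and repeated keys kept as multiple values. The dict of lists is an assumed stand-in for Splunk multivalue fields, and to_fields is a hypothetical helper. This illustrates the idea rather than replacing the SPL above. Splitting each capture on the first = only mirrors the (?P<field1>[^\=]+)\=(?P<field2>.*) step.

```python
import re
from collections import defaultdict

# woodcock's pattern, used to pull out each key=value capture.
PAIR_RE = re.compile(r"\s?(?P<test>[A-Za-z\-\_]+\=.*?)(?=\s+[^\s=]+=|$)")

# Contiguous prefix of the first sample event, and the second sample event.
LOG1_PREFIX = ("f=?q?<bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com"
               "@email.eb-notifications.com>: t=<random_user@idwgdzfctcbgmzk.com>")
LOG2 = ("t=<random_user@rigjgaxwiaizady.com> PROBLEM-FIELD=HELP extract this "
        "field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: "
        "fur=255.255.255.255")


def to_fields(raw):
    """Map each key to a list of values; the list stands in for a multivalue field."""
    fields = defaultdict(list)
    for m in PAIR_RE.finditer(raw):
        # Split on the FIRST "=" only, as (?P<field1>[^\=]+)\=(?P<field2>.*)
        # does; any later "=" stays inside the value.
        key, value = m.group("test").split("=", 1)
        # A repeated key appends another value instead of overwriting.
        fields[key].append(value)
    return dict(fields)


if __name__ == "__main__":
    for raw in (LOG1_PREFIX, LOG2):
        print(to_fields(raw))
```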

Thank you again, Woodcock.