Splunk Search

My regular expression is working fine but why is my search not retrieving fields?

premraj_vs
Path Finder

Hi All,

I am a newbie and I am trying to extract fields from a raw log. I followed the steps below.

1) Using https://regex101.com/, I created a regular expression matching my log.

The regular expression is as follows:

(?P<DP_Date_Time>\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)\s+(?P<DP_Error_Code>[[^ ]\w+])\[\w+]\[\w+]\s(?P<DP_Service_Name>\w+\(\w+\)):\s+\w+\((?P<DP_Transaction_ID>\d+)\)\[(?P<DP_TID>\d+.\d+.\d+.\d+)\]\s+\w+\((?P<DP_GTID>\d+)\):\s+Latency:\s+(?P<DP_LATENCY_TIME_REQ_HDR_READ>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_REQ_HDR_SENT>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FSTB>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FSTC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_ENTIRE_REQ_TRS>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FS_SYTLE_READY>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FS_PARSING_COM>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_HDR_RECVD>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_HDR_SENT>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSTB>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSTC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_TRS>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BS_STYLE_READ>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSPC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSCA>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSCC>[0-9]*) \[(?P<DP_Backside_URL>.*)\]

2) When I imported the log into Splunk, I selected the Default source type.

3) I am trying the search query below, and it returns no fields.

source="latency_0612.log" host="******" index="idx-integrations-test" sourcetype="dpower-latency" | rex _raw="^(?P\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)\s+(?P[[^ ]\w+])\[\w+]\[\w+]\s(?P\w+\(\w+\)):\s+\w+\((?P\d+)\)\[(?P\d+.\d+.\d+.\d+)\]\s+\w+\((?P\d+)\):\s+Latency:\s+(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*) \[(?P.*)\]"

What am I missing? Why am I not seeing these fields after the search?

1 Solution

premraj_vs
Path Finder

Finally it worked.

I had to use the Field Extractions option and enter the same regular expression there. It returned all the fields with the correct values.

Thanks for all the help.

DalJeanis
Legend

It looks like you are missing two things: something to skip the day of the week at the beginning, and the proper syntax for rex. One other thing: you can make Splunk's job easier if you do not use [ ]* for the spaces between numbers. Explanation after the code.

 | rex field=_raw "^\w+\s+(?P<DP_Date_Time>\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)\s+(?P<DP_Error_Code>[[^ ]\w+])\[\w+]\[\w+]\s(?P<DP_Service_Name>\w+\(\w+\)):\s+\w+\((?P<DP_Transaction_ID>\d+)\)\[(?P<DP_TID>\d+.\d+.\d+.\d+)\]\s+\w+\((?P<DP_GTID>\d+)\):\s+Latency:\s+(?P<DP_LATENCY_TIME_REQ_HDR_READ>\d+)\s+(?P<DP_LATENCY_TIME_REQ_HDR_SENT>\d+)\s+(?P<DP_LATENCY_TIME_FSTB>\d+)\s+(?P<DP_LATENCY_TIME_FSTC>\d+)\s+(?P<DP_LATENCY_TIME_ENTIRE_REQ_TRS>\d+)\s+(?P<DP_LATENCY_TIME_FS_SYTLE_READY>\d+)\s+(?P<DP_LATENCY_TIME_FS_PARSING_COM>\d+)\s+(?P<DP_LATENCY_TIME_RES_HDR_RECVD>\d+)\s+(?P<DP_LATENCY_TIME_RES_HDR_SENT>\d+)\s+(?P<DP_LATENCY_TIME_BSTB>\d+)\s+(?P<DP_LATENCY_TIME_BSTC>\d+)\s+(?P<DP_LATENCY_TIME_RES_TRS>\d+)\s+(?P<DP_LATENCY_TIME_BS_STYLE_READ>\d+)\s+(?P<DP_LATENCY_TIME_BSPC>\d+)\s+(?P<DP_LATENCY_TIME_BSCA>\d+)\s+(?P<DP_LATENCY_TIME_BSCC>\d+) \[(?P<DP_Backside_URL>.*)\]"
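As a quick check outside Splunk, here is a minimal Python sketch of why the leading `^\w+\s+` matters. It assumes Python's `re` module behaves like Splunk's PCRE for these simple constructs, and it tests only the date portion of the pattern against the start of the sample event posted in this thread.

```python
import re

# Start of the sample event from this thread (enough for the date pattern).
event = "Thu Apr 20 2017 13:42:09 [0x80e00073][latency][info] mpgw(ORD_Gateway_Policy_02):"

date_part = r"(?P<DP_Date_Time>\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)"

# Anchored at the start, as in the original search: "Thu" is followed by
# "Apr", not by a number, so the pattern cannot match.
anchored = re.search(r"^" + date_part, event)

# Skipping the day of the week first lets the same group match.
skipped = re.search(r"^\w+\s+" + date_part, event)

print(anchored)                       # None
print(skipped.group("DP_Date_Time"))  # Apr 20 2017 13:42:09
```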

Remember that * matches ZERO or more of something. With the spaces after numbers, that means that [0-9]*[ ]*[0-9]* matches a zero-length string, as well as a very large number of substrings made of any succession of digits and spaces.

With this tiny piece of string...

243 254

the regex (?<item1>[0-9]*)(?<space2>[ ]*)(?<item3>[0-9]*) will match roughly 4*3^3 different ways, including...

1) the zero-length string before the first character, where item1, space2 and item3 are all empty.
2) the 1-length string "2" that has item1 and space2 empty and item3 as "2".
3) the 1-length string "2" that has item1 as "2" and space2 and item3 empty.
4) the 2-length string "24" that has item1 and space2 empty and item3 as "24".
5) the 2-length string "24" that has item1 as "2", space2 empty and item3 as "4".
6) the 2-length string "24" that has item1 as "24" and space2 and item3 empty.

Since you were using the greedy *, those alternatives will not get tested until after the version where item1 gets "243" and item3 gets "254", so you will be okay as long as the overall pattern matches. However, the minute that your overall pattern somehow fails, your search is going bye-bye with way too many potential backtracks to ever come back from.
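A minimal Python sketch of this behavior, assuming Python's `re` engine is close enough to Splunk's PCRE for these constructs. Python spells named groups `(?P<name>...)`; the names item1, space2 and item3 follow the list above.

```python
import re

star = re.compile(r"(?P<item1>[0-9]*)(?P<space2>[ ]*)(?P<item3>[0-9]*)")
plus = re.compile(r"(?P<item1>[0-9]+)(?P<space2>[ ]+)(?P<item3>[0-9]+)")

# Greedy matching tries the longest alternative first.
first_try = star.match("243 254").groupdict()
print(first_try)  # {'item1': '243', 'space2': ' ', 'item3': '254'}

# The starred pattern also accepts shorter inputs, including the empty string.
print(star.fullmatch("") is not None)    # True
print(star.fullmatch("24").groupdict())  # {'item1': '24', 'space2': '', 'item3': ''}

# With + every piece needs at least one character, so these are rejected.
print(plus.fullmatch(""), plus.fullmatch("24"))  # None None
```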

This is easily solved, because in each of these cases you want one or more digits and one or more spaces. You can use + instead, which leaves almost no potential backtracks.
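To illustrate the + version on real data, the sketch below builds the latency part of the pattern from the field names used in this thread and runs it against the latency portion of the sample event posted here. It is a rough check in Python's `re`, not in Splunk itself, and covers only the part of the event from "Latency:" onward.

```python
import re

# Field name suffixes as used in the thread, in the order the numbers appear.
names = [
    "REQ_HDR_READ", "REQ_HDR_SENT", "FSTB", "FSTC", "ENTIRE_REQ_TRS",
    "FS_SYTLE_READY", "FS_PARSING_COM", "RES_HDR_RECVD", "RES_HDR_SENT",
    "BSTB", "BSTC", "RES_TRS", "BS_STYLE_READ", "BSPC", "BSCA", "BSCC",
]

# One or more digits per field, one or more whitespace characters between fields.
chain = r"\s+".join(rf"(?P<DP_LATENCY_TIME_{n}>\d+)" for n in names)
pattern = re.compile(r"Latency:\s+" + chain + r" \[(?P<DP_Backside_URL>.*)\]")

# Latency portion of the sample event from this thread.
event = ("gtid(134637607): Latency: 0 36 0 36 36 32 24 243 254 243 254 255 251 243 36 36 "
         "[https://XX.XX.XX.XX:10005/services/ORD/v2]")

fields = pattern.search(event).groupdict()
print(fields["DP_LATENCY_TIME_REQ_HDR_READ"])   # 0
print(fields["DP_LATENCY_TIME_RES_HDR_RECVD"])  # 243
print(fields["DP_Backside_URL"])                # https://XX.XX.XX.XX:10005/services/ORD/v2

# Bad data late in the event simply produces no match.
bad = pattern.search(event.replace("36 [", "36U ["))
print(bad)  # None
```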

To see this in action, take your original rex string, go over to regex101, and plop it in the tester. Copy your sample into the test string box and you will see that the match was found in 144 steps or so.

Now add some bad data late in the event - for example change one of the 36 to 36U. Up above to the right, after a short while, you will see the words "catastrophic backtracking". Now copy our version of the rex up there, and the message will instead be that it failed with no match after perhaps 136 steps.
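For a rough feel of where the difference comes from, the sketch below counts how many different ways the chain of 16 digit groups and 15 space groups can carve up a prefix of the latency numbers. This is only a simplified model: `count_paths` and `chain` are hypothetical helpers written for this illustration, the count is of alternatives rather than engine steps, and real engines (PCRE in Splunk, regex101) prune and optimize, so their step counts will differ.

```python
def count_paths(text, groups):
    """Count the ways `groups`, applied in order, can consume some prefix of `text`.

    Each group is (predicate, min_len): a character test plus 0 for `*` or 1 for `+`.
    """
    ways = [0] * (len(text) + 1)   # ways[p]: ways to have consumed exactly text[:p]
    ways[0] = 1
    for belongs, min_len in groups:
        nxt = [0] * (len(text) + 1)
        for start, n in enumerate(ways):
            if n == 0:
                continue
            end = start
            while True:
                if end - start >= min_len:
                    nxt[end] += n          # this group stops at `end`
                if end < len(text) and belongs(text[end]):
                    end += 1               # or it takes one more character
                else:
                    break
        ways = nxt
    return sum(ways)


def chain(min_len, fields=16):
    """Alternate digit and space groups: 16 numbers with 15 gaps between them."""
    groups = []
    for i in range(fields):
        groups.append((str.isdigit, min_len))
        if i < fields - 1:
            groups.append((lambda c: c == " ", min_len))
    return groups


# Latency numbers from the sample event, with the last 36 changed to 36U.
numbers = "0 36 0 36 36 32 24 243 254 243 254 255 251 243 36 36U"

star_paths = count_paths(numbers, chain(0))  # [0-9]* and [ ]*
plus_paths = count_paths(numbers, chain(1))  # one-or-more for digits and spaces
print(plus_paths)          # 2
print(star_paths > 1000)   # True
```

With + the only freedom left is whether the last group takes "3" or "36", while the starred chain can distribute the same characters across the groups in a very large number of ways.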

woodcock
Esteemed Legend

Please do elaborate with steps; I am not sure what you mean here.

premraj_vs
Path Finder

Sample Event

Thu Apr 20 2017 13:42:09 [0x80e00073][latency][info] mpgw(ORD_Gateway_Policy_02): tid(134637607)[YY:UU:UU:OO] gtid(134637607): Latency: 0 36 0 36 36 32 24 243 254 243 254 255 251 243 36 36 [https://XX.XX.XX.XX:10005/services/ORD/v2]

horsefez
Motivator

Hi,

  1. Could you provide a sample event?
  2. The correct rex syntax is | rex field=_raw "yourregex"
  3. Your capture groups do not have field names.

    Do something like (?<field_name>...) instead of (?P...)
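A small Python sketch of the point about field names. Note the assumption: Python's `re` only accepts the `(?P<name>...)` spelling, whereas PCRE, which Splunk uses, accepts both `(?<name>...)` and `(?P<name>...)`. The text and field name are taken from the sample event and regex in this thread.

```python
import re

text = "tid(134637607)"

# Named group: the capture is available under a field name.
named = re.search(r"tid\((?P<DP_Transaction_ID>\d+)\)", text)
print(named.groupdict())  # {'DP_Transaction_ID': '134637607'}

# Unnamed group: it still matches, but there is no field name to report.
unnamed = re.search(r"tid\((\d+)\)", text)
print(unnamed.groupdict())  # {}

# A group opened with (?P but given no <name> is not valid syntax at all.
try:
    re.compile(r"tid\((?P\d+)\)")
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False
```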

DalJeanis
Legend

Yes, per point 2, it looks like premraj_vs is mixing the syntax for rex and regex.

premraj_vs
Path Finder

Added sample event

horsefez
Motivator

Hi premraj_vs,

I don't know if I get it correctly, but how about using your first Regex-Statement in the query?

Like...

source="latency_0612.log" host="******" index="idx-integrations-test" sourcetype="dpower-latency" | rex field=_raw "(?P<DP_Date_Time>\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)\s+(?P<DP_Error_Code>[[^ ]\w+])\[\w+]\[\w+]\s(?P<DP_Service_Name>\w+\(\w+\)):\s+\w+\((?P<DP_Transaction_ID>\d+)\)\[(?P<DP_TID>\d+.\d+.\d+.\d+)\]\s+\w+\((?P<DP_GTID>\d+)\):\s+Latency:\s+(?P<DP_LATENCY_TIME_REQ_HDR_READ>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_REQ_HDR_SENT>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FSTB>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FSTC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_ENTIRE_REQ_TRS>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FS_SYTLE_READY>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FS_PARSING_COM>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_HDR_RECVD>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_HDR_SENT>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSTB>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSTC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_TRS>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BS_STYLE_READ>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSPC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSCA>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSCC>[0-9]*) \[(?P<DP_Backside_URL>.*)\]"

premraj_vs
Path Finder

I am doing that already
