Hi All,
I am a newbie and I am trying to extract fields from a raw log. I followed the steps below.
1) I created a regex that matches my log.
The regex is as follows:
(?P<DP_Date_Time>\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)\s+(?P<DP_Error_Code>[[^ ]\w+])\[\w+]\[\w+]\s(?P<DP_Service_Name>\w+\(\w+\)):\s+\w+\((?P<DP_Transaction_ID>\d+)\)\[(?P<DP_TID>\d+.\d+.\d+.\d+)\]\s+\w+\((?P<DP_GTID>\d+)\):\s+Latency:\s+(?P<DP_LATENCY_TIME_REQ_HDR_READ>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_REQ_HDR_SENT>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FSTB>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FSTC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_ENTIRE_REQ_TRS>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FS_SYTLE_READY>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FS_PARSING_COM>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_HDR_RECVD>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_HDR_SENT>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSTB>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSTC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_TRS>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BS_STYLE_READ>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSPC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSCA>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSCC>[0-9]*) \[(?P<DP_Backside_URL>.*)\]
2) When I imported the log into Splunk, I selected the default source type and imported it.
3) I am running the search query below, and it returns no fields.
source="latency_0612.log" host="******" index="idx-integrations-test" sourcetype="dpower-latency" | rex _raw="^(?P\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)\s+(?P[[^ ]\w+])\[\w+]\[\w+]\s(?P\w+\(\w+\)):\s+\w+\((?P\d+)\)\[(?P\d+.\d+.\d+.\d+)\]\s+\w+\((?P\d+)\):\s+Latency:\s+(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*)[ ]*(?P[0-9]*) \[(?P.*)\]"
What am I missing? Why am I not seeing these fields after the search?
Finally it worked.
I had to use the Field Extractions option and put the same regex there. It returned all the fields with the correct values.
Thanks for all the help.
It looks like the only things you are missing are something to skip the day of the week at the beginning of the event, and the proper syntax for rex (rex field=_raw "...", not rex _raw="..."). Oh, one other thing: you can make Splunk's job easier if you do not use [ ]* for the spaces between numbers. Explanation after the code.
| rex field=_raw "^\w+\s+(?P<DP_Date_Time>\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)\s+(?P<DP_Error_Code>[[^ ]\w+])\[\w+]\[\w+]\s(?P<DP_Service_Name>\w+\(\w+\)):\s+\w+\((?P<DP_Transaction_ID>\d+)\)\[(?P<DP_TID>\d+.\d+.\d+.\d+)\]\s+\w+\((?P<DP_GTID>\d+)\):\s+Latency:\s+(?P<DP_LATENCY_TIME_REQ_HDR_READ>\d+)\s+(?P<DP_LATENCY_TIME_REQ_HDR_SENT>\d+)\s+(?P<DP_LATENCY_TIME_FSTB>\d+)\s+(?P<DP_LATENCY_TIME_FSTC>\d+)\s+(?P<DP_LATENCY_TIME_ENTIRE_REQ_TRS>\d+)\s+(?P<DP_LATENCY_TIME_FS_SYTLE_READY>\d+)\s+(?P<DP_LATENCY_TIME_FS_PARSING_COM>\d+)\s+(?P<DP_LATENCY_TIME_RES_HDR_RECVD>\d+)\s+(?P<DP_LATENCY_TIME_RES_HDR_SENT>\d+)\s+(?P<DP_LATENCY_TIME_BSTB>\d+)\s+(?P<DP_LATENCY_TIME_BSTC>\d+)\s+(?P<DP_LATENCY_TIME_RES_TRS>\d+)\s+(?P<DP_LATENCY_TIME_BS_STYLE_READ>\d+)\s+(?P<DP_LATENCY_TIME_BSPC>\d+)\s+(?P<DP_LATENCY_TIME_BSCA>\d+)\s+(?P<DP_LATENCY_TIME_BSCC>\d+) \[(?P<DP_Backside_URL>.*)\]"
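To see the day-of-week problem outside Splunk, here is a minimal Python sketch. It uses Python's re module rather than Splunk's PCRE, so treat it as an approximation, and it only tests the date part of the regex against the start of the sample event posted in this thread. Anchored with ^, the original pattern fails on "Thu"; skipping one leading word first, as the fixed rex does, makes it match. DATE, EVENT_START and date_match are names made up for this illustration.

```python
import re

# Date/time part of the original regex (Python also accepts (?P<name>...)).
DATE = r"(?P<DP_Date_Time>\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)"

# Beginning of the sample event posted in this thread.
EVENT_START = "Thu Apr 20 2017 13:42:09 [0x80e00073][latency][info]"


def date_match(pattern):
    """Return the captured date/time, or None if the pattern does not match."""
    m = re.search(pattern, EVENT_START)
    return m.group("DP_Date_Time") if m else None


# Anchored original: "Thu" is not followed by digits, so this fails.
print(date_match("^" + DATE))           # None
# Unanchored: the search simply starts at "Apr" instead.
print(date_match(DATE))                 # Apr 20 2017 13:42:09
# Anchored, but skipping the day of the week first, as in the fixed rex.
print(date_match(r"^\w+\s+" + DATE))    # Apr 20 2017 13:42:09
```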
Remember that * matches ZERO or more of something. With optional spaces between the numbers, that means [0-9]*[ ]*[0-9]* matches a zero-length string, as well as a huge number of substrings of any run of digits and spaces.
With this tiny string...
243 254
the regex (?<item1>[0-9]*)(?<space2>[ ]*)(?<item3>[0-9]*) can match in dozens of different ways, including...
1) the zero-length string before the first character, where item1, space2 and item3 are all empty.
2) the 1-length string "2", where item1 and space2 are empty and item3 is "2".
3) the 1-length string "2", where item1 is "2" and space2 and item3 are empty.
4) the 2-length string "24", where item1 and space2 are empty and item3 is "24".
5) the 2-length string "24", where item1 is "2", space2 is empty and item3 is "4".
6) the 2-length string "24", where item1 is "24" and space2 and item3 are empty.
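To put a concrete number on it, here is a small brute-force Python sketch that lists every (start position, item1, space2, item3) combination the three-part pattern could produce on "243 254", counting each start position separately. It illustrates the size of the search space only; it is not a model of how PCRE or Python's re actually walk it. The helper names only and all_splits are hypothetical, and Python spells the named groups (?P<name>...).

```python
import re

DIGITS = "0123456789"


def only(segment, allowed):
    """True if every character of segment is in allowed (an empty segment qualifies)."""
    return all(ch in allowed for ch in segment)


def all_splits(text):
    """Every way (?<item1>[0-9]*)(?<space2>[ ]*)(?<item3>[0-9]*) can match text,
    counting each possible start position separately."""
    n = len(text)
    found = []
    for start in range(n + 1):
        for a in range(start, n + 1):
            if not only(text[start:a], DIGITS):
                break  # item1 hit a non-digit; a longer item1 cannot work either
            for b in range(a, n + 1):
                if not only(text[a:b], " "):
                    break  # space2 hit a non-space
                for c in range(b, n + 1):
                    if not only(text[b:c], DIGITS):
                        break  # item3 hit a non-digit
                    found.append((start, text[start:a], text[a:b], text[b:c]))
    return found


splits = all_splits("243 254")
print(len(splits))  # 56

# The greedy alternative that Python's re tries first at position 0.
first = re.match(r"(?P<item1>[0-9]*)(?P<space2>[ ]*)(?P<item3>[0-9]*)", "243 254")
print(first.groupdict())  # {'item1': '243', 'space2': ' ', 'item3': '254'}
```

Cases 1), 3) and 5) above show up in the list as (0, '', '', ''), (0, '2', '', '') and (0, '2', '', '4').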
Since you were using the greedy *, those alternatives will not get tested until after the version where item1 gets "243", space2 gets " " and item3 gets "254", so you will be okay as long as the overall pattern matches. However, the minute your overall pattern fails, your search is going bye-bye, with far too many potential backtracks to ever come back from.
This is easily solved, because in each of these cases you want one or more digits and one or more spaces, so you can use + instead (\d+ and \s+). Then each run of digits can only be split one way, so there is essentially nothing to backtrack into.
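A minimal Python sketch of the latency part of the fixed rex, assuming Python's re behaves like Splunk's PCRE for a pattern this simple. It builds the 16 named groups with \d+ joined by \s+, pulls the values out of the sample event's latency section, and shows that the same text with one late "36" changed to "36U" simply fails to match. LATENCY_FIELDS, GOOD and BAD are names made up here; the original * version is deliberately not run on BAD, since it may backtrack for a very long time.

```python
import re

# Field-name suffixes taken from the rex in this thread, in order.
LATENCY_FIELDS = [
    "REQ_HDR_READ", "REQ_HDR_SENT", "FSTB", "FSTC", "ENTIRE_REQ_TRS",
    "FS_SYTLE_READY", "FS_PARSING_COM", "RES_HDR_RECVD", "RES_HDR_SENT",
    "BSTB", "BSTC", "RES_TRS", "BS_STYLE_READ", "BSPC", "BSCA", "BSCC",
]

# One or more digits per field, one or more whitespace characters between fields.
LATENCY = r"\s+".join(rf"(?P<DP_LATENCY_TIME_{name}>\d+)" for name in LATENCY_FIELDS)
PATTERN = re.compile(r"Latency:\s+" + LATENCY + r" \[(?P<DP_Backside_URL>.*)\]")

# Latency section of the sample event, plus a copy with bad data late in it.
GOOD = ("Latency: 0 36 0 36 36 32 24 243 254 243 254 255 251 243 36 36 "
        "[https://XX.XX.XX.XX:10005/services/ORD/v2]")
BAD = GOOD.replace("243 36 36", "243 36U 36")

m = PATTERN.search(GOOD)
values = [m.group("DP_LATENCY_TIME_" + name) for name in LATENCY_FIELDS]
print(values)
print(m.group("DP_Backside_URL"))

# With \d+ and \s+ there is no other way to split the numbers, so this fails fast.
print(PATTERN.search(BAD))  # None
```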
To see this in action, take your original rex string, go over to regex101, and plop it into the tester. Copy your sample event into the test string box and you will see that the match is found in about 144 steps.
Now add some bad data late in the event, for example change one of the 36s to 36U. In the upper right, after a short while, you will see the words "catastrophic backtracking". Now paste our version of the rex in there instead, and the message will be that it failed with no match after perhaps 136 steps.
Please do elaborate with steps; I am not sure what you mean here.
Sample Event
Thu Apr 20 2017 13:42:09 [0x80e00073][latency][info] mpgw(ORD_Gateway_Policy_02): tid(134637607)[YY:UU:UU:OO] gtid(134637607): Latency: 0 36 0 36 36 32 24 243 254 243 254 255 251 243 36 36 [https://XX.XX.XX.XX:10005/services/ORD/v2]
Hi,
your fields do not have field names.
Use something like (?<field_name>...) instead of (?P...).
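The missing names are easy to confirm with Python's re, which rejects (?P when no name follows; PCRE, which Splunk's rex is based on, also expects a name there. The minimal sketch below compiles one fragment of the posted search in both forms. The helper name compiles is hypothetical.

```python
import re


def compiles(pattern):
    """Return True if Python's re accepts the pattern, False on a syntax error."""
    try:
        re.compile(pattern)
    except re.error as err:
        print(f"{pattern!r}: {err}")
        return False
    return True


# Fragment of the search as posted: the group name is missing after (?P.
print(compiles(r"\w+\((?P\d+)\)"))
# The same fragment with a name, as in the original regex.
print(compiles(r"\w+\((?P<DP_Transaction_ID>\d+)\)"))
```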
Yes, per point 2, it looks like premraj_vs is mixing the syntax for rex and regex.
Added sample event
Hi premraj_vs,
I'm not sure I understand correctly, but how about using your first regex statement in the query?
Like this:
source="latency_0612.log" host="******" index="idx-integrations-test" sourcetype="dpower-latency" | rex field=_raw "(?P<DP_Date_Time>\w+\s+\d+\s+\d+\s+\d+:\d+:\d+)\s+(?P<DP_Error_Code>[[^ ]\w+])\[\w+]\[\w+]\s(?P<DP_Service_Name>\w+\(\w+\)):\s+\w+\((?P<DP_Transaction_ID>\d+)\)\[(?P<DP_TID>\d+.\d+.\d+.\d+)\]\s+\w+\((?P<DP_GTID>\d+)\):\s+Latency:\s+(?P<DP_LATENCY_TIME_REQ_HDR_READ>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_REQ_HDR_SENT>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FSTB>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FSTC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_ENTIRE_REQ_TRS>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FS_SYTLE_READY>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_FS_PARSING_COM>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_HDR_RECVD>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_HDR_SENT>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSTB>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSTC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_RES_TRS>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BS_STYLE_READ>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSPC>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSCA>[0-9]*)[ ]*(?P<DP_LATENCY_TIME_BSCC>[0-9]*) \[(?P<DP_Backside_URL>.*)\]"
I am doing that already.