Hello,
I have some issues with field extraction because value-pair (key:value) fields and non-value-pair fields appear in the same event, and I'm not sure how to implement a regex to extract these fields. A few sample events are given below. The value-pair fields (underlined) and the non-value-pair fields (in bold, with values separated by spaces) have been marked for one of the sample events. Any recommendation will be highly appreciated. Thank you.
[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn SUCCESS user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"64b9ib" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:44:01,520] INFO signin 009012cf0cce64c7 rmk9ddb signIn SUCCESS user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41" userDescription:"ugdi8db" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:45:13,632] INFO signin b9660cc3afe54c2 j56lb signIn SUCCESS user:j79lb monitorId:bop9060cc3afe54c2 IPaddr:10.209.23.194 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"jw908b" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io
[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows
Without a clear definition of how to split the data into fields, I am guessing, but try this:
\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)(\s+(?<field10>\S+)\s+(?<field11>\w+:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?
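To see what each group captures outside Splunk, here is a minimal Python sketch. It assumes Python's standard re module is an adequate stand-in for the PCRE engine behind Splunk's rex for this pattern. Because re spells named groups (?P<name>...), the sketch rewrites "(?<" before compiling. It then prints the groups for the first sample event, which shows that field10 holds the whole IPaddr:... token, key included.

```python
import re

# The regex from the reply above, copied unchanged. Python's re module spells
# named groups (?P<name>...), so "(?<" is rewritten before compiling; this plain
# text substitution is only safe because the pattern contains no lookbehinds.
PCRE = (r'\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)'
        r'\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)'
        r'\s+(?<field9>\S+)(\s+(?<field10>\S+)\s+(?<field11>\w+:\"[^\"]+\")'
        r'\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?')
POSITIONAL = re.compile(PCRE.replace("(?<", "(?P<"))

# First sample event from the question.
EVENT = ('[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn SUCCESS '
         'user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 '
         'userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" '
         'userDescription:"64b9ib" sessionType:STANDARD browser:Chrome(111) os:windows')

m = POSITIONAL.search(EVENT)
for name, value in m.groupdict().items():
    # field10's value is the whole token, key included: IPaddr:15.218.61.1
    print(f"{name:8} {value}")
```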
Hello,
Thank you so much for your quick response; I truly appreciate it. That regex works as expected for the non-value-pair fields, but not for the value-pair fields (see the screenshot below). For example, the key name ipaddr (in Group field10) shouldn't be part of the extracted field value. There are also more issues with Group 10, ipaddr. Is there anything we can do to resolve this? Thank you again.
What is there to resolve? Have you tried this in Splunk?
Group 10 is the capture group for the optional part of the events (your examples show that not all the fields are present in all events). The named group field10 holds this field as extracted. If you want it extracted to a different named field without the field name included, change the regex accordingly:
\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)(\s+IPaddr:(?<IPaddr>\S+)\s+(?<field11>\w+:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?
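The same kind of check in Python (again assuming re behaves like PCRE for this pattern) shows the effect of moving the literal IPaddr: prefix outside the capture group: the named group IPaddr holds only the address. For a signOut event, where the optional tail is absent, the group is simply not set.

```python
import re

# Regex from the reply above; "(?<" is rewritten to Python's "(?P<" spelling.
# The literal text IPaddr: sits outside the named group, so only the value is captured.
PCRE = (r'\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)'
        r'\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)'
        r'\s+(?<field9>\S+)(\s+IPaddr:(?<IPaddr>\S+)\s+(?<field11>\w+:\"[^\"]+\")'
        r'\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?')
RX = re.compile(PCRE.replace("(?<", "(?P<"))

# Two sample events from the question: a signIn with IPaddr and a signOut without it.
SIGN_IN = ('[2023-04-25 07:44:01,520] INFO signin 009012cf0cce64c7 rmk9ddb signIn SUCCESS '
           'user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 '
           'userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
           '(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41" '
           'userDescription:"ugdi8db" sessionType:STANDARD browser:Chrome(111) os:windows')
SIGN_OUT = ('[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS '
            'user:090wjlb monitorId:59c9098c6b7e9d5io')

print(RX.search(SIGN_IN).group("IPaddr"))   # 15.198.2.35
print(RX.search(SIGN_OUT).group("IPaddr"))  # None: the optional tail is skipped
```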
Hello @ITWhisperer,
Thank you so much for the clarification; I totally agree. But there are two things:
1. Group field10 should be extracted as a group named ipaddr with the value 15.198.2.35, not as Group field10 with the value IPaddr:15.198.2.35; and
2. Please see the following two events. These are also part of the same data but don't contain all the parameters that the earlier events had:
2023-04-25 07:46:09,358] INFO signin 0903gt268c6b7e9d58 jw0k95lb signOut SUCCESS user:09aswjlb monitorId:587a9098c6b7easd5io
[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6mkap0bb" browser:Chrome(111) os:windows
Any recommendations?
I recommend that you understand your data and describe it completely. From what you have now said (if I understand correctly), the ipaddr field is sometimes present and sometimes not? (There is some confusion about the case used in the field name, since you have used different versions; case usually matters in regex, although case sensitivity can be switched off.)
\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?
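A Python check of this version (assuming, as before, that Python's re behaves like PCRE for this pattern) confirms that an event without IPaddr still matches. It also shows a consequence of the positional design: field12 to field15 are assigned by position, so when sessionType is missing, field13 holds browser rather than sessionType.

```python
import re

# Regex from the reply above (optional IPaddr group, userAgent as the anchor),
# with "(?<" rewritten to Python's "(?P<" spelling.
PCRE = (r'\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)'
        r'\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)'
        r'\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")'
        r'\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?')
RX = re.compile(PCRE.replace("(?<", "(?P<"))

UA = ('userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41"')
# Two sample events from the thread: one with IPaddr and sessionType, one without.
WITH_IP = ('[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS '
           f'user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 {UA} '
           'userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows')
NO_IP = ('[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS '
         f'user:6mkB0bb monitorId:ee2klnmaa53a562op {UA} '
         'userDescription:"6mkap0bb" browser:Chrome(111) os:windows')

for event in (WITH_IP, NO_IP):
    m = RX.search(event)
    # field12-field15 are positional, so they shift when sessionType is absent.
    print(m.group("IPaddr"), m.group("field13"), m.group("field14"))
# 10.54.190.56 sessionType:STANDARD browser:Chrome(111)
# None browser:Chrome(111) os:windows
```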
Hello @ITWhisperer,
Thank you so much again. I am sending some of the events again: events 4 and 6 (in bold) are different from the other sample events. Do you have any recommendation on how to make your regex handle those structures along with the other events?
[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn SUCCESS user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"64b9ib" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:44:01,520] INFO signin 009012cf0cce64c7 rmk9ddb signIn SUCCESS user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41" userDescription:"ugdi8db" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:45:13,632] INFO signin b9660cc3afe54c2 j56lb signIn SUCCESS user:j79lb monitorId:bop9060cc3afe54c2 IPaddr:10.209.23.194 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"jw908b" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io
[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows
2023-04-25 07:46:19,358] INFO signin 0903sdgt268c6b7e9d58 jw0k95lb signOut SUCCESS user:08aaswjlb monitorId:587a9098c6b7easd89asio
[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6mkap0bb" browser:Chrome(111) os:windows
I think they are already handled. Having said that, you should paste the events into a code block (the </> button), because pasting them as normal text, as you have done, means formatting such as whitespace can be lost, and the regex may not work if the sample events are not accurate in this regard. Have you tried the regex against these events?
Hello @ITWhisperer,
Thank you again. So, you are saying that this regex
\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?
will cover all three of the following events, since they are all in the same file?
[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io
[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6mkap0bb" browser:Chrome(111) os:windows
Yes. The IPaddr-anchored group is also optional, and it relies on userAgent being the anchor in the next capture group.
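Because the optional tail only matches when userAgent follows, an event that carries IPaddr but lacks userAgent loses its IPaddr extraction as well. The Python sketch below illustrates this with a hypothetical event (not one from the thread) made by deleting userAgent from a sample; Python's re again stands in for PCRE.

```python
import re

# Regex from the question above, converted to Python's (?P<name>...) spelling.
PCRE = (r'\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)'
        r'\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)'
        r'\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")'
        r'\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?')
RX = re.compile(PCRE.replace("(?<", "(?P<"))

# HYPOTHETICAL event, built from a sample by deleting its userAgent field,
# to show what happens when the anchor is missing.
NO_UA = ('[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS '
         'user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 '
         'userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows')

m = RX.search(NO_UA)
print(m.group("field9"))  # monitorId:ee2klo853a562op (the fixed part still matches)
print(m.group("IPaddr"))  # None, even though IPaddr is present in the event
```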
Hello @ITWhisperer,
In some cases other fields (and data) are missing too, not only the fields/data I mentioned in my last message. Do you think your regex can handle that as well?
As I said previously, you need to understand your data completely. It is not possible for me to say if the regex will work for all instances of your data since you haven't shared all instances. If you can describe your data in terms of patterns, you may be able to use regex to extract the data you want. I suggest you try the regex on as large a data set as possible to see if there are any events for which the regex doesn't work.
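One way to follow that advice is to describe the events as patterns rather than positions: a fixed header (a bracketed timestamp followed by six space-separated tokens), then any number of key:value tokens, where a value is either a double-quoted string or a run of non-space characters. The Python sketch below uses that description. It is a different technique from the single positional regex in this thread, not a drop-in replacement for it. The header field names (level, component, id1, id2, action, status) are hypothetical, since the thread never names those tokens, and the short event list only stands in for "as large a data set as possible". Events whose header does not match are reported rather than silently dropped.

```python
import re

# HYPOTHETICAL names for the positional header tokens; the thread never names them.
HEADER = re.compile(
    r'\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\S+)\s+(?P<component>\S+)\s+'
    r'(?P<id1>\S+)\s+(?P<id2>\S+)\s+(?P<action>\S+)\s+(?P<status>\S+)'
)
# One key:value token: the value is either "double-quoted text" or a run of non-spaces.
PAIR = re.compile(r'(\w+):("[^"]*"|\S+)')


def parse(event):
    """Return a dict of fields, or None when the positional header does not match."""
    head = HEADER.match(event)  # match() anchors at the start of the event
    if head is None:
        return None
    fields = head.groupdict()
    # Scan only after the header, so the colons in the timestamp are not read as pairs.
    for key, value in PAIR.findall(event, head.end()):
        fields[key] = value.strip('"')
    return fields


UA = ('"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41"')
# A few events from the thread; the last one was pasted without its leading "[".
EVENTS = [
    '[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS '
    'user:090wjlb monitorId:59c9098c6b7e9d5io',
    '[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS '
    f'user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:{UA} '
    'userDescription:"6mkap0bb" browser:Chrome(111) os:windows',
    '2023-04-25 07:46:19,358] INFO signin 0903sdgt268c6b7e9d58 jw0k95lb signOut SUCCESS '
    'user:08aaswjlb monitorId:587a9098c6b7easd89asio',
]

parsed = [parse(e) for e in EVENTS]
unmatched = [e for e, p in zip(EVENTS, parsed) if p is None]
print(f"{len(unmatched)} of {len(EVENTS)} events did not parse")  # 1 of 3 events did not parse
print(parsed[1]["browser"], parsed[1].get("IPaddr"))            # Chrome(111) None
```

Because absent keys are simply absent from the result, missing fields do not shift the remaining values into the wrong names, which is the failure mode of a purely positional regex.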
I have already provided a few events, though, and the regex is not working on them as expected. Thank you so much again; I appreciate all of your efforts.
[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn SUCCESS user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"64b9ib" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:44:01,520] INFO signin 009012cf0cce64c7 rmk9ddb signIn SUCCESS user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41" userDescription:"ugdi8db" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:45:13,632] INFO signin b9660cc3afe54c2 j56lb signIn SUCCESS user:j79lb monitorId:bop9060cc3afe54c2 IPaddr:10.209.23.194 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"jw908b" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io
[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows
2023-04-25 07:46:19,358] INFO signin 0903sdgt268c6b7e9d58 jw0k95lb signOut SUCCESS user:08aaswjlb monitorId:587a9098c6b7easd89asio
[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6mkap0bb" browser:Chrome(111) os:windows
Here is a run-anywhere example using your logs (with the line that was missing its leading [ corrected), showing the field extractions working:
| makeresults
| fields - _time
| eval _raw="[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn SUCCESS user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41\" userDescription:\"64b9ib\" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:44:01,520] INFO signin 009012cf0cce64c7 rmk9ddb signIn SUCCESS user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41\" userDescription:\"ugdi8db\" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:45:13,632] INFO signin b9660cc3afe54c2 j56lb signIn SUCCESS user:j79lb monitorId:bop9060cc3afe54c2 IPaddr:10.209.23.194 userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41\" userDescription:\"jw908b\" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io
[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41\" userDescription:\"6op0bb\" sessionType:STANDARD browser:Chrome(111) os:windows
[2023-04-25 07:46:19,358] INFO signin 0903sdgt268c6b7e9d58 jw0k95lb signOut SUCCESS user:08aaswjlb monitorId:587a9098c6b7easd89asio
[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41\" userDescription:\"6mkap0bb\" browser:Chrome(111) os:windows"
| multikv noheader=t
| fields _raw
| rex "\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?"
Hello @ITWhisperer,
Wow! You have done a lot of hard work; I really appreciate it. We are getting hundreds of thousands of events. Based on our discussions over the last few days, I think we may need to reach out to the developers and let them know that Splunk needs more uniformity in the structure of these events to parse and ingest them. These events are rather ill-defined. Thank you so much again.
Yes. It is worth getting developers to appreciate how well-structured, relevant logging reduces time-to-resolve and other metrics, which can only benefit your end users.
If you estimate how much time your support engineers spend on chasing down issues by analysing poorly-structured logs, you might be able to build a business case for getting improvements to the logging done by your developers.
Hello @ITWhisperer
Thank you again. So, you are saying that this regex
\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?
will cover both of the following events, since those two events are in the same file?
[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io
[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows
Yes. After the field9 match there is an "extra" group (group 10), which covers the rest of the match groups and is optional.
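Group numbers are easy to confuse with the fieldN names. In PCRE, and in Python's re used here as a stand-in, named groups are numbered together with unnamed ones, left to right by opening parenthesis. So the unnamed optional group after field9 is group 10, and the named IPaddr group ends up as group 12. A short sketch:

```python
import re

# Regex from the question above, converted to Python's (?P<name>...) spelling.
PCRE = (r'\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)'
        r'\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)'
        r'\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")'
        r'\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?')
RX = re.compile(PCRE.replace("(?<", "(?P<"))

# field1..field9 are groups 1-9; group 10 is the unnamed optional tail.
print(RX.groupindex["field9"], RX.groupindex["IPaddr"])  # 9 12

SIGN_OUT = ('[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS '
            'user:090wjlb monitorId:59c9098c6b7e9d5io')
SIGN_IN = ('[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS '
           'user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 '
           'userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
           '(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" '
           'userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows')

for event in (SIGN_OUT, SIGN_IN):
    tail = RX.search(event).group(10)
    print("group 10:", "not matched" if tail is None else tail.split()[0])
# group 10: not matched
# group 10: IPaddr:10.54.190.56
```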
Just to add one more thing: all of the sample events I provided are in the same files.