How to extract fields from unstructured events?

SplunkDash
Motivator

Hello,

I have some issues with field extraction because value pair and non-value pair fields appear within the same event, and I am not sure how to write a regex to extract these fields. A few sample events are given below. Value pair fields (underlined) and non-value pair fields (in bold, with values separated by spaces) have been marked for one of the sample events. Any recommendation will be highly appreciated. Thank you.

[2023-04-25 07:43:23,923] INFO  signin           2055ddf870d6un9d1  6567bfb signIn SUCCESS user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"64b9ib" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:44:01,520] INFO  signin           009012cf0cce64c7  rmk9ddb signIn SUCCESS user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41" userDescription:"ugdi8db" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:45:13,632] INFO  signin           b9660cc3afe54c2  j56lb signIn SUCCESS user:j79lb monitorId:bop9060cc3afe54c2 IPaddr:10.209.23.194 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"jw908b" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:46:09,358] INFO  signin           0904c268c6b7e9d58  jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io

[2023-04-25 07:46:47,077] INFO  signin           ee2bop9853a5623c  65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows

 


ITWhisperer
SplunkTrust

Without a clear definition of how to split the data into fields, I am guessing, but try this:

\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)(\s+(?<field10>\S+)\s+(?<field11>\w+:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?

https://regex101.com/r/gA0f59/1


SplunkDash
Motivator

Hello,

Thank you so much for your quick response, I truly appreciate it. That regex works as expected for the non-value pair fields, but not for the value pair fields; see the screenshot below. For example, the key ipaddr (in group field10) shouldn't be part of the extracted field value. There are also more issues with Group 10 and ipaddr. Is there anything we can do to resolve this? Thank you again.

 

SplunkDash_0-1690516361318.png

 

ITWhisperer
SplunkTrust

What is there to resolve? Have you tried this in Splunk?

Group 10 is the capture group for the optional part of the events (your examples show that not all the fields are present in all events). The named group field10 shows this field extracted. If you want the value extracted to a differently named field, without the key included in the value, change the regex accordingly:

\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)(\s+IPaddr:(?<IPaddr>\S+)\s+(?<field11>\w+:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?
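
For anyone who wants to see the difference outside Splunk, here is a minimal Python sketch of the idea (an illustration, not part of the original answer). It assumes Python's `re` module, which spells named groups `(?P<name>...)` where Splunk's rex uses `(?<name>...)`, and it uses a shortened fragment of one sample event. Moving the literal key in front of the named group leaves only the value in the field.

```python
import re

# Shortened fragment of one sample event from this thread.
event = 'user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:"Mozilla/5.0"'

# Generic token after monitorId: the key ends up inside the extracted value.
with_key = re.search(r'monitorId:\S+\s+(?P<field10>\S+)', event)

# Literal key placed before the named group: only the value is captured.
value_only = re.search(r'monitorId:\S+\s+IPaddr:(?P<IPaddr>\S+)', event)

print(with_key.group("field10"))   # IPaddr:15.198.2.35
print(value_only.group("IPaddr"))  # 15.198.2.35
```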

SplunkDash
Motivator

Hello @ITWhisperer ,

 

Thank you so much for the clarification; I totally agree. But here are 2 things:

1.  Group field10 should be extracted as ipaddr:15.198.2.35, not as field10: IPaddr:15.198.2.35; and

2.  Please see the following 2 events. They are part of the same data but don't contain all the parameters that the earlier events had:

2023-04-25 07:46:09,358] INFO  signin           0903gt268c6b7e9d58  jw0k95lb signOut SUCCESS user:09aswjlb monitorId:587a9098c6b7easd5io

[2023-04-25 07:46:47,077] INFO  signin     ee2nmp9853a5A623c  65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6mkap0bb"  browser:Chrome(111) os:windows

 

Any recommendations?

 

ITWhisperer
SplunkTrust

I recommend that you understand your data and describe it completely. From what you have now said (if I understand correctly), the ipaddr field is sometimes present and sometimes not? (There is some confusion about the case of the field name, since you have used different versions; case usually matters in regex, although case sensitivity can be switched off.)

\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?
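
On the point about case: a small Python sketch (an illustration using a shortened event fragment, with Python's `(?P<name>...)` group syntax rather than rex's `(?<name>...)`) shows what happens when the key in the regex is written in a different case from the data, and how the `(?i)` inline flag switches case sensitivity off.

```python
import re

# Shortened fragment of one sample event; the data spells the key "IPaddr".
line = 'monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0"'

# Case-sensitive (the default): lower-case "ipaddr" does not match "IPaddr".
strict = re.search(r'ipaddr:(?P<ip>\S+)', line)

# (?i) at the start of the pattern makes the whole match case-insensitive.
relaxed = re.search(r'(?i)ipaddr:(?P<ip>\S+)', line)

print(strict)               # None
print(relaxed.group("ip"))  # 10.54.190.56
```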

SplunkDash
Motivator

Hello @ITWhisperer ,

Thank you so much again. I am sending you some of the events again; you will see that events 4 and 6 (in bold) are different from the other sample events. Do you have any recommendation on how to make your regex handle those structures along with the other events?

[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn SUCCESS user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"64b9ib" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:44:01,520] INFO signin 009012cf0cce64c7 rmk9ddb signIn SUCCESS user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41" userDescription:"ugdi8db" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:45:13,632] INFO signin b9660cc3afe54c2 j56lb signIn SUCCESS user:j79lb monitorId:bop9060cc3afe54c2 IPaddr:10.209.23.194 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"jw908b" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io

[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows

2023-04-25 07:46:19,358] INFO signin 0903sdgt268c6b7e9d58 jw0k95lb signOut SUCCESS user:08aaswjlb monitorId:587a9098c6b7easd89asio

[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6mkap0bb" browser:Chrome(111) os:windows

 

 

ITWhisperer
SplunkTrust

I think they are already incorporated. Having said that, you should paste the events into a code block (</>), because pasting them as normal text, as you have done, means formatting such as white space can be lost, and the regex may not work if the pasted events are not accurate in this regard. Have you tried the regex against these events?
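
To illustrate why white space matters, here is a minimal Python sketch. The two strings are the start of the first sample event, once with the runs of spaces from the original post and once collapsed to single spaces as in the later pastes; the two patterns are simplified stand-ins, not the regex from this thread. A pattern built on `\s+` tolerates both forms, while one that hard-codes single spaces only matches the collapsed form.

```python
import re

# Start of the first sample event, with and without the original runs of spaces.
original = "[2023-04-25 07:43:23,923] INFO  signin           2055ddf870d6un9d1  6567bfb signIn"
pasted = "[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn"

tolerant = re.compile(r"\]\s+(\S+)\s+(\S+)\s+(\S+)")  # any run of white space
literal = re.compile(r"\] (\S+) (\S+) (\S+)")         # exactly one space each

for name, text in (("original", original), ("pasted", pasted)):
    print(name, bool(tolerant.search(text)), bool(literal.search(text)))
```

The regex suggested in this thread uses `\s+` throughout, so it should cope with either form, but other differences introduced by pasting would still be invisible here.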

SplunkDash
Motivator

Hello @ITWhisperer ,

Thank you again. So, you are saying that

\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?

will cover all of the following events, as those 3 events are within the same file?

[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io

[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6mkap0bb" browser:Chrome(111) os:windows

ITWhisperer
SplunkTrust

Yes, the IPaddr-anchored group is also optional, and it relies on userAgent being the anchor in the next capture group.
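
The behaviour of the two optional groups can be checked outside Splunk with a short Python sketch. This is an illustration only: the regex is the one from this thread rewritten with Python's `(?P<name>...)` syntax, the user agent string is shortened for readability, and the dictionary labels are made up for the example.

```python
import re

# The thread's regex, rewritten with Python's (?P<name>...) group syntax.
PATTERN = re.compile(
    r'\[(?P<field1>[^\]]+)\]\s+(?P<field2>\S+)\s+(?P<field3>\S+)\s+(?P<field4>\S+)'
    r'\s+(?P<field5>\S+)\s+(?P<field6>\S+)\s+(?P<field7>\S+)\s+(?P<field8>\S+)'
    r'\s+(?P<field9>\S+)'
    r'('                                  # optional tail (unnamed group 10)
    r'(\s+IPaddr:(?P<IPaddr>\S+))?'       # IPaddr is optional inside the tail
    r'\s+(?P<field11>userAgent:"[^"]+")'  # userAgent anchors the tail
    r'\s+(?P<field12>\S+)\s+(?P<field13>\S+)\s+(?P<field14>\S+)'
    r'(\s+(?P<field15>\S+))?'
    r')?'
)

UA = '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"'  # shortened for readability

events = {
    "sign-out, no tail": (
        "[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb "
        "signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io"
    ),
    "full tail": (
        "[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b "
        "signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op "
        "IPaddr:10.54.190.56 userAgent:" + UA + ' userDescription:"6op0bb" '
        "sessionType:STANDARD browser:Chrome(111) os:windows"
    ),
    "no IPaddr, no sessionType": (
        "[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b "
        "signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op "
        "userAgent:" + UA + ' userDescription:"6mkap0bb" '
        "browser:Chrome(111) os:windows"
    ),
}

results = {label: PATTERN.search(raw).groupdict() for label, raw in events.items()}

for label, fields in results.items():
    print(label, "->", fields["IPaddr"], fields["field13"], fields["field15"])
```

All three shapes match. One thing the sketch makes visible: in the event without sessionType, field13 holds browser:Chrome(111) and field15 is empty, because the positional groups shift when a token is missing.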

SplunkDash
Motivator

Hello @ITWhisperer ,

Sometimes other fields (and data) are missing as well, not only the fields/data I mentioned in my last message. Do you think your regex can handle that too?

 

ITWhisperer
SplunkTrust

As I said previously, you need to understand your data completely. It is not possible for me to say if the regex will work for all instances of your data since you haven't shared all instances. If you can describe your data in terms of patterns, you may be able to use regex to extract the data you want. I suggest you try the regex on as large a data set as possible to see if there are any events for which the regex doesn't work.
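
As a rough illustration of both suggestions, here is a minimal Python sketch, not a definitive parser. It assumes the events can be described as a bracketed timestamp plus six space-separated tokens, followed by any number of key:value pairs whose values are either quoted or free of spaces. The names `PREFIX`, `PAIR`, `parse` and `unmatched` are hypothetical, and the user agent is shortened. Events that do not fit the description are collected so they can be reviewed by hand.

```python
import re

# Positional part: [timestamp] followed by six space-separated tokens.
PREFIX = re.compile(
    r'^\[(?P<field1>[^\]]+)\]\s+(?P<field2>\S+)\s+(?P<field3>\S+)\s+(?P<field4>\S+)'
    r'\s+(?P<field5>\S+)\s+(?P<field6>\S+)\s+(?P<field7>\S+)'
)
# key:value where the value is a quoted string or a run of non-space characters.
PAIR = re.compile(r'(?P<key>\w+):(?P<value>"[^"]*"|\S+)')


def parse(raw):
    """Return a dict of fields, or None when the positional prefix is absent."""
    m = PREFIX.match(raw)
    if m is None:
        return None
    fields = m.groupdict()
    # Scan only the text after the prefix, so the timestamp is not split on ":".
    for pair in PAIR.finditer(raw, m.end()):
        fields[pair.group("key")] = pair.group("value").strip('"')
    return fields


def unmatched(events):
    """Collect the events the patterns do not describe, for manual review."""
    return [raw for raw in events if parse(raw) is None]


sample = [
    "[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut "
    "SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io",
    "2023-04-25 07:46:19,358] INFO signin 0903sdgt268c6b7e9d58 jw0k95lb signOut "
    "SUCCESS user:08aaswjlb monitorId:587a9098c6b7easd89asio",
    "[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn "
    'SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 '
    '(Windows NT 10.0; Win64; x64)" userDescription:"6mkap0bb" '
    "browser:Chrome(111) os:windows",
]

print(len(unmatched(sample)))       # 1: the line with the missing "["
print(parse(sample[2])["browser"])  # Chrome(111)
```

Because the pairs are picked up by key rather than by position, a missing IPaddr or sessionType simply leaves that key out of the result instead of shifting the other fields.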

SplunkDash
Motivator

@ITWhisperer 

I already provided a few events, though, and the regex is not working on them as expected. Thank you so much again; I appreciate all of your efforts.

[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn SUCCESS user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"64b9ib" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:44:01,520] INFO signin 009012cf0cce64c7 rmk9ddb signIn SUCCESS user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41" userDescription:"ugdi8db" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:45:13,632] INFO signin b9660cc3afe54c2 j56lb signIn SUCCESS user:j79lb monitorId:bop9060cc3afe54c2 IPaddr:10.209.23.194 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"jw908b" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io

[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows

2023-04-25 07:46:19,358] INFO signin 0903sdgt268c6b7e9d58 jw0k95lb signOut SUCCESS user:08aaswjlb monitorId:587a9098c6b7easd89asio

[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6mkap0bb" browser:Chrome(111) os:windows

 

ITWhisperer
SplunkTrust

Here is a run-anywhere example using your logs (with a correction to the line with the missing [), showing the field extractions working:

| makeresults
| fields - _time
| eval _raw="[2023-04-25 07:43:23,923] INFO signin 2055ddf870d6un9d1 6567bfb signIn SUCCESS user:bn4bfb monitorId:2056dhf40d6b9d1 IPaddr:15.218.61.1 userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41\" userDescription:\"64b9ib\" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:44:01,520] INFO signin 009012cf0cce64c7 rmk9ddb signIn SUCCESS user:o0glddb monitorId:00amki2cf0cce6c7 IPaddr:15.198.2.35 userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/101.0.1661.41\" userDescription:\"ugdi8db\" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:45:13,632] INFO signin b9660cc3afe54c2 j56lb signIn SUCCESS user:j79lb monitorId:bop9060cc3afe54c2 IPaddr:10.209.23.194 userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41\" userDescription:\"jw908b\" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io

[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41\" userDescription:\"6op0bb\" sessionType:STANDARD browser:Chrome(111) os:windows

[2023-04-25 07:46:19,358] INFO signin 0903sdgt268c6b7e9d58 jw0k95lb signOut SUCCESS user:08aaswjlb monitorId:587a9098c6b7easd89asio

[2023-04-25 07:46:47,077] INFO signin ee2nmp9853a5A623c 65nAha9b signIn SUCCESS user:6mkB0bb monitorId:ee2klnmaa53a562op userAgent:\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41\" userDescription:\"6mkap0bb\" browser:Chrome(111) os:windows"
| multikv noheader=t 
| fields _raw
| rex "\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?"

SplunkDash
Motivator

Hello @ITWhisperer,

WOW!!! You have done a lot of hard work, I really appreciate it. We are getting hundreds of thousands of events. Based on our discussions over the last few days, I think we may need to reach out to the developers and let them know that Splunk needs some uniformity in the structure of these events to get them parsed/ingested. These events are really a little ill-defined. Thank you so much again.

 

ITWhisperer
SplunkTrust

Yes, it is worth getting developers to appreciate how well-structured and relevant logging reduces time-to-resolve and other metrics, which can only benefit your end users.

If you estimate how much time your support engineers spend on chasing down issues by analysing poorly-structured logs, you might be able to build a business case for getting improvements to the logging done by your developers.

SplunkDash
Motivator

Hello @ITWhisperer 

Thank you again. So, you are saying that

\[(?<field1>[^\]]+)\]\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)\s+(?<field6>\S+)\s+(?<field7>\S+)\s+(?<field8>\S+)\s+(?<field9>\S+)((\s+IPaddr:(?<IPaddr>\S+))?\s+(?<field11>userAgent:\"[^\"]+\")\s+(?<field12>\S+)\s+(?<field13>\S+)\s+(?<field14>\S+)(\s+(?<field15>\S+))?)?

will cover both of the following events, as those 2 events are within the same file?

[2023-04-25 07:46:09,358] INFO signin 0904c268c6b7e9d58 jw095lb signOut SUCCESS user:090wjlb monitorId:59c9098c6b7e9d5io

[2023-04-25 07:46:47,077] INFO signin ee2bop9853a5623c 65co9b signIn SUCCESS user:6op0bb monitorId:ee2klo853a562op IPaddr:10.54.190.56 userAgent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.41" userDescription:"6op0bb" sessionType:STANDARD browser:Chrome(111) os:windows

 

 

ITWhisperer
SplunkTrust

Yes, after the field9 match there is an "extra" group (group 10), which covers the rest of the match groups and is optional.

SplunkDash
Motivator

@ITWhisperer 

Just to add one more thing, all of the sample events I provided are within the same files.
