Splunk Search

Regex a field into more fields

packet_hunter
Contributor

For some reason the built-in field extractor is not working for me, and I have been unable to create a working .conf stanza to parse out some needed fields from ADFS logs. I do have an extracted field called Message that contains all the information needed to create the new fields.

Sample events are:

The following user account has been locked out due to too many bad password attempts. Additional Data Activity ID: 00000000-0000-0000-0000-000000000000 User: someone@ibm.com Client IP: 129.42.38.7,192.168.2.13 nBad Password Count: 6 nLast Bad Password Attempt: 1/8/2017 

The following user account has been locked out due to too many bad password attempts. Additional Data Activity ID: 00000000-0000-0000-0000-000000000000 User: ibm-9\1234 Client IP: 192.168.2.13 nBad Password Count: 6 nLast Bad Password Attempt: 1/9/2017 

The two events are similar except for User value and Client IP

What I would like to do is rex out all the information into

Msg = The following user account has been locked out due to too many bad password attempts.
Activity_ID= 00000000-0000-0000-0000-000000000000
Employee= someone
OR
Employee= 1234
Client_IP= 129.42.38.7,192.168.2.13
OR
Client_IP=192.168.2.13
Bad_Password_Count = 6
Last_Bad_Password = 1/8/2017

Here is my initial query

index=wineventlog sourcetype="WinEventLog:Security"  EventCode=516 | rex field=Message "(?<Employee>.+)@" | rex field=Message "(?<Msg>.+)." |table  Msg Employee _time

As you can see, I am using an already extracted field to get Msg and Employee. I just need a regex ninja to show me how to slice this up.

Thank you

BTW why do expressions in regex101 editor not work in the search app (and vice versa)?? Is there a tutorial on the differences?


DalJeanis
Legend
| rex "^(?<Msg>.+?)\s+Additional Data" 
| rex "Activity ID:\s+(?<Activity_ID>[-0-9]+)\s" 
| rex "User:\s+(?<Employee>.+?)\s+Client IP:\s+(?<Client_IP>[\.0-9,\s]+?)\s+nBad"
| rex field=Employee "^(?<Employee>.+)@" 
| rex field=Employee "\\\\(?<Employee>.+)$" 
| rex "Bad Password Count:\s+(?<Bad_Password_Count>\d+)" 
| rex "Last Bad Password Attempt:\s+(?<Last_Bad_Password>[0-9/]+)" 

To answer your question about Splunk vs. regex101: it takes a while to get used to what needs escaping. In general, a pattern that works in regex101 does NOT carry all of the escaping it needs in Splunk. Matching a literal backslash, for example, typically takes \\\\ inside a rex string.

As you can see above, I don't try to do everything in one pass; I break the whole message up into reasonable chunks. If any one part of a single large regex fails to match, the whole expression fails and none of its fields are extracted, so I'd rather keep each pattern local to one chunk.

I don't assume that there will always be only one space after the colon in the data, so that's why I have \s+ in various spots.

When pulling a chunk of data, if I know the data type well enough to make a list of what are valid characters, then I will do so, so that the regular expression can slurp them up and stop when it gets to the invalid ones. For example, Client_IP should consist of 0-9, period, comma, and maybe an occasional space if it came in with a space after the comma. I put a question mark after the plus so that it will be lazy; if the regex encounters a space that isn't part of the IP section, then the space will be left to the chunk after it.
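The chunk-by-chunk approach can be tried outside Splunk as well. Below is a minimal sketch, assuming Python's re module as a stand-in for the regex engine behind rex. Python spells named groups (?P<name>...) rather than (?<name>...), and a literal backslash needs only \\ in a raw string. The EVENTS list, the PATTERNS list, and the extract helper are illustrative names, not anything Splunk provides, and the sample text is taken as posted, on one line.

```python
import re

# Sample events as posted in the question (single line, "n" before Bad/Last kept as-is).
EVENTS = [
    "The following user account has been locked out due to too many bad password attempts. "
    "Additional Data Activity ID: 00000000-0000-0000-0000-000000000000 "
    "User: someone@ibm.com Client IP: 129.42.38.7,192.168.2.13 "
    "nBad Password Count: 6 nLast Bad Password Attempt: 1/8/2017 ",
    "The following user account has been locked out due to too many bad password attempts. "
    "Additional Data Activity ID: 00000000-0000-0000-0000-000000000000 "
    "User: ibm-9\\1234 Client IP: 192.168.2.13 "
    "nBad Password Count: 6 nLast Bad Password Attempt: 1/9/2017 ",
]

# One small pattern per chunk, so a miss in one chunk does not lose the other fields.
PATTERNS = [
    r"^(?P<Msg>.+?)\s+Additional Data",
    r"Activity ID:\s+(?P<Activity_ID>[-0-9]+)\s",
    r"User:\s+(?P<Employee>.+?)\s+Client IP:\s+(?P<Client_IP>[.0-9,\s]+?)\s+nBad",
    r"Bad Password Count:\s+(?P<Bad_Password_Count>\d+)",
    r"Last Bad Password Attempt:\s+(?P<Last_Bad_Password>[0-9/]+)",
]


def extract(message):
    """Apply each chunk pattern independently and merge whatever matched."""
    fields = {}
    for pattern in PATTERNS:
        match = re.search(pattern, message)
        if match:
            fields.update(match.groupdict())

    # Second pass on Employee: keep the part before "@" or the part after "\".
    employee = fields.get("Employee", "")
    before_at = re.search(r"^(?P<Employee>[^@]+)@", employee)
    after_slash = re.search(r"\\(?P<Employee>.+)$", employee)
    if before_at:
        fields["Employee"] = before_at.group("Employee")
    elif after_slash:
        fields["Employee"] = after_slash.group("Employee")
    return fields


for event in EVENTS:
    print(extract(event))
```

If one of the patterns is deliberately broken, the remaining fields still come back, which is the point of splitting the work into chunks.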

packet_hunter
Contributor

Thank you for the responses. I appreciate your explanation of regex101 and the rex examples.
Just fyi, I had to rework some of the rex expressions but your examples helped me trigger some memories.
Here is what I finally came up with if anyone is interested.

index=wineventlog sourcetype="WinEventLog:Security" EventCode=516
| rex field=Message "(?<employee>.+)@"
| rex field=Message "\\\\(?<employee>.+)"
| rex field=Message "^(?<Msg>.+)"
| rex field=Message "Activity ID:\s+(?<Activity_ID>[-0-9]+)\s"
| rex field=Message "Bad Password Count:\s+(?<Bad_Pswd_Count>\d+)"
| rex field=Message "Last Bad Password Attempt:\s+(?<Last_Bad_Pswd>[0-9\\\\].+)"
| rex field=Message "Client IP:\s+(?<Client_IP>[\.0-9,\s]+?)\s+nBad"
| table employee Msg Activity_ID Bad_Pswd_Count Last_Bad_Pswd Client_IP

DalJeanis
Legend

Your <employee> lines both presume that there will only ever be an @ or a \ in that field, never anywhere else. Is that a valid assumption? Also, no, those won't work. It looks like the @ version will end up reading back to the beginning, since a period will match all the characters, and the \ version will read to the end for the same reason.

Try these:

 | rex field=Message "User:\s+(?<employee>[^@]+)@" 
 | rex field=Message "User:[^\\]*\\\\(?<employee>\S+)"

Your <msg> line will eat up the entire message until a "carriage return" or end of file. Okay?
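The difference between the greedy and the anchored patterns can be seen in a minimal sketch, again assuming Python's re module as a stand-in (named groups written (?P<name>...), a literal backslash written \\ in a raw string). It also assumes the message is a single line, as in the posted samples. If the real Message field contains line breaks, the dot stops at them and the greedy patterns capture less than shown here.

```python
import re

# Shortened single-line versions of the two posted samples.
at_message = (
    "The following user account has been locked out due to too many bad password attempts. "
    "Additional Data Activity ID: 00000000-0000-0000-0000-000000000000 "
    "User: someone@ibm.com Client IP: 129.42.38.7,192.168.2.13 nBad Password Count: 6"
)
slash_message = (
    "The following user account has been locked out due to too many bad password attempts. "
    "Additional Data Activity ID: 00000000-0000-0000-0000-000000000000 "
    "User: ibm-9\\1234 Client IP: 192.168.2.13 nBad Password Count: 6"
)

# Greedy ".+" before "@" runs back to the start of the line.
greedy_at = re.search(r"(?P<employee>.+)@", at_message).group("employee")
# Anchoring on "User:" and excluding "@" keeps only the account name.
anchored_at = re.search(r"User:\s+(?P<employee>[^@]+)@", at_message).group("employee")

# Greedy ".+" after the backslash runs on to the end of the line.
greedy_slash = re.search(r"\\(?P<employee>.+)", slash_message).group("employee")
# Anchoring on "User:" and stopping at whitespace keeps only the ID.
anchored_slash = re.search(r"User:[^\\]*\\(?P<employee>\S+)", slash_message).group("employee")

print(repr(greedy_at))
print(repr(anchored_at))
print(repr(greedy_slash))
print(repr(anchored_slash))
```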


packet_hunter
Contributor

The employee identifier will only be either User: someone@company.com or User: company-9\1234 and I am only concerned with "someone" or "1234" respectively.

I am not sure about the format of your rex expressions; perhaps you wrote them in the free regex101 editor. But mine do work in the Search app.

Thank you
