Splunk Search

Why does my regular expression provide inconsistent results for my field extraction?

clesto
Explorer

I'm attempting to set up a field extraction for log files we're forwarding from an LDAP server. For the most part it works, but for some reason it sometimes extracts data from subsequent lines, even though everything I've checked (regex101.com, etc.) shows it should stop at the end of the line. I'm trying to extract all characters following "Outcome: " in the log file. In some events this field appears in the middle of the event, and when it does, the extraction continues into the next line.

Here's the regex "Outcome:\s(?P.*)"

Working data (snipped/cleansed)
When: 2017-03-16 14:51:46-0700
Measure: 0.000000
Actor: uid=xxxxxxx
Impersonator: -
ClientAddress: xxxxx
Session: xxxxx
AuthServer: xxxxxx
AppServer: -
ProxyServer: -
AgentAddress: xxxxxxxx
Interface: api
MoreInfo: xxxxxx
Event: identity/logout/passexpire
TargetObject: -
SecondaryTarget: -
Outcome: success

This matches the word "success" and that's it.

NOT Working Data
When: 2017-03-15 14:01:59-0700
Measure: 0.015000
Actor: xxxx
Impersonator: -
ClientAddress: xxxxx
Session: xxxx
AuthServer: xxxxx
AppServer: -
ProxyServer: -
AgentAddress: xxxxx
Interface: api
MoreInfo: "Role: base"
Event: identity/password/get
TargetObject: xxxxx
SecondaryTarget: -
Outcome: success
When: 2017-03-15 14:01:59
Measure: 0.016000
Actor: xxxxx
Impersonator: -
ClientAddress: xxx

This matches "success When: 2017-03-15 14:01:59 Measure: 0.016000 Actor: xxxxx Impersonator: - ClientAddress: xxx".... and everything else after it

I realize it looks like the single event is actually multiple events recorded as one event. I'm not exactly worried about that right now. Is there a way to get it to stop matching at the end of the line instead of continuing on? From everything I've read, .* is not supposed to match line terminators/newlines.

clesto
Explorer

Well I finally got some regex that worked. Not sure why it worked, but all I did was add a $ at the end and it's working correctly now.

In fact, I had LOADS of problems with this input. Troubleshooting this led me to figure out why each event was actually multiple events in one, and once I figured that out I had to work out why the times between the Splunk event and the log event were off. Anyway, after troubleshooting all day I was able to fix all of the issues.

The final regex that worked was

Outcome:\s(?P<snare_outcome>.*?)$

If someone who is much more regex savvy than myself could possibly explain why this worked that would be nice. And I'm far from regex savvy.
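
For anyone who wants to poke at this outside Splunk, here is a rough Python sketch comparing the accepted pattern with a variant that uses an explicit character class. Two assumptions to keep in mind: Python's re module is not the PCRE engine Splunk uses, so line-terminator handling may differ, and the [^\r\n]* variant and the sample strings are illustrative, not something from the original logs. In Python, $ without the MULTILINE flag only matches at the very end of the string (or just before a final newline), so the anchored pattern only finds a value when Outcome is the last line of the event. That would be consistent with the pattern working once the events were being split correctly, but that is a guess.

```python
import re

# Character class instead of the dot: stop at the first \r or \n.
LINE_BOUND = re.compile(r"Outcome:\s(?P<snare_outcome>[^\r\n]*)")
# The accepted pattern from the post above.
ANCHORED = re.compile(r"Outcome:\s(?P<snare_outcome>.*?)$")

def outcome(pattern, event):
    """Return the captured outcome, or None if the pattern does not match."""
    m = pattern.search(event)
    return m.group("snare_outcome") if m else None

# Hypothetical events: Outcome as the last line, and Outcome mid-event.
last_line = "SecondaryTarget: -\nOutcome: success"
mid_event = "Outcome: denial: invalid credentials\r\nWhen: 2017-03-15 14:01:59"

print(outcome(LINE_BOUND, last_line))  # success
print(outcome(LINE_BOUND, mid_event))  # denial: invalid credentials
print(outcome(ANCHORED, last_line))    # success
print(outcome(ANCHORED, mid_event))    # None
```

The character-class version does not depend on how the engine treats . or $ around line terminators, which is why it may be the more predictable choice if the data can contain stray \r characters.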

aaraneta_splunk
Splunk Employee

@clesto - Did your answer provide a working solution to your question? If yes and you would like to close out your post, don't forget to click "Accept". But if you'd like to keep it open for possibilities of other answers/comments, you don't have to take action on it yet.

clesto
Explorer

Strange, I didn't notice that when I posted it. It must have gotten stripped somehow. Here's the regex

Outcome:\s(?P<snare_outcome>.*)

Also I wanted to mention there are other "outcome" possibilities. Other outcomes are also:
denial
failure
denial: excessive failures
denial: invalid credentials
failure: DCE error: fetch_acl Key not found in database (dce / lib)
And there could possibly be others

cpetterborg
SplunkTrust

Your reasoning about the regular expression is sound: .* should stop at the end of the line. There may be some other reason it continues, such as a carriage return without a newline (\r but not \n). DOS/Windows uses \r\n for end of line and nearly everything else uses \n. If you have only \r, it may not end the line as far as the regex is concerned, though I have not personally seen this happen. Check the original data that is going in. If you are using Linux, you can use the od utility to check. For example:

od -c file.log

which will spit out the characters found. If there is a return without a newline it will look something like:

$ od -c file.log
0000000    O   u   t   c   o   m   e   :       s   u   c   c   e   s   s
0000020   \r   m   o   r   e       d   a   t   a  \n  \n
0000034

If your data looks something like the above example, that might be the cause. Since there is not a good way to clean data and post it here, you may be on your own doing the deep investigating of the data. But, from what you describe in your question, I'm surprised you are getting the results you are, but then again, looking at your original data will be the place to start.
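
To illustrate the \r idea, here is a small Python sketch. It assumes Python's re module behaves like the engine in question on this point, which may not hold for Splunk's PCRE, and the two sample strings are made up. In Python the dot matches every character except \n, so a bare \r does not stop .* and the capture runs on until the next \n.

```python
import re

# Pattern from the question, with the named group written out in full.
PATTERN = re.compile(r"Outcome:\s(?P<snare_outcome>.*)")

def extract_outcome(event: str):
    """Return the captured outcome text, or None if the pattern does not match."""
    match = PATTERN.search(event)
    return match.group("snare_outcome") if match else None

# Line ends with \n: the dot stops there.
lf_event = "Outcome: success\nWhen: 2017-03-15 14:01:59\n"
# Line ends with a bare \r (hypothetical data): the dot keeps going.
cr_event = "Outcome: success\rWhen: 2017-03-15 14:01:59\n"

print(repr(extract_outcome(lf_event)))  # 'success'
print(repr(extract_outcome(cr_event)))  # 'success\rWhen: 2017-03-15 14:01:59'
```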

clesto
Explorer

Strange, I'm not seeing any \r in the logs anywhere. At the end of where each line would be is a \n. Also I don't notice anything specifically different between the logged data when it regularly occurs to when the script causes the logging to happen.

When I open the log in standard Windows Notepad it's just constant run-on lines. When I open it in Notepad++ it looks correct, and if I turn on all characters it shows a single LF at the end of each line. Using od I only see \n at the end of each line. Although if I'm reading the log correctly, it almost looks like the way Splunk is indexing each event is a little... off.

In the log, each entry or logged event is formatted similar to this:
\n
Event: xxx\n
TargetObject: xxx\n
SecondaryObject: xxx\n
Outcome: xxx\n
When: xxx\n
Measure: xxx\n
Actor: xxx\n
Impersonator: xxx\n
ClientAddress: xxx\n
Session: xxx\n
AuthServer: xxx\n
AppServer: xxx\n
ProxyServer: xxx\n
AgentAddress: xxx\n
Interface: xxx\n
MoreInfo: xxx\n
\n

Not every Entry has all of those fields.

On the Splunk server they look like this:
When:
Measure:
Actor:
Impersonator:
ClientAddress:
Session:
AuthServer:
AppServer:
ProxyServer:
AgentAddress:
Interface:
MoreInfo:
Event:
TargetObject:
SecondaryTarget:
Outcome:

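So it seems Splunk isn't splitting the data into events at the same boundaries as the log: each log entry starts with Event: and ends with MoreInfo:, while the Splunk events start with When: and end with Outcome:.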
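
To make the expected boundaries concrete, here is a minimal Python sketch that splits raw log text on blank lines, as the layout above suggests, and parses each entry into key/value pairs. This is not Splunk configuration, only an offline check. The function name split_entries and the sample text are made up for illustration, and it assumes a blank line always separates entries.

```python
import re

def split_entries(raw: str) -> list[dict[str, str]]:
    """Split raw log text on blank lines and parse each entry into a dict."""
    entries = []
    for block in re.split(r"\n{2,}", raw.strip("\n")):
        fields = {}
        for line in block.splitlines():
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields:
            entries.append(fields)
    return entries

# Made-up sample in the layout described above (blank line between entries).
sample = (
    "\n"
    "Event: identity/password/get\n"
    "Outcome: success\n"
    "When: 2017-03-15 14:01:59-0700\n"
    "\n"
    "Event: identity/logout/passexpire\n"
    "Outcome: denial: invalid credentials\n"
    "When: 2017-03-16 14:51:46-0700\n"
    "\n"
)

for entry in split_entries(sample):
    print(entry["Event"], "->", entry["Outcome"])
```

With entries split this way, Outcome is followed by the When: of the same entry, not by the start of the next one, which is the layout in the raw log.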

clesto
Explorer

That makes a whole lot of sense. These logs are generated by an LDAP server running on Windows 2008 R2. I also noticed that this only seems to happen when a certain script is run to grab the passwords of some of the accounts within the LDAP server. For the heck of it I clicked on one of the run on results in Splunk and told it to search on that. Sure enough there was some form of hard return/carriage return it automatically plugged into the search field. If I did a search without the carriage return and just a space, it wouldn't find the entries, but if I put the carriage return back in, it would find the entries.

I have access to some linux systems. Might wind up copying the log to one and check it out.
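
If copying the log to a Linux box is a hassle, a short Python sketch like the one below could do a similar check directly on the Windows server. It counts \r\n pairs, bare \r, and bare \n in raw bytes. The function name is made up, the sample bytes mirror the od example in this thread, and it assumes the log is small enough to read into memory.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare CR, and bare LF in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # carriage returns not followed by \n
        "LF": data.count(b"\n") - crlf,  # newlines not preceded by \r
    }

# Same bytes as the od example: one bare \r, then two \n.
sample = b"Outcome: success\rmore data\n\n"
print(count_line_endings(sample))  # {'CRLF': 0, 'CR': 1, 'LF': 2}

# For a real file, read it in binary mode so nothing is translated:
# with open("file.log", "rb") as f:
#     print(count_line_endings(f.read()))
```

A nonzero CR count would point to the bare carriage returns suspected above.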

cpetterborg
SplunkTrust

Make sure your question has all the characters it needs coming through. It seems that there are a few characters missing (like between the Outcome:\s(?P and .*) in your regular expression Outcome:\s(?P.*)). At least that is what I'm seeing.

clesto
Explorer

Strange I didn't notice that when I posted. Some of the code must have gotten stripped. Here's the regex
Outcome:\s(?P.*)
Hopefully it works this time.
Also possible outcomes are:
denial: excessive failures
denial
denial: invalid credentials
failure: DCE error: fetch_acl Key not found in database (dce / lib)
and possibly others

clesto
Explorer

Still got stripped. Adding some spaces to see if that helps. I tried clicking the code button and adding it there, but it didn't help.

Spaces added between the ?P and < snare_outcome > and .*

Outcome:\s(?P < snare_outcome > .*)
