<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why does my regular expression provide inconsistent results for my field extraction? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359376#M106249</link>
    <pubDate>Fri, 17 Mar 2017 16:48:08 GMT</pubDate>
    <dc:creator>clesto</dc:creator>
    <dc:date>2017-03-17T16:48:08Z</dc:date>
    <item>
      <title>Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359369#M106242</link>
      <description>&lt;P&gt;I'm attempting to set up a field extraction for log files we're forwarding from an LDAP server.  For the most part it works, but for some reason it extracts data from subsequent lines, even though everything I've checked on regex101.com and similar sites shows it should stop at the end of the line.  I'm trying to extract all characters following "Outcome: " in the log file.  In some events this line appears in the middle of the event, and when it does, the extraction continues into the next line.  &lt;/P&gt;

&lt;P&gt;Here's the regex "Outcome:\s(?P.*)"&lt;/P&gt;

&lt;P&gt;Working data (snipped/cleansed)&lt;BR /&gt;
When: 2017-03-16 14:51:46-0700&lt;BR /&gt;
Measure: 0.000000&lt;BR /&gt;
Actor: uid=xxxxxxx&lt;BR /&gt;
Impersonator: -&lt;BR /&gt;
ClientAddress: xxxxx&lt;BR /&gt;
Session: xxxxx&lt;BR /&gt;
AuthServer: xxxxxx&lt;BR /&gt;
AppServer: -&lt;BR /&gt;
ProxyServer: -&lt;BR /&gt;
AgentAddress: xxxxxxxx&lt;BR /&gt;
Interface: api&lt;BR /&gt;
MoreInfo: xxxxxx&lt;BR /&gt;
Event: identity/logout/passexpire&lt;BR /&gt;
TargetObject: -&lt;BR /&gt;
SecondaryTarget: -&lt;BR /&gt;
Outcome: success&lt;/P&gt;

&lt;P&gt;This matches the word "success" and that's it.&lt;/P&gt;

&lt;P&gt;NOT Working Data&lt;BR /&gt;
When: 2017-03-15 14:01:59-0700&lt;BR /&gt;
Measure: 0.015000&lt;BR /&gt;
Actor: xxxx&lt;BR /&gt;
Impersonator: -&lt;BR /&gt;
ClientAddress: xxxxx&lt;BR /&gt;
Session: xxxx&lt;BR /&gt;
AuthServer: xxxxx&lt;BR /&gt;
AppServer: -&lt;BR /&gt;
ProxyServer: -&lt;BR /&gt;
AgentAddress: xxxxx&lt;BR /&gt;
Interface: api&lt;BR /&gt;
MoreInfo: "Role: base"&lt;BR /&gt;
Event: identity/password/get&lt;BR /&gt;
TargetObject: xxxxx&lt;BR /&gt;
SecondaryTarget: -&lt;BR /&gt;
Outcome: success&lt;BR /&gt;
When: 2017-03-15 14:01:59&lt;BR /&gt;
Measure: 0.016000&lt;BR /&gt;
Actor: xxxxx&lt;BR /&gt;
Impersonator: -&lt;BR /&gt;
ClientAddress: xxx&lt;/P&gt;

&lt;P&gt;This matches "success When: 2017-03-15 14:01:59 Measure: 0.016000 Actor: xxxxx Impersonator: - ClientAddress: xxx".... and everything else after it&lt;/P&gt;

&lt;P&gt;I realize it looks like the single event is actually multiple events recorded as one.  I'm not worried about that right now.  Is there a way to get the match to stop at the end of the line instead of continuing on?  From everything I've read, &lt;CODE&gt;.*&lt;/CODE&gt; is not supposed to match line terminators (newlines).&lt;/P&gt;</description>
      <pubDate>Thu, 16 Mar 2017 22:03:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359369#M106242</guid>
      <dc:creator>clesto</dc:creator>
      <dc:date>2017-03-16T22:03:49Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359370#M106243</link>
      <description>&lt;P&gt;Make sure your question has all the characters it needs coming through. It seems that there are a few characters missing (like between the &lt;CODE&gt;Outcome:\s(?P&lt;/CODE&gt; and &lt;CODE&gt;.*)&lt;/CODE&gt; in your regular expression &lt;CODE&gt;Outcome:\s(?P.*)&lt;/CODE&gt;). At least that is what I'm seeing.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Mar 2017 22:56:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359370#M106243</guid>
      <dc:creator>cpetterborg</dc:creator>
      <dc:date>2017-03-16T22:56:54Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359371#M106244</link>
      <description>&lt;P&gt;Strange, I didn't notice that when I posted it.  It must have gotten stripped somehow.   Here's the regex&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Outcome:\s(?P&amp;lt;snare_outcome&amp;gt;.*)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Also, I wanted to mention that "success" is not the only possible outcome.  Other outcomes include:&lt;BR /&gt;
denial&lt;BR /&gt;
failure&lt;BR /&gt;
denial: excessive failures&lt;BR /&gt;
denial: invalid credentials&lt;BR /&gt;
failure: DCE error: fetch_acl Key not found in database (dce / lib)&lt;BR /&gt;
And there could possibly be others&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 10:02:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359371#M106244</guid>
      <dc:creator>clesto</dc:creator>
      <dc:date>2017-03-17T10:02:03Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359372#M106245</link>
      <description>&lt;P&gt;Strange, I didn't notice that when I posted.  Some of the code must have gotten stripped.  Here's the regex:&lt;BR /&gt;
&lt;CODE&gt;Outcome:\s(?P&amp;lt;snare_outcome&amp;gt;.*)&lt;/CODE&gt;&lt;BR /&gt;
Hopefully it works this time.&lt;BR /&gt;
Also, possible outcomes are:&lt;BR /&gt;
denial: excessive failures&lt;BR /&gt;
denial&lt;BR /&gt;
denial: invalid credentials&lt;BR /&gt;
failure: DCE error: fetch_acl Key not found in database (dce / lib)&lt;BR /&gt;
and possibly others&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 10:04:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359372#M106245</guid>
      <dc:creator>clesto</dc:creator>
      <dc:date>2017-03-17T10:04:43Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359373#M106246</link>
      <description>&lt;P&gt;Still got stripped.  Adding some spaces to see if that helps.  I tried clicking the code button and adding it there, but it didn't help.&lt;/P&gt;

&lt;P&gt;Spaces added between the ?P and &amp;lt; snare_outcome &amp;gt; and .*&lt;/P&gt;

&lt;P&gt;Outcome:\s(?P &amp;lt; snare_outcome &amp;gt; .*)&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 10:07:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359373#M106246</guid>
      <dc:creator>clesto</dc:creator>
      <dc:date>2017-03-17T10:07:29Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359374#M106247</link>
      <description>&lt;P&gt;Your reasoning about the regular expression is sound. It should stop at the end of the line when you use &lt;CODE&gt;.*&lt;/CODE&gt;, but there may be some other reason it continues on, such as a carriage return without a newline (&lt;CODE&gt;\r&lt;/CODE&gt; but not &lt;CODE&gt;\n&lt;/CODE&gt;). DOS/Windows uses &lt;CODE&gt;\r\n&lt;/CODE&gt; for end of line and almost everyone else uses &lt;CODE&gt;\n&lt;/CODE&gt;. If you &lt;STRONG&gt;just&lt;/STRONG&gt; have &lt;CODE&gt;\r&lt;/CODE&gt;, then &lt;CODE&gt;.&lt;/CODE&gt; may match it instead of treating it as the end of the line, though I have not personally seen this happen. Check the original data that is going in. If you are using Linux, you can use the &lt;CODE&gt;od&lt;/CODE&gt; utility to check. For example:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;od -c file.log
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;which will spit out the characters found. If there is a return without a newline it will look something like:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;$ od -c file.log
0000000    O   u   t   c   o   m   e   :       s   u   c   c   e   s   s
0000020   \r   m   o   r   e       d   a   t   a  \n  \n
0000034
&lt;/CODE&gt;&lt;/PRE&gt;
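
&lt;P&gt;If &lt;CODE&gt;od&lt;/CODE&gt; is not handy, roughly the same check can be sketched in Python. This is only an illustration, not anything Splunk provides: the helper name &lt;CODE&gt;count_line_endings&lt;/CODE&gt; and the sample bytes are made up, the capture group is left unnamed for brevity, and Python's &lt;CODE&gt;re&lt;/CODE&gt; module is assumed to behave like Splunk's regex engine in that &lt;CODE&gt;.&lt;/CODE&gt; excludes only &lt;CODE&gt;\n&lt;/CODE&gt; by default.&lt;/P&gt;

```python
import re


def count_line_endings(data):
    """Count CRLF pairs, bare CRs, and bare LFs in a bytes object."""
    crlf = data.count(b"\r\n")
    bare_cr = data.count(b"\r") - crlf
    bare_lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "bare_cr": bare_cr, "bare_lf": bare_lf}


# Made-up sample mirroring the od output: a bare \r right after "success".
sample = b"Outcome: success\rmore data\n\n"
counts = count_line_endings(sample)

# Same pattern as in the question, with the capture group left unnamed.
pattern = re.compile(r"Outcome:\s(.*)")
with_cr = pattern.search("Outcome: success\rmore data\n").group(1)
with_lf = pattern.search("Outcome: success\nmore data\n").group(1)

print(counts)         # a nonzero bare_cr count is the thing to look for
print(repr(with_cr))  # the capture runs past the bare \r
print(repr(with_lf))  # the capture stops before the \n
```

&lt;P&gt;For a real log, the bytes could come from something like &lt;CODE&gt;open(path, "rb").read()&lt;/CODE&gt;, where &lt;CODE&gt;path&lt;/CODE&gt; is wherever you copied the file.&lt;/P&gt;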

&lt;P&gt;If your data looks something like the &lt;CODE&gt;od&lt;/CODE&gt; output above, that might be the cause. Since there is no good way to cleanse the data and post it here, you may be on your own for the deep investigation. From what you describe in your question, I'm surprised you are getting these results, so your &lt;STRONG&gt;original&lt;/STRONG&gt; data is the place to start.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 14:27:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359374#M106247</guid>
      <dc:creator>cpetterborg</dc:creator>
      <dc:date>2017-03-17T14:27:39Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359375#M106248</link>
      <description>&lt;P&gt;That makes a whole lot of sense.  These logs are generated by an LDAP server running on Windows 2008 R2.  I also noticed that this only seems to happen when a certain script is run to grab the passwords of some of the accounts within the LDAP server.  For the heck of it, I clicked on one of the run-on results in Splunk and told it to search on that.  Sure enough, there was some form of hard return/carriage return that it automatically plugged into the search field.  If I searched with a space in place of the carriage return, it wouldn't find the entries, but if I put the carriage return back in, it would. &lt;/P&gt;

&lt;P&gt;I have access to some Linux systems.  I might wind up copying the log to one and checking it out.  &lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 15:32:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359375#M106248</guid>
      <dc:creator>clesto</dc:creator>
      <dc:date>2017-03-17T15:32:48Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359376#M106249</link>
      <description>&lt;P&gt;Strange, I'm not seeing any \r in the logs anywhere.  At the end of each line there is just a \n.  Also, I don't notice anything specifically different between the data logged during regular operation and the data logged when the script runs.  &lt;/P&gt;

&lt;P&gt;When I open the log in standard Windows Notepad, it's one constant run-on line.  When I open the log in Notepad++, it looks correct, and if I turn on the display of all characters it shows a single LF at the end of each line.  Using od, I only see the \n at the end of each line.  Although, if I'm reading the log correctly, it almost looks like the way Splunk is indexing each event is a little...off.  &lt;/P&gt;

&lt;P&gt;In the log, each entry or logged event is formatted similar to this:&lt;BR /&gt;
\n&lt;BR /&gt;
Event: xxx\n&lt;BR /&gt;
TargetObject: xxx\n&lt;BR /&gt;
SecondaryObject: xxx\n&lt;BR /&gt;
Outcome: xxx\n&lt;BR /&gt;
When: xxx\n&lt;BR /&gt;
Measure: xxx\n&lt;BR /&gt;
Actor: xxx\n&lt;BR /&gt;
Impersonator: xxx\n&lt;BR /&gt;
ClientAddress: xxx\n&lt;BR /&gt;
Session: xxx\n&lt;BR /&gt;
AuthServer: xxx\n&lt;BR /&gt;
AppServer: xxx\n&lt;BR /&gt;
ProxyServer: xxx\n&lt;BR /&gt;
AgentAddress: xxx\n&lt;BR /&gt;
Interface: xxx\n&lt;BR /&gt;
MoreInfo: xxx\n&lt;BR /&gt;
\n&lt;/P&gt;

&lt;P&gt;Not every Entry has all of those fields.&lt;/P&gt;

&lt;P&gt;On the Splunk server they look like this:&lt;BR /&gt;
When:&lt;BR /&gt;
Measure:&lt;BR /&gt;
Actor:&lt;BR /&gt;
Impersonator:&lt;BR /&gt;
ClientAddress:&lt;BR /&gt;
Session:&lt;BR /&gt;
AuthServer:&lt;BR /&gt;
AppServer:&lt;BR /&gt;
ProxyServer:&lt;BR /&gt;
AgentAddress:&lt;BR /&gt;
Interface:&lt;BR /&gt;
MoreInfo:&lt;BR /&gt;
Event:&lt;BR /&gt;
TargetObject:&lt;BR /&gt;
SecondaryTarget:&lt;BR /&gt;
Outcome:&lt;/P&gt;

&lt;P&gt;So it seems Splunk isn't exactly parsing the data correctly.  &lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 16:48:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359376#M106249</guid>
      <dc:creator>clesto</dc:creator>
      <dc:date>2017-03-17T16:48:08Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359377#M106250</link>
      <description>&lt;P&gt;Well, I finally got a regex that works.  I'm not sure why, but all I did was add a &lt;CODE&gt;$&lt;/CODE&gt; at the end, and it's working correctly now.&lt;/P&gt;

&lt;P&gt;In fact, I had LOADS of problems with this input.  Troubleshooting this led me to figuring out why each event was actually multiple events in one, and once I figured that out I then had to figure out why the times of the Splunk event and the log event were off.  Anyway, after troubleshooting all day I was able to fix all of the issues.&lt;/P&gt;

&lt;P&gt;The final regex that worked was&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Outcome:\s(?P&amp;lt;snare_outcome&amp;gt;.*?)$
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If someone more regex savvy than I am could explain why this worked, that would be nice.  I'm far from regex savvy.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Mar 2017 23:33:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359377#M106250</guid>
      <dc:creator>clesto</dc:creator>
      <dc:date>2017-03-21T23:33:35Z</dc:date>
    </item>
    <item>
      <title>Re: Why does my regular expression provide inconsistent results for my field extraction?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359378#M106251</link>
      <description>&lt;P&gt;@clesto - Did your answer provide a working solution to your question? If yes and you would like to close out your post, don't forget to click "Accept". But if you'd like to keep it open for possibilities of other answers/comments, you don't have to take action on it yet.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Mar 2017 01:37:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-does-my-regular-expression-provide-inconsistent-results-for/m-p/359378#M106251</guid>
      <dc:creator>aaraneta_splunk</dc:creator>
      <dc:date>2017-03-22T01:37:31Z</dc:date>
    </item>
  </channel>
</rss>

