<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Use positive lookahead in regex when applying field transformation at index time in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186010#M53591</link>
    <description>&lt;P&gt;I know you said you wanted to sed the data at index time, but let me try and dissuade you.&lt;/P&gt;

&lt;P&gt;Once you lose that ID, you can't get it back. And you may want it later.&lt;/P&gt;

&lt;P&gt;This will extract the data you want at search time.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;       |  rex field=_raw "(GET|HEAD|POST|PUT|OPTIONS|CONNECT)\s(?.+)/\d+ HTTP"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 24 Oct 2014 22:51:26 GMT</pubDate>
    <dc:creator>bshuler_splunk</dc:creator>
    <dc:date>2014-10-24T22:51:26Z</dc:date>
    <item>
      <title>Use positive lookahead in regex when applying field transformation at index time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186009#M53590</link>
      <description>&lt;P&gt;I am trying to normalize the URLs in a Tomcat access log so that I can analyze how request performance evolves over time.&lt;/P&gt;

&lt;P&gt;Example URLs:&lt;BR /&gt;
&lt;PRE&gt;&lt;BR /&gt;
192.33.20.22 2014-10-15 13:47:16,477 "POST /test/rest/1.0/payments/reimbursement/164653 HTTP/1.1" 400 240 773 10282&lt;BR /&gt;
192.33.20.22 2014-10-15 13:46:27,062 "POST /test/rest/1.0/payments/reimbursement/164653 HTTP/1.1" 400 241 2068 10282&lt;BR /&gt;
192.33.20.22 2014-10-23 12:45:26,197 "GET /test/rest/1.0/applications/10113 HTTP/1.1" 200 507 97 110860&lt;BR /&gt;
192.33.20.22 2014-10-23 11:54:05,302 "GET /test/rest/1.0/applications/10114 HTTP/1.1" 200 507 92 110860&lt;BR /&gt;
192.33.20.22 2014-10-23 11:53:54,313 "GET /test/rest/1.0/applications/10115/generateKey HTTP/1.1" 200 509 1236 110860&lt;BR /&gt;
192.33.20.22 2014-10-23 11:53:54,313 "GET /test/rest/1.0/applications/10116/generateKey HTTP/1.1" 200 509 1236 110860&lt;BR /&gt;
&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;There are many different types of URLs; these are just a few examples, so the solution must be generic.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;I want to replace every ID in the URL with a common placeholder (like "byId") so that requests to the same endpoint are grouped together when I analyze performance.&lt;/P&gt;

&lt;P&gt;What I have done so far:&lt;/P&gt;

&lt;P&gt;.../system/props.conf:&lt;BR /&gt;
&lt;PRE&gt;&lt;BR /&gt;
[test-access-log]&lt;BR /&gt;
TRANSFORMS-fix-urls = remove-trailing-id&lt;BR /&gt;
&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;.../system/transforms.conf:&lt;BR /&gt;
&lt;PRE&gt;&lt;BR /&gt;
[remove-trailing-id]&lt;BR /&gt;
REGEX = ^(.*)(GET|HEAD|POST|PUT|OPTIONS|CONNECT)(\s\/test)((\/.*?)+)\/(?=[0-9]{1,}\s)([0-9]{1,})(\sHTTP.*)$&lt;BR /&gt;
FORMAT = $1$2$3$4/byId$6&lt;BR /&gt;
DEST_KEY = _raw&lt;BR /&gt;
&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;I am using a positive lookahead, &lt;STRONG&gt;(?=[0-9]{1,}\s)&lt;/STRONG&gt;, to detect when an ID follows. As you can see, the fifth group ($5) should be the ID in each case (example: /12345). I have tested the regular expression in a regex tester and it works there. However, I am unable to make it work with Splunk.&lt;/P&gt;
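
&lt;P&gt;One way to check the group numbering is to run the same pattern through a regex engine and print every group. The sketch below uses Python's re module as a stand-in for Splunk's regex engine, which is an assumption made only for illustration; the helper names show_groups and rewrite are hypothetical. Groups are numbered by their opening parenthesis, so the inner (\/.*?) is group 5, the digits are group 6, and the trailing " HTTP..." part is group 7.&lt;/P&gt;

```python
import re

# Pattern copied from the REGEX line above. Python's re module is used here
# as a stand-in for Splunk's regex engine (an assumption for illustration).
PATTERN = re.compile(
    r'^(.*)(GET|HEAD|POST|PUT|OPTIONS|CONNECT)(\s\/test)((\/.*?)+)\/'
    r'(?=[0-9]{1,}\s)([0-9]{1,})(\sHTTP.*)$'
)

# First example log line from the question.
SAMPLE = ('192.33.20.22 2014-10-15 13:47:16,477 '
          '"POST /test/rest/1.0/payments/reimbursement/164653 HTTP/1.1" '
          '400 240 773 10282')


def show_groups(line):
    """Return a dict mapping group number to captured text (hypothetical helper)."""
    match = PATTERN.match(line)
    if match is None:
        return {}
    return {n: match.group(n) for n in range(1, PATTERN.groups + 1)}


def rewrite(line, tail_group):
    """Rebuild the line as $1$2$3$4/byId$N, where N is tail_group."""
    match = PATTERN.match(line)
    if match is None:
        return line
    return ''.join(match.group(1, 2, 3, 4)) + '/byId' + match.group(tail_group)


if __name__ == '__main__':
    for number, text in show_groups(SAMPLE).items():
        print(number, repr(text))
    print(rewrite(SAMPLE, 6))  # same layout as FORMAT = $1$2$3$4/byId$6
    print(rewrite(SAMPLE, 7))  # appends the trailing " HTTP..." group instead
```

&lt;P&gt;In this sketch, appending group 6 puts the digits back after byId and drops the rest of the line, while appending group 7 yields the desired form. Whether Splunk behaves identically at index time is not verified here.&lt;/P&gt;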

&lt;P&gt;Is there something that I am missing, or is there a better way of accomplishing this task?&lt;/P&gt;

&lt;P&gt;This is what I want the URLs to look like:&lt;BR /&gt;
&lt;PRE&gt;&lt;BR /&gt;
192.33.20.22 2014-10-15 13:47:16,477 "POST /test/rest/1.0/payments/reimbursement/byId HTTP/1.1" 400 240 773 10282&lt;BR /&gt;
192.33.20.22 2014-10-23 12:45:26,197 "GET /test/rest/1.0/applications/byId HTTP/1.1" 200 507 97 110860&lt;BR /&gt;
192.33.20.22 2014-10-23 12:45:26,197 "GET /test/rest/1.0/applications/generateConfigById HTTP/1.1" 200 507 97 110860&lt;BR /&gt;
&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;Any help would be very much appreciated.&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 24 Oct 2014 21:39:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186009#M53590</guid>
      <dc:creator>splunkmasterfle</dc:creator>
      <dc:date>2014-10-24T21:39:57Z</dc:date>
    </item>
    <item>
      <title>Re: Use positive lookahead in regex when applying field transformation at index time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186010#M53591</link>
      <description>&lt;P&gt;I know you said you wanted to sed the data at index time, but let me try and dissuade you.&lt;/P&gt;

&lt;P&gt;Once you lose that ID, you can't get it back. And you may want it later.&lt;/P&gt;

&lt;P&gt;This will extract the data you want at search time.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;       |  rex field=_raw "(GET|HEAD|POST|PUT|OPTIONS|CONNECT)\s(?.+)/\d+ HTTP"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 24 Oct 2014 22:51:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186010#M53591</guid>
      <dc:creator>bshuler_splunk</dc:creator>
      <dc:date>2014-10-24T22:51:26Z</dc:date>
    </item>
    <item>
      <title>Re: Use positive lookahead in regex when applying field transformation at index time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186011#M53592</link>
      <description>&lt;P&gt;This answer doesn't help me. I would be willing to use a search-time transform if it solved my issue. Maybe I wasn't clear enough in my question. I need to group all URLs that have an ID. Simply removing the ID does not cut it; I need to add something that keeps the group distinct. In each case, the URL without the ID already exists and means "get all" (they are two different methods). If the ID were simply removed, my performance numbers would be skewed, because the two different method invocations would be grouped as one, which is incorrect.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Oct 2014 19:01:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186011#M53592</guid>
      <dc:creator>splunkmasterfle</dc:creator>
      <dc:date>2014-10-27T19:01:23Z</dc:date>
    </item>
    <item>
      <title>Re: Use positive lookahead in regex when applying field transformation at index time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186012#M53593</link>
      <description>&lt;P&gt;This modification at search time seems to give what you are looking for.&lt;/P&gt;

&lt;P&gt;If not, please give before and after examples of what you are looking for.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;  | rex field=_raw mode=sed "s/(\w+\/)\d+( |\/)/\1byId\2/g"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Oct 2014 02:20:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186012#M53593</guid>
      <dc:creator>bshuler_splunk</dc:creator>
      <dc:date>2014-10-31T02:20:49Z</dc:date>
    </item>
    <item>
      <title>Re: Use positive lookahead in regex when applying field transformation at index time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186013#M53594</link>
      <description>&lt;P&gt;And that rex doesn't actually extract anything. splunkguy, what are you trying to do? It seems complicated, and it seems like there would be a much simpler solution if the entire picture could be seen.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jul 2015 21:12:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186013#M53594</guid>
      <dc:creator>landen99</dc:creator>
      <dc:date>2015-07-17T21:12:48Z</dc:date>
    </item>
    <item>
      <title>Re: Use positive lookahead in regex when applying field transformation at index time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186014#M53595</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;| rex mode=sed "s/\/\d+([\s\/])/\/byId\1/g"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;SEDCMD in props.conf on the indexer does the same thing to the data at index time. A transform is not the only way to change data before indexing.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;SEDCMD-byId = s/\/\d+([\s\/])/\/byId\1/g
&lt;/CODE&gt;&lt;/PRE&gt;
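
&lt;P&gt;To sanity-check a substitution like this before putting it into props.conf, the same idea can be tried in Python. This is a minimal sketch under assumptions: Python's re module stands in for Splunk's sed mode, the function name normalize is hypothetical, and the pattern is a variant that keeps the leading slash and the separator after the digits so that the rest of the line stays intact.&lt;/P&gt;

```python
import re

# Hypothetical Python counterpart of a sed-style substitution. The pattern
# matches a slash, a run of digits, and one separator (whitespace or slash);
# the separator is captured in group 1 so it can be written back.
ID_SEGMENT = re.compile(r'/\d+([\s/])')


def normalize(line):
    """Replace purely numeric path segments with the placeholder byId."""
    return ID_SEGMENT.sub(r'/byId\1', line)


# Request parts of two example log lines from the question.
SAMPLES = [
    '"POST /test/rest/1.0/payments/reimbursement/164653 HTTP/1.1" 400 240 773 10282',
    '"GET /test/rest/1.0/applications/10115/generateKey HTTP/1.1" 200 509 1236 110860',
]

if __name__ == '__main__':
    for sample in SAMPLES:
        print(normalize(sample))
```

&lt;P&gt;Segments such as /1.0/ and the /1.1 in HTTP/1.1 are left alone in this sketch because a dot, not whitespace or a slash, follows the first digit.&lt;/P&gt;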

&lt;P&gt;If you want the digits merely deleted then remove byId.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jul 2015 21:25:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Use-positive-lookahead-in-regex-when-applying-field/m-p/186014#M53595</guid>
      <dc:creator>landen99</dc:creator>
      <dc:date>2015-07-17T21:25:01Z</dc:date>
    </item>
  </channel>
</rss>

