<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do you filter out dates from being segmented in segmenters.conf? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449414#M78055</link>
    <description>&lt;P&gt;Hi all, &lt;/P&gt;

&lt;P&gt;Splunk lets you customize how data is segmented in the index files with a regex, for example for this timestamp:&lt;/P&gt;

&lt;P&gt;segmenters.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[seg_rule]
FILTER=^\d\d\d\d-\d\d-\d\d\s*\d\d:\d\d:\d\d(.*)$
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This keeps the timestamp (located at the beginning of the log) from being segmented; only the rest, captured by (.*), is segmented. This saves space, but we lose the ability to search for the timestamp text other than through the _time field.&lt;/P&gt;

&lt;P&gt;My issue is the following: I want to do the same for every date value in my data, not only timestamps. But the Splunk documentation for segmenters.conf says:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;"segmentation will only take place on the first group of the matching regex."&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;So we can't filter out anything located in the middle of the log, because that would need at least two capture groups (one before the date and one after it). I tried it, and indeed, it only segments the part before the matched date and filters out the rest.&lt;/P&gt;

&lt;P&gt;Any idea please? &lt;/P&gt;</description>
    <pubDate>Fri, 08 Feb 2019 13:40:03 GMT</pubDate>
    <dc:creator>julienoud</dc:creator>
    <dc:date>2019-02-08T13:40:03Z</dc:date>
    <item>
      <title>How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449414#M78055</link>
      <description>&lt;P&gt;Hi all, &lt;/P&gt;

&lt;P&gt;Splunk lets you customize how data is segmented in the index files with a regex, for example for this timestamp:&lt;/P&gt;

&lt;P&gt;segmenters.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[seg_rule]
FILTER=^\d\d\d\d-\d\d-\d\d\s*\d\d:\d\d:\d\d(.*)$
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This keeps the timestamp (located at the beginning of the log) from being segmented; only the rest, captured by (.*), is segmented. This saves space, but we lose the ability to search for the timestamp text other than through the _time field.&lt;/P&gt;

&lt;P&gt;My issue is the following: I want to do the same for every date value in my data, not only timestamps. But the Splunk documentation for segmenters.conf says:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;"segmentation will only take place on the first group of the matching regex."&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;So we can't filter out anything located in the middle of the log, because that would need at least two capture groups (one before the date and one after it). I tried it, and indeed, it only segments the part before the matched date and filters out the rest.&lt;/P&gt;

&lt;P&gt;Any idea please? &lt;/P&gt;</description>
      <pubDate>Fri, 08 Feb 2019 13:40:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449414#M78055</guid>
      <dc:creator>julienoud</dc:creator>
      <dc:date>2019-02-08T13:40:03Z</dc:date>
    </item>
    <item>
      <title>Re: How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449415#M78056</link>
      <description>&lt;P&gt;You need @pmalcakdoj to chime in.  He is the only other guy that I know of crazy enough to actually modify &lt;CODE&gt;segmenters.conf&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Feb 2019 20:18:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449415#M78056</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-02-13T20:18:33Z</dc:date>
    </item>
    <item>
      <title>Re: How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449416#M78057</link>
      <description>&lt;P&gt;I've never heard of a use case where memory space is so tight that you would use this approach. As a potential alternative, have you considered using regular expressions in your props.conf for all of it?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Feb 2019 20:52:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449416#M78057</guid>
      <dc:creator>efavreau</dc:creator>
      <dc:date>2019-02-13T20:52:28Z</dc:date>
    </item>
    <item>
      <title>Re: How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449417#M78058</link>
      <description>&lt;P&gt;I ran into the same limitation myself.&lt;BR /&gt;
The "single capture group" behavior is set in stone.&lt;/P&gt;

&lt;P&gt;You've got 2 options (that I know of):&lt;BR /&gt;
- if possible, use syslog-ng to rewrite your data before it is ingested by Splunk (rearrange your event so that all the "junk" data you don't want segmented is at the beginning of your event)&lt;BR /&gt;
- use index-time TRANSFORMS-foo to rewrite your _raw so that your "junk" data is discarded or placed at the beginning of your event&lt;/P&gt;
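
&lt;P&gt;An untested sketch of what the second option might look like. The sourcetype name my_sourcetype and the stanza name move_date_to_front are placeholders, and it assumes single-line events with one yyyy-mm-dd date each (only the first date in an event is moved):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# transforms.conf (move_date_to_front is a hypothetical stanza name)
[move_date_to_front]
# $1 = text before the first date, $2 = the date, $3 = text after it
REGEX = ^(.*?)\s*(\d{4}-\d{2}-\d{2})\s*(.*)$
FORMAT = $2 $1 $3
DEST_KEY = _raw

# props.conf (my_sourcetype is a hypothetical sourcetype)
[my_sourcetype]
TRANSFORMS-move_date = move_date_to_front
SEGMENTATION = seg_rule

# segmenters.conf
[seg_rule]
# skip a leading date if present; segment only the captured remainder
FILTER = ^(?:\d{4}-\d{2}-\d{2}\s+)?(.*)$
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The leading date is optional in FILTER so that events without a date still match the regex and get segmented.&lt;/P&gt;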

&lt;P&gt;I haven't tried the second option, but according to &lt;A href="https://wiki.splunk.com/Community:HowIndexingWorks"&gt;https://wiki.splunk.com/Community:HowIndexingWorks&lt;/A&gt;, index-time segmentation should happen in the &lt;EM&gt;annotator processor&lt;/EM&gt;, which comes &lt;STRONG&gt;after&lt;/STRONG&gt; the &lt;EM&gt;regexreplacement processor&lt;/EM&gt;, so it should work.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Feb 2019 21:02:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449417#M78058</guid>
      <dc:creator>pmalcakdoj</dc:creator>
      <dc:date>2019-02-13T21:02:50Z</dc:date>
    </item>
    <item>
      <title>Re: How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449418#M78059</link>
      <description>&lt;P&gt;You rock.  You are crazy but you rock.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Feb 2019 22:20:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449418#M78059</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-02-13T22:20:35Z</dc:date>
    </item>
    <item>
      <title>Re: How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449419#M78060</link>
      <description>&lt;P&gt;haha, glad I could help&lt;/P&gt;</description>
      <pubDate>Wed, 13 Feb 2019 22:37:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449419#M78060</guid>
      <dc:creator>pmalcakdoj</dc:creator>
      <dc:date>2019-02-13T22:37:07Z</dc:date>
    </item>
    <item>
      <title>Re: How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449420#M78061</link>
      <description>&lt;P&gt;Thanks for your ideas @pmalcakdoj, they're really relevant, I think.&lt;BR /&gt;
Concerning the second point, it's really smart, but I don't see how to rewrite _raw so that parts of the log switch positions.&lt;BR /&gt;
I want to keep that data in the log, so I just want to move it to the beginning at index time and then use my segmenters.conf modifications to avoid segmenting it. But how do I edit _raw="xxxxx junkdata zzzzz" to get _raw="junkdata xxxxx zzzzz" with props.conf and transforms.conf?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 14:12:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449420#M78061</guid>
      <dc:creator>julienoud</dc:creator>
      <dc:date>2019-02-20T14:12:53Z</dc:date>
    </item>
    <item>
      <title>Re: How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449421#M78062</link>
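      <description>&lt;P&gt;You would need to capture all segments with capture groups and then reorder them in the FORMAT field with backreferences such as "$2 $1 $3 ..." (groups are numbered from $1; at index time, $0 stands for the previous value of DEST_KEY rather than a capture group).&lt;/P&gt;</description>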
      <pubDate>Wed, 20 Feb 2019 16:22:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449421#M78062</guid>
      <dc:creator>pmalcakdoj</dc:creator>
      <dc:date>2019-02-20T16:22:07Z</dc:date>
    </item>
    <item>
      <title>Re: How do you filter out dates from being segmented in segmenters.conf?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449422#M78063</link>
      <description>&lt;P&gt;Thank you, teacher. I think you rock indeed.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Feb 2019 08:45:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-you-filter-out-dates-from-being-segmented-in-segmenters/m-p/449422#M78063</guid>
      <dc:creator>julienoud</dc:creator>
      <dc:date>2019-02-21T08:45:16Z</dc:date>
    </item>
  </channel>
</rss>

