<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Indexing Urchin data, specifying timestamps, line breaks in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76525#M15630</link>
    <description>&lt;P&gt;He does &lt;EM&gt;not&lt;/EM&gt; want multiple events. He only wants one single event for the whole file, and to just use the timestamp at the top and ignore the internal ones.&lt;/P&gt;</description>
    <pubDate>Sun, 18 Sep 2011 17:43:28 GMT</pubDate>
    <dc:creator>gkanapathy</dc:creator>
    <dc:date>2011-09-18T17:43:28Z</dc:date>
    <item>
      <title>Indexing Urchin data, specifying timestamps, line breaks</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76522#M15627</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am trying to index some processing data from Urchin and am having trouble with timestamp recognition and line breaking. I would be happy for each file to be treated as a single event, with a timestamp based on the second line of the file, the file date (local, preferred), or the filename (UTC, which would need to be converted to local), so that is the direction I've been heading. However, I'm winding up with multiple events.&lt;/P&gt;

&lt;P&gt;sample file - /opt/urchin6/data/history/%28NONE%29/splunktest/20110915_134600.log (file date 2011-09-15 09:47 )&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;------------------------------------------------------
Urchin 6.5.00 (linux2.6_kernel) starting: 20110915 09:46:34
------------------------------------------------------
Processing profile: Winter (on urchin1 6404)

[09:46:37] Logfile: /opt/urchin/remote-logs/web0/urchin_log-20110914
   data lines: 904129 (100%)
   data hits:  342
   data proc:  391.39 MB in 00:00:14  (27.956 MB/sec)
   data range: 2011-09-14 03:33 (-0400) - 2011-09-14 23:39 (-0400)

[09:46:51] Post processing data for 201109
   sessions: 623 (100%)

[09:46:52] Backing up database files for 201109: /opt/urchin6/data/reports/%28NONE%29/Winter/201109-backupv6-20110915051652.zip

[09:46:52] Removing outdated backup for 201109:  /opt/urchin6/data/reports/%28NONE%29/Winter/201109-backupv6-20110913051816.zip

------------------------------------------------------
Urchin 6.5.00 (linux2.6_kernel) finishing: 20110915 09:46:52
------------------------------------------------------
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;inputs.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///opt/urchin6/data/history]
disabled=false
sourcetype = urchin_history
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[urchin_history]
LINE_BREAKER = (?!)
SHOULD_LINEMERGE = true
TRUNCATE = 0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This gets indexed as five events, with times at 9/14 9:46:51 (1 event), 9/15 9:46:34 (1 event), 9/15 9:46:52 (2 events), and 9/15 9:47:10 (1 event; index time, I think). When they are put back together with "| transaction source", the events are out of sequence.&lt;/P&gt;

&lt;P&gt;Thanks in advance,&lt;BR /&gt;
Andy&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2011 15:45:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76522#M15627</guid>
      <dc:creator>andyspusm</dc:creator>
      <dc:date>2011-09-15T15:45:16Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing Urchin data, specifying timestamps, line breaks</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76523#M15628</link>
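      <description>&lt;P&gt;You could probably set the LINE_BREAKER to recognize the timestamp. I believe this would do it:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# LINE_BREAKER needs a capturing group; the text it captures (the newlines here) is discarded as the event boundary
LINE_BREAKER = ([\r\n]+)\[\d\d:\d\d:\d\d\]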
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Alternatively, you can tell Splunk how far to look into the event for the timestamp as well as the timestamp format:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;MAX_TIMESTAMP_LOOKAHEAD = 12
TIME_FORMAT = [%H:%M:%S]
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 16 Sep 2011 15:35:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76523#M15628</guid>
      <dc:creator>Simeon</dc:creator>
      <dc:date>2011-09-16T15:35:41Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing Urchin data, specifying timestamps, line breaks</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76524#M15629</link>
      <description>&lt;P&gt;You're almost there. I suspect that your sourcetype may not be getting applied. Put this props.conf on &lt;EM&gt;all&lt;/EM&gt; servers (forwarders and indexers, and search head for good measure) and you'll be fine. (If you want to know more: &lt;A href="http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F"&gt;http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F&lt;/A&gt; )&lt;/P&gt;

&lt;P&gt;The right set of settings should be:&lt;/P&gt;

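&lt;PRE&gt;&lt;CODE&gt;[urchin_history]
# Do not merge lines back together after line breaking
SHOULD_LINEMERGE = false
# (?!) is an empty negative lookahead that can never match, so the file is never split
LINE_BREAKER = (?!)
# Take the timestamp that follows "starting:" on the header line
TIME_PREFIX = starting:
TIME_FORMAT = %Y%m%d %H:%M:%S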
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 18 Sep 2011 17:42:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76524#M15629</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2011-09-18T17:42:11Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing Urchin data, specifying timestamps, line breaks</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76525#M15630</link>
      <description>&lt;P&gt;He does &lt;EM&gt;not&lt;/EM&gt; want multiple events. He only wants one single event for the whole file, and to just use the timestamp at the top and ignore the internal ones.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Sep 2011 17:43:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76525#M15630</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2011-09-18T17:43:28Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing Urchin data, specifying timestamps, line breaks</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76526#M15631</link>
      <description>&lt;P&gt;Thanks gkanapathy, but no dice so far.&lt;/P&gt;

&lt;P&gt;I think my sourcetype is getting applied because I see it in my indexed events. My props.conf is in /opt/splunk/etc/deployment-apps/urchin/default on the search head/indexer (all one box) and in /opt/splunkforwarder/etc/apps/urchin/default on the forwarder. Otherwise, the stanza is as you defined it.&lt;/P&gt;
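
&lt;P&gt;One thing that may be worth checking: as far as I know, apps under etc/deployment-apps are only staged there for deployment clients and are not read by the local Splunk instance. A minimal sketch of the same stanza in a location that the indexer itself would load, assuming the app is named urchin and is installed under etc/apps (an assumed path, not one confirmed in this thread):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Assumed location: /opt/splunk/etc/apps/urchin/local/props.conf
[urchin_history]
SHOULD_LINEMERGE = false
LINE_BREAKER = (?!)
TIME_PREFIX = starting:
TIME_FORMAT = %Y%m%d %H:%M:%S
&lt;/CODE&gt;&lt;/PRE&gt;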

&lt;P&gt;I'm still getting four events - unfortunately I don't have the space to post them all here. With reference to the file above (excluding whitespace lines), the events start at lines 1 (-----), 2 (Urchin start), 9 (data range), and 15 (Urchin finish).&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2011 21:03:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76526#M15631</guid>
      <dc:creator>andyspusm</dc:creator>
      <dc:date>2011-09-20T21:03:46Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing Urchin data, specifying timestamps, line breaks</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76527#M15632</link>
      <description>&lt;P&gt;Could be that gkanapathy was right and I just had the .conf files in the wrong places. Anyway, I got the following from Michael Wegener at Splunk support and this solution is working well. With his permission, I'm sharing here:&lt;/P&gt;

&lt;P&gt;On the indexer, in etc/system/local/inputs.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///var/log/urchin]
disabled = 0
followTail = 0
index = test
sourcetype = urchin_history
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;On the forwarder, in etc/system/local/props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[urchin_history]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = -*\rUrchin\s+\d+\.\d+\.\d+\s+\(linux\d+\.\d+_kernel\)\s+starting:\s+\d+\s+\d+:\d+:\d+
MUST_BREAK_AFTER = (Urchin\s+\d+\.\d+\.\d+\s+\(linux\d+\.\d+_kernel\)\s+finishing:\s+\d+\s+\d+:\d+:\d+\r-*|DETAIL:\s+:.*)
&lt;/CODE&gt;&lt;/PRE&gt;
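
&lt;P&gt;The stanza above does not set the event time explicitly. If the time should come from the "starting:" header line, as suggested earlier in this thread, the timestamp settings could presumably be added to the same stanza. This is a sketch only, not part of what support sent and not tested here; the \s+ after the prefix and the lookahead value are my assumptions:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[urchin_history]
# Lines to add to the existing stanza; the line-merging settings stay as they are
# Assumption: skip past "starting:" and the whitespace that follows it
TIME_PREFIX = starting:\s+
# Matches a header timestamp such as 20110915 09:46:34
TIME_FORMAT = %Y%m%d %H:%M:%S
# Assumption: 20 characters is enough to cover the 17-character timestamp
MAX_TIMESTAMP_LOOKAHEAD = 20
&lt;/CODE&gt;&lt;/PRE&gt;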

&lt;P&gt;Note that in the process of troubleshooting, I changed the monitor location from that mentioned in the original question.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2011 16:37:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-Urchin-data-specifying-timestamps-line-breaks/m-p/76527#M15632</guid>
      <dc:creator>andyspusm</dc:creator>
      <dc:date>2011-10-04T16:37:48Z</dc:date>
    </item>
  </channel>
</rss>

