<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: inconsistent # of events parsed - /w custom SourceType &amp; same source file in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26288#M4324</link>
    <description>&lt;P&gt;OK &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; I've updated. Thanks. What about my main problem? any ideas?&lt;/P&gt;</description>
    <pubDate>Tue, 07 Aug 2012 09:37:43 GMT</pubDate>
    <dc:creator>AccentureQBETA</dc:creator>
    <dc:date>2012-08-07T09:37:43Z</dc:date>
    <item>
      <title>inconsistent # of events parsed - /w custom SourceType &amp; same source file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26284#M4320</link>
      <description>&lt;P&gt;Using Splunk version 4.3.3, build 128297&lt;BR /&gt;
Using Windows Server 2008 Enterprise version 6 (Build 6002: Service Pack 2) - a Virtual Machine.&lt;/P&gt;

&lt;P&gt;Why do I see a different number of events indexed (Event Count) via &lt;EM&gt;/en-GB/manager/launcher/data/indexes&lt;/EM&gt; in the UI each time I add data to Splunk from a static file, using the same file and a new index (created using the default settings) each time?&lt;/P&gt;

&lt;P&gt;So far I have gotten these counts:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;13,281&lt;/LI&gt;
&lt;LI&gt;17,469&lt;/LI&gt;
&lt;LI&gt;16,273&lt;/LI&gt;
&lt;LI&gt;20,202&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;The source file, which is an Apache Tomcat server log, is 3,637,248 bytes on disk, with 21,319 lines. I've created a custom source type for it:&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;My props.conf:&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[Apache-TomCat]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
REPORT-Apache-TomCat = Apache-TomCat
TRANSFORMS-comment = comment
LINE_BREAKER = ([\r\n]+)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;STRONG&gt;My transforms.conf:&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[comment]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue

[Apache-TomCat]
FIELDS = "date", "time", "c-ip", "x-H(remoteUser)", "cs-method", "cs-uri", "sc-status", "time-taken", "x-H(requestedSessionId)", "x-P(inFrame)", "x-P(eventSource)", "x-P(eventParam)", "x-P(eventShift)", "x-P(rcounter)", "x-P(scrollPositions)", "x-P(objFocusId)", "x-P(__navigator_index)", "x-R(username)", "x-S(int_user_id)"
DELIMS = " "
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'm adding data to Splunk via the Splunk UI: navigating to &lt;EM&gt;Manager &amp;gt; Data inputs &amp;gt; Add data &amp;gt; Files and directories &amp;gt; Add new&lt;/EM&gt;, selecting &lt;EM&gt;Upload and index a file&lt;/EM&gt;, browsing for the file (D:\NTPA1111_log_2012-07-30 - sample.txt), and entering the following under &lt;EM&gt;More Settings:&lt;/EM&gt;&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;Set Host: constant value&lt;/LI&gt;
&lt;LI&gt;Host field value: NTXA1528&lt;/LI&gt;
&lt;LI&gt;Set the source type: From List: Apache-TomCat&lt;/LI&gt;
&lt;LI&gt;Set the destination index: test1&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;For testing, I created 6 more indexes and tried adding the file two more times with the current settings specified above:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;18921&lt;/LI&gt;
&lt;LI&gt;15590&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;I removed &lt;EM&gt;LINE_BREAKER = ([\r\n]+)&lt;/EM&gt; from the local props.conf file and tried 2 more times:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;17,729&lt;/LI&gt;
&lt;LI&gt;18,803&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;I removed the &lt;EM&gt;[comment]&lt;/EM&gt; stanza from the local transforms.conf file, removed &lt;EM&gt;TRANSFORMS-comment = comment&lt;/EM&gt; from the local props.conf, and ran it 2 more times:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;15,244&lt;/LI&gt;
&lt;LI&gt;16,465&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Still, my results are inconsistent &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;I've just reinstalled Splunk, created the local transforms.conf and props.conf files (without the comment stanza and the LINE_BREAKER line), restarted Splunk, and then tried to index the file 3 more times:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;21321 &lt;/LI&gt;
&lt;LI&gt;19,063&lt;/LI&gt;
&lt;LI&gt;18995&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;I'm really surprised this is happening. Any help or ideas would be greatly appreciated.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Example of the Log:&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;#Fields: date time c-ip x-H(remoteUser) cs-method cs-uri sc-status time-taken x-H(requestedSessionId) x-P(inFrame) x-P(eventSource) x-P(eventParam) x-P(eventShift) x-P(rcounter) x-P(scrollPositions) x-P(objFocusId) x-P(__navigator_index) x-R(username) x-S(int_user_id)
#Version: 2.0
#Software: Apache Tomcat/6.0.26
2012-07-30 07:00:01 255.255.255.255 - POST /Name/APP.do?ts=20383926 200 0.041 'F039AE0E56089412190ABAE26496B80E' - - - - - - - '0' - 'BBBBBB'
2012-07-30 07:00:01 255.255.255.255 - GET /Name/resources/Folder/images/image.gif 200 0.000 'F039AE0E56089412190ABEE26496B80E' - - - - - - - - - 'BBBBBB'
2012-07-30 07:00:05 255.255.255.255 - GET /Name/?internal=Y 401 0.001 - - - - - - - - - - -
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 Sep 2020 12:12:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26284#M4320</guid>
      <dc:creator>AccentureQBETA</dc:creator>
      <dc:date>2020-09-28T12:12:06Z</dc:date>
    </item>
    <item>
      <title>Re: inconsistent # of events parsed - /w custom SourceType &amp; same source file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26285#M4321</link>
      <description>&lt;P&gt;Is this a static file? Are more events being added to the file? What is the "linecount" of the file according to other tools?&lt;/P&gt;

&lt;P&gt;Second, although you didn't ask, some of your field names are invalid in the Apache-TomCat stanza of the transforms.conf. Field names may contain only alphabetic characters, numbers and underscore; they must begin with an alphabetic character.&lt;/P&gt;
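
&lt;P&gt;As a rough sketch of what a compliant stanza could look like (the renamed fields below are hypothetical, chosen only to satisfy the naming rule; use whatever names suit your searches):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[Apache-TomCat]
# Hypothetical field names: letters, digits and underscores only, each starting with a letter
DELIMS = " "
FIELDS = "date", "time", "c_ip", "x_H_remoteUser", "cs_method", "cs_uri", "sc_status", "time_taken", "x_H_requestedSessionId", "x_P_inFrame", "x_P_eventSource", "x_P_eventParam", "x_P_eventShift", "x_P_rcounter", "x_P_scrollPositions", "x_P_objFocusId", "x_P_navigator_index", "x_R_username", "x_S_int_user_id"
&lt;/CODE&gt;&lt;/PRE&gt;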

&lt;P&gt;Finally, your comment regex should be&lt;BR /&gt;
REGEX = ^#&lt;/P&gt;

&lt;P&gt;You were not requiring that the line begin with a #!&lt;/P&gt;</description>
      <pubDate>Mon, 06 Aug 2012 22:38:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26285#M4321</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2012-08-06T22:38:33Z</dc:date>
    </item>
    <item>
      <title>Re: inconsistent # of events parsed - /w custom SourceType &amp; same source file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26286#M4322</link>
      <description>&lt;P&gt;This is a static file.&lt;/P&gt;

&lt;P&gt;Thanks for pointing out the field name problem, I've changed them now. After re-reading the transforms.conf doc, I realise that CLEAN_KEYS, which defaults to true, implicitly solved my problem with the field names. It probably has a performance impact, though.&lt;/P&gt;

&lt;P&gt;Regarding the regex, I just checked your suggested syntax against what I was using in &lt;A href="http://gskinner.com/RegExr/"&gt;http://gskinner.com/RegExr/&lt;/A&gt;, and yours didn't highlight any comments beginning with #.&lt;/P&gt;

&lt;P&gt;Splunk Team seem to suggest this tool too: &lt;A href="http://wiki.splunk.com/Community:RegexTestingTools"&gt;http://wiki.splunk.com/Community:RegexTestingTools&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;How sure are you my regex is incorrect?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Aug 2012 09:13:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26286#M4322</guid>
      <dc:creator>AccentureQBETA</dc:creator>
      <dc:date>2012-08-07T09:13:48Z</dc:date>
    </item>
    <item>
      <title>Re: inconsistent # of events parsed - /w custom SourceType &amp; same source file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26287#M4323</link>
      <description>&lt;P&gt;The circumflex is required to anchor the regular expression at the beginning of the line. Your regex will match comments - but it will also match other lines that have a #. If you are sure that no other events will have a # anywhere in the event, no worries.&lt;/P&gt;

&lt;P&gt;I didn't think that # was a reserved character, but perhaps it is in some regex flavors. So maybe&lt;/P&gt;

&lt;P&gt;REGEX = ^\#&lt;/P&gt;

&lt;P&gt;is better and will work with RegExr&lt;/P&gt;</description>
      <pubDate>Tue, 07 Aug 2012 09:19:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26287#M4323</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2012-08-07T09:19:20Z</dc:date>
    </item>
    <item>
      <title>Re: inconsistent # of events parsed - /w custom SourceType &amp; same source file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26288#M4324</link>
      <description>&lt;P&gt;OK &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; I've updated. Thanks. What about my main problem? any ideas?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Aug 2012 09:37:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26288#M4324</guid>
      <dc:creator>AccentureQBETA</dc:creator>
      <dc:date>2012-08-07T09:37:43Z</dc:date>
    </item>
    <item>
      <title>Re: inconsistent # of events parsed - /w custom SourceType &amp; same source file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26289#M4325</link>
      <description>&lt;P&gt;How are you comparing the sizes? By looking at the Manager-&amp;gt;Indexes page, or by running this command&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=* sourcetype=Apache-TomCat | stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And do you get the same answer both ways?&lt;/P&gt;

&lt;P&gt;Did you consider using one of the built-in sourcetypes for Apache data - access_combined or access_combined_wcookie?&lt;/P&gt;</description>
      <pubDate>Thu, 09 Aug 2012 22:46:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26289#M4325</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2012-08-09T22:46:46Z</dc:date>
    </item>
    <item>
      <title>Re: inconsistent # of events parsed - /w custom SourceType &amp; same source file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26290#M4326</link>
      <description>&lt;P&gt;Hi Iguinn, I was only previously looking at the Manager-&amp;gt;Indexes page.&lt;/P&gt;

&lt;P&gt;Now when I run this: index=cms_test_1 | stats count by index&lt;/P&gt;

&lt;P&gt;I get this&lt;/P&gt;

&lt;P&gt;index  count&lt;BR /&gt;&lt;BR /&gt;
1 cms_test_1 20442 &lt;/P&gt;

&lt;P&gt;Notepad without wordwrap shows I should get: 20445 (so minus 3 for comments and woohoo!)&lt;/P&gt;

&lt;P&gt;I tried it on 3 more files and it appears to not be working now...&lt;/P&gt;

&lt;P&gt;Splunk Indexed:&lt;/P&gt;

&lt;P&gt;File1 = 20442&lt;BR /&gt;
File2 = 24350&lt;BR /&gt;
File3 = 25425&lt;/P&gt;

&lt;P&gt;Notepad shows:&lt;/P&gt;

&lt;P&gt;file1 = 20442&lt;BR /&gt;
file2 = 25467&lt;BR /&gt;
file3 = 26540&lt;/P&gt;
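
&lt;P&gt;A per-file breakdown of the indexed counts could also be pulled in one search. This is only a sketch, and it assumes each uploaded file was recorded under its own source value in cms_test_1:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=cms_test_1 | stats count by source
&lt;/CODE&gt;&lt;/PRE&gt;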

&lt;P&gt;Running &lt;EM&gt;index=cms_test_1 | stats count by index&lt;/EM&gt; shows the total of 72449 all in 1 result, so the line break appears to be working.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 12:15:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26290#M4326</guid>
      <dc:creator>AccentureQBETA</dc:creator>
      <dc:date>2020-09-28T12:15:01Z</dc:date>
    </item>
    <item>
      <title>Re: inconsistent # of events parsed - /w custom SourceType &amp; same source file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26291#M4327</link>
      <description>&lt;P&gt;As for access_combined: yes, I considered it, but it doesn't capture the fields I would like. I'm also unsure how that sourcetype will break my logs into events, and whether we will be able to add any index-time or search-time field extraction with it. I'll try using it today and see if it is any better.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Aug 2012 09:13:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/inconsistent-of-events-parsed-w-custom-SourceType-same-source/m-p/26291#M4327</guid>
      <dc:creator>AccentureQBETA</dc:creator>
      <dc:date>2012-08-14T09:13:40Z</dc:date>
    </item>
  </channel>
</rss>

