inconsistent # of events parsed - /w custom SourceType & same source file

AccentureQBETA · ‎08-06-2012

Using Splunk version 4.3.3, build 128297
Using Windows Server 2008 Enterprise version 6 (Build 6002: Service Pack 2) - a Virtual Machine.

Why do I see a different number of events indexed (Event Count) at /en-GB/manager/launcher/data/indexes in the UI when I add data to Splunk from a static file, using the same file and a new index (created using the default settings) each time?

So far I have gotten these counts:

13,281
17,469
16,273
20,202

The source file, an Apache Tomcat server log, is 3,637,248 bytes on disk, with 21,319 lines. I've created a custom source type for it:

My props.conf:

[Apache-TomCat]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
REPORT-Apache-TomCat = Apache-TomCat
TRANSFORMS-comment = comment
LINE_BREAKER = ([\r\n]+)

My transforms.conf:

[comment]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue

[Apache-TomCat]
FIELDS="date", "time", "c-ip", "x-H(remoteUser)", "cs-method", "cs-uri", "sc-status", "time-taken", "x-H(requestedSessionId)", "x-P(inFrame)", "x-P(eventSource)", "x-P(eventParam)", "x-P(eventShift)", "x-P(rcounter)", "x-P(scrollPositions)", "x-P(objFocusId)", "x-P(__navigator_index)", "x-R(username)", "x-S(int_user_id)"
DELIMS = " "

I'm adding data to Splunk via the Splunk UI: navigating to Manager > Data inputs > Add data > Files and directories > Add new, selecting "Upload and index a file", browsing for the file (D:\NTPA1111_log_2012-07-30 - sample.txt), and entering the following under More Settings:

Set Host: constant value
Host field value: NTXA1528
Set the source type: From List: Apache-TomCat
Set the destination index: test1

For testing, I created 6 more indexes and tried adding the file two more times with the current settings specified above:

18921
15590

I removed LINE_BREAKER = ([\r\n]+) from the local props.conf file and tried 2 more times:

17,729
18,803

I removed the [comment] stanza from the local transforms.conf file, removed TRANSFORMS-comment = comment from the local props.conf, and ran it 2 more times:

15,244
16,465

My results are still inconsistent 😞

I've just reinstalled Splunk, created the local transforms.conf and props.conf files (without the comment stanza and the LINE_BREAKER line), restarted Splunk, and then tried to index the file 3 more times:

21321
19,063
18995

I'm really surprised this is happening. Any help or ideas would be greatly appreciated.

Example of the Log:

#Fields: date time c-ip x-H(remoteUser) cs-method cs-uri sc-status time-taken x-H(requestedSessionId) x-P(inFrame) x-P(eventSource) x-P(eventParam) x-P(eventShift) x-P(rcounter) x-P(scrollPositions) x-P(objFocusId) x-P(__navigator_index) x-R(username) x-S(int_user_id)
#Version: 2.0
#Software: Apache Tomcat/6.0.26
2012-07-30 07:00:01 255.255.255.255 - POST /Name/APP.do?ts=20383926 200 0.041 'F039AE0E56089412190ABAE26496B80E' - - - - - - - '0' - 'BBBBBB'
2012-07-30 07:00:01 255.255.255.255 - GET /Name/resources/Folder/images/image.gif 200 0.000 'F039AE0E56089412190ABEE26496B80E' - - - - - - - - - 'BBBBBB'
2012-07-30 07:00:05 255.255.255.255 - GET /Name/?internal=Y 401 0.001 - - - - - - - - - - -
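As a rough illustration of what the FIELDS and DELIMS settings in the [Apache-TomCat] stanza are meant to do, the Python sketch below splits the first data line of the example on single spaces and pairs each token with a field name. This is only an approximation written for this thread; it is not Splunk's extraction code, and Splunk's handling of quotes and field-name cleaning may differ.

```python
# Field names copied from the FIELDS setting in the [Apache-TomCat] stanza.
FIELDS = [
    "date", "time", "c-ip", "x-H(remoteUser)", "cs-method", "cs-uri",
    "sc-status", "time-taken", "x-H(requestedSessionId)", "x-P(inFrame)",
    "x-P(eventSource)", "x-P(eventParam)", "x-P(eventShift)", "x-P(rcounter)",
    "x-P(scrollPositions)", "x-P(objFocusId)", "x-P(__navigator_index)",
    "x-R(username)", "x-S(int_user_id)",
]

# First data line from the log example.
line = ("2012-07-30 07:00:01 255.255.255.255 - POST /Name/APP.do?ts=20383926 "
        "200 0.041 'F039AE0E56089412190ABAE26496B80E' - - - - - - - '0' - 'BBBBBB'")

# Split on a single space (mirroring DELIMS = " ") and pair tokens with names.
tokens = line.split(" ")
record = dict(zip(FIELDS, tokens))

print(len(FIELDS), len(tokens))  # both 19 for this line
print(record["cs-method"], record["sc-status"], record["time-taken"])
```

For this line the token count matches the number of field names, so every name receives a value; a line with a different number of space-separated tokens would not line up this neatly.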

lguinn2 · ‎08-09-2012

How are you comparing the sizes? By looking at the Manager->Indexes page, or by running this search:

index=* sourcetype=Apache-TomCat | stats count by index

And do you get the same answer both ways?

Did you consider using one of the built-in sourcetypes for Apache data - access_combined or access_combined_wcookie?

AccentureQBETA · ‎08-14-2012

I did consider access_combined, but it doesn't capture the fields I would like. I'm also unsure how that sourcetype will turn my logs into events, and whether we will be able to add any index-time or search-time field extraction with it. I'll try it today and see if it is any better.

AccentureQBETA · ‎08-13-2012

Hi lguinn, previously I was only looking at the Manager->Indexes page.

Now when I run this: index=cms_test_1 | stats count by index

I get this

    index        count
1   cms_test_1   20442

Notepad without word wrap shows I should get 20445 (so minus 3 for comments, and woohoo!)

I tried it on 3 more files and it appears to not be working now...

Splunk Indexed:

File1 = 20442
File2 = 24350
File3 = 25425

Notepad shows:

file1 = 20442
file2 = 25467
file3 = 26540

Running index=cms_test_1 | stats count by index shows a total of 72449, all in 1 result, so the line breaking appears to be working.
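A quick cross-check of the figures in this post, written as a small Python sketch. It uses only the counts quoted here; the variable names are made up for illustration.

```python
# Per-file counts quoted in this post.
indexed = {"file1": 20442, "file2": 24350, "file3": 25425}  # "Splunk Indexed"
notepad = {"file1": 20442, "file2": 25467, "file3": 26540}  # "Notepad shows"
search_total = 72449  # result of: index=cms_test_1 | stats count by index

# Shortfall per file: Notepad count minus the per-file indexed figure.
shortfall = {name: notepad[name] - indexed[name] for name in indexed}
print(shortfall)

# Compare both sums against the search total.
indexed_sum = sum(indexed.values())
notepad_sum = sum(notepad.values())
print(indexed_sum, notepad_sum, search_total)
```

The Notepad counts add up to 72449, the same as the search total, while the per-file "Splunk Indexed" figures add up to 70217. That suggests, without proving it, that the search sees every line and that the per-file figures are the ones falling short.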

AccentureQBETA · ‎08-07-2012

OK 🙂 I've updated. Thanks. What about my main problem? Any ideas?

lguinn2 · ‎08-07-2012

The circumflex is required to anchor the regular expression at the beginning of the line. Your regex will match comments - but it will also match other lines that have a #. If you are sure that no other events will have a # anywhere in the event, no worries.

I didn't think that # was a reserved character, but perhaps it is in some regex flavors. So maybe

REGEX = ^\#

is better and will work with RegExr
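The difference between the anchored and unanchored patterns can be checked outside Splunk. The sketch below uses Python's re module as a stand-in for Splunk's regex engine, which is an assumption, although these simple patterns should behave the same way. The second sample line, with a # inside the URI, is hypothetical and does not come from the log.

```python
import re

lines = [
    "#Version: 2.0",  # comment line from the log header
    # Hypothetical event containing a '#' that is not at the start:
    "2012-07-30 07:00:01 255.255.255.255 - GET /Name/page#top 200 0.000",
]

unanchored = re.compile(r"#")    # matches a '#' anywhere in the line
anchored = re.compile(r"^#")     # matches only if the line starts with '#'
escaped = re.compile(r"^\#")     # same as above, with the '#' escaped

results = []
for line in lines:
    row = (
        bool(unanchored.search(line)),
        bool(anchored.search(line)),
        bool(escaped.search(line)),
    )
    results.append(row)
    print(row)
```

All three patterns match the comment line, but only the unanchored one also matches the event line, which is the case where a real event would be sent to the nullQueue by mistake.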

AccentureQBETA · ‎08-07-2012

This is a static file.

Thanks for pointing out the field name problem; I've changed them now. After re-reading the transforms.conf doc, I realise that CLEAN_KEYS, which defaults to true, implicitly solved my problem with the field names. It probably has a performance impact, though.

Regarding the regex, I just checked your suggested syntax against what I was using in http://gskinner.com/RegExr/ and yours didn't highlight any comments beginning with #

The Splunk team seems to suggest this tool too: http://wiki.splunk.com/Community:RegexTestingTools

How sure are you my regex is incorrect?

lguinn2 · ‎08-06-2012

Is this a static file? Are more events being added to the file? What is the "linecount" of the file according to other tools?
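One way to get an independent line count is a short script. The Python sketch below counts all lines, and separately the lines that are neither blank nor comments starting with #. It assumes one event per line (as with SHOULD_LINEMERGE = False) and that comment lines are dropped; the function name and the inline sample are made up for illustration.

```python
import io

def count_expected_events(lines):
    """Return (total lines, lines that are neither blank nor '#' comments)."""
    total = 0
    events = 0
    for line in lines:
        total += 1
        stripped = line.rstrip("\r\n")
        if stripped and not stripped.startswith("#"):
            events += 1
    return total, events

# Small inline sample shaped like the log excerpt in the question.
sample = io.StringIO(
    "#Version: 2.0\n"
    "#Software: Apache Tomcat/6.0.26\n"
    "2012-07-30 07:00:01 255.255.255.255 - GET /Name/?internal=Y 401 0.001\n"
    "2012-07-30 07:00:05 255.255.255.255 - GET /Name/?internal=Y 401 0.001\n"
)
total, events = count_expected_events(sample)
print(total, events)
```

For the real file, the same function can be given an open file handle, for example one returned by open(path, newline=""), in place of the inline sample.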

Second, although you didn't ask, some of your field names are invalid in the Apache-TomCat stanza of the transforms.conf. Field names may contain only alphabetic characters, numbers and underscore; they must begin with an alphabetic character.
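The naming rule above can be written as a small check. This Python sketch tests a few of the names from the Apache-TomCat stanza against that rule and proposes a replacement by turning each run of other characters into an underscore. The suggest_name helper is hypothetical and is not how Splunk's own key cleaning is implemented.

```python
import re

# Rule as stated above: letters, digits and underscore only,
# and the first character must be a letter.
VALID_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def is_valid(name):
    return bool(VALID_NAME.match(name))

def suggest_name(name):
    """Hypothetical helper: replace runs of invalid characters with '_'."""
    return re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")

# A few of the names from the FIELDS setting in the question.
names = ["date", "time", "c-ip", "x-H(remoteUser)", "sc-status"]

for name in names:
    if is_valid(name):
        print(name, "ok")
    else:
        print(name, "invalid, could become", suggest_name(name))
```

Here date and time pass, while the names containing hyphens or parentheses fail the rule and would need renaming.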

Finally, your comment regex should be
REGEX = ^#

You were not requiring that the line begin with a #!
