Getting Data In

Avoid duplicate data and ignore # fields

kmattern
Builder

I have customer systems that perform file transfers which are logged by IIS. IIS has a timeout of 20 minutes; when it times out, logging restarts immediately but writes a new set of headers into the log. The date/time stamp on the log file also changes, and Splunk assumes it is a new file.

How can I avoid duplicate data when Splunk tries to re-index the log, or get Splunk to consume only the new data? And how do I ignore the headers scattered throughout the log file?


wsnyder2
Path Finder

We use the following line in the iis sourcetype stanza in props.conf to discard the header lines:

SEDCMD-THROWAWAY-COMMENTS=s/^#.+[\r\n]+#.+[\r\n]+#.+[\r\n]+#.*[\r\n]//g
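
For reference, a minimal sketch of how that line might sit in props.conf; the stanza name iis is an assumption, so use whatever sourcetype your IIS inputs actually have:

props.conf

[iis]
# Strip the four consecutive header lines IIS writes each time it (re)starts logging
SEDCMD-THROWAWAY-COMMENTS = s/^#.+[\r\n]+#.+[\r\n]+#.+[\r\n]+#.*[\r\n]//g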


ogdin
Splunk Employee

Use INDEXED_EXTRACTIONS=W3C in Splunk 6. We will honor the header found at the top of the file and ignore any line beginning with a # after that. Plus, we do the field extraction automatically from the header so you don't have to mess with props and transforms.

http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime
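
If it helps, here is a minimal props.conf sketch of this approach; the sourcetype name iis_w3c is only an illustration:

props.conf

[iis_w3c]
# Splunk 6+: read field names from the W3C header at the top of the file,
# extract fields at index time, and ignore later lines that begin with #
INDEXED_EXTRACTIONS = W3C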


lukejadamec
Super Champion

There are two problems here. First, you can remove the extra header lines with additions to inputs.conf, props.conf, and transforms.conf.

Note: I’m using a new sourcetype, so I need a stanza in inputs.conf. If you want to use the existing sourcetype in inputs.conf, then you will need to specify that sourcetype in props.conf (i.e. substitute my winIIS with the sourcetype found in your inputs.conf).

inputs.conf

[monitor://c:\inetpub\logs\Logfiles\W3SVC1\*.log]
sourcetype = winIIS
queue = parsingQueue
index = default
disabled = false

props.conf

[winIIS]
SHOULD_LINEMERGE = false
CHECK_FOR_HEADER = false
REPORT-fields = windows_iis_header
TRANSFORMS-headers = remove_headers

transforms.conf

[remove_headers]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue

[windows_iis_header]
FIELDS = "date","time","s_ip", ...
DELIMS = " "

Complete the FIELDS list to match the #Fields header in your log.

Here is another example of the same:
http://answers.splunk.com/answers/24986/iis-log-fields-not-parsing
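
Once the transform is in place, a quick sanity check (assuming the winIIS sourcetype from the stanzas above) is to search for indexed lines that still begin with #; it should return no events:

sourcetype=winIIS | regex _raw="^#" | head 10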

As for the duplication problem, I've not seen that. Having the timestamp of the file update is normal and should not cause a re-read of the file. Splunk computes a CRC of the first few hundred bytes of the file, so if that beginning does not change, the file should not be re-read. I'm guessing a setting in your inputs.conf is causing it. Can you post your inputs.conf?
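
If the file really is being re-read, these are the inputs.conf settings that usually govern it. This is only an illustrative sketch built on the monitor stanza above; initCrcLength = 256 is the default, and crcSalt is shown commented out because it is not set by default:

inputs.conf

[monitor://c:\inetpub\logs\Logfiles\W3SVC1\*.log]
sourcetype = winIIS
# File identity is a CRC of the first initCrcLength bytes of the file (256 by default).
# If those bytes do not change, Splunk continues from its saved position instead of re-indexing.
initCrcLength = 256
# crcSalt = <SOURCE> mixes the full path into the CRC; combined with renamed or
# rotated files it can make Splunk treat existing data as a brand-new file.
# crcSalt = <SOURCE>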
