Why is my file not being indexed?

Path Finder

I'm trying to onboard a new application and have been having issues from the get-go.

The application is IBM IIB, and it outputs logs into a number of subdirectories beneath one parent logs directory.

Each log file contains a single event, and the files are a mix of XML and JSON formats.

I have started with the XML versions and one of the subfolders to confirm that my inputs were working correctly.

There are 3 files in the directory and all 3 show as being read, but only 2 made it to the index and show in search.

Trying to understand the cause of the 3rd not making it to the indexer.

Inputs:

[monitor://Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\*.log]
disabled = false
index = hprg_applog_nonprod
sourcetype = hprg:iib:xml

After setting this and seeing the issue, I ran “splunk list inputstatus” on the server where the files reside, and I can see the below output regarding the files:

    Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\*.log
            type = directory

    Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\20180410_214256_061062_CustomerDominoConsumer_ActivateDeateClient_SOAPRequest.log
            file position = 313
            file size = 313
            parent = Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\*.log
            percent = 100.00
            type = finished reading

    Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\20180410_233834_470779_CustomerDominoConsumer_ActivateDeateClient_SOAPRequest.log
            file position = 313
            file size = 313
            parent = Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\*.log
            percent = 100.00
            type = finished reading

    Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\20180410_234012_541048_CustomerDominoConsumer_ActivateDeateClient_SOAPRequest.log
            file position = 313
            file size = 313
            parent = Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\*.log
            percent = 100.00
            type = finished reading

which to me suggests that Splunk has read and processed all 3 files, yet when I run the search “index=_internal host=DWxxxxxS31 q:\”

I get the below:

05/07/2018 15:38:09.884
07-05-2018 15:38:09.884 +1000 INFO  Metrics - group=per_source_thruput, series="q:\iib_log\customerdominoconsumer\activatedeactivateclient\mqsiarchive\20180410_234012_541048_customerdominoconsumer_activatedeactivateclient_soaprequest.log", kbps=0.009649, eps=0.063136, kb=0.305664, ev=2, avg_age=3682772.500000, max_age=7365545
host = DWxxxxxS31 source = C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log sourcetype = splunkd

05/07/2018 15:38:09.884
07-05-2018 15:38:09.884 +1000 INFO  Metrics - group=per_source_thruput, series="q:\iib_log\customerdominoconsumer\activatedeactivateclient\mqsiarchive\20180410_214256_061062_customerdominoconsumer_activatedeactivateclient_soaprequest.log", kbps=0.009649, eps=0.063136, kb=0.305664, ev=2, avg_age=3687826.500000, max_age=7375653
host = DWxxxxxS31 source = C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log sourcetype = splunkd

05/07/2018 15:37:39.025
07-05-2018 15:37:39.025 +1000 INFO  TailingProcessor - Adding watch on path: Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive.
host = DWxxxxxS31 source = C:\Program Files\SplunkUniversalForwarder\var\log\splunk\splunkd.log sourcetype = splunkd

05/07/2018 15:37:39.025
07-05-2018 15:37:39.025 +1000 INFO  TailingProcessor - Parsing configuration stanza: monitor://Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\*.log.
host = DWxxxxxS31 source = C:\Program Files\SplunkUniversalForwarder\var\log\splunk\splunkd.log sourcetype = splunkd

which to me suggests Splunk has only processed 2 of the 3 files, which also fits with the fact that when I search the sourcetype for the input, I’m only getting 2 results.
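
For anyone repeating this comparison by hand, a rough Python sketch of the same check follows. It pulls the series value out of per_source_thruput events and reports which expected files never appear. The function names and the regular expression are my own and are not part of any Splunk tooling. Treat a "missing" result as a hint only: as far as I know, metrics.log reports just the busiest series in each interval, so a file can be absent there and still have been forwarded.

```python
import re

# Matches the series="..." field of a per_source_thruput metrics event.
SERIES_RE = re.compile(r'group=per_source_thruput, series="([^"]+)"')


def forwarded_sources(metrics_lines):
    """Return the lower-cased source paths named in per_source_thruput events."""
    found = set()
    for line in metrics_lines:
        match = SERIES_RE.search(line)
        if match:
            found.add(match.group(1).lower())
    return found


def missing_sources(expected_paths, metrics_lines):
    """List expected files that never show up in per_source_thruput events."""
    seen = forwarded_sources(metrics_lines)
    # metrics.log lower-cases the path, so compare case-insensitively.
    return [path for path in expected_paths if path.lower() not in seen]
```

Feeding it the lines exported from the index=_internal search, plus the file paths shown by “splunk list inputstatus”, gives the list of files worth a closer look.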

Looking for any tips / suggestions that might help me troubleshoot this issue.


Re: Why is my file not being indexed?

Influencer

Do these files have headers? By default Splunk identifies a file by a CRC of its first 256 bytes, so if those bytes match a file it has already seen, the new file gets skipped. In XML files that can be pretty common. Splunk will not consider the filename as a differentiator by default.

You can use the inputs.conf setting crcSalt to make Splunk use the filenames as differentiators. In your case:

[monitor://Q:\IIB_Log\CustomerDominoConsumer\ActivateDeactivateClient\mqsiarchive\*.log]
crcSalt = <SOURCE>
disabled = false
index = hprg_applog_nonprod
sourcetype = hprg:iib:xml
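
To check whether this is what is happening before changing the input, a small Python sketch like the one below can group the files in a directory by their first 256 bytes. The function name group_by_head is hypothetical, and SHA-256 is used only as a convenient way to compare the leading bytes. It does not reproduce the checksum Splunk calculates internally, so treat it as a diagnostic aid rather than a replica of Splunk's behaviour.

```python
import hashlib
from collections import defaultdict

HEAD_BYTES = 256  # the default window mentioned above


def group_by_head(paths, head_bytes=HEAD_BYTES):
    """Return groups of files whose first `head_bytes` bytes are identical."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            head = handle.read(head_bytes)
        # The hash is just a dictionary key for "same leading bytes";
        # it is NOT the CRC that Splunk itself computes.
        groups[hashlib.sha256(head).hexdigest()].append(str(path))
    # Only groups with more than one member are potential collisions.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Example call (the directory is whatever your monitor stanza points at):
#   from pathlib import Path
#   print(group_by_head(Path(r"Q:\some\log\dir").glob("*.log")))
```

Any group it returns is a set of files that look alike over those first 256 bytes, which makes them candidates for being skipped unless crcSalt is set.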


Re: Why is my file not being indexed?

Path Finder

Thanks jplumsdaine22,

On closer inspection, I determined that the contents of 2 of the files were identical, so I tried adding crcSalt to my inputs.conf. However, it didn't seem to make any difference, and still only 2 of the files were indexed.


Re: Why is my file not being indexed?

Path Finder

So......

It seems there were a couple of issues at play here, with a syntax error being the one that stumped me for a while.

Thanks jplumsdaine22 for your input - only one to offer anything on the topic. Seems you were 100% on the money.

After modifying the inputs to include crcSalt = <SOURCE>, rather than the incorrect syntax I had entered before, everything is working as expected.
