Getting Data In

Ingested a year's worth of Windows logs from one source path, but why did Splunk not index all of them?

CaptainHook
Communicator

I was indexing a year's worth of logs (200+ GB) from one source path. The data was indexed, but I am trying to understand why I have missing data in a source that was indexed over 5 days ago. We have a path that contains one event log for each day of the year (366 logs) that needed to be indexed into Splunk. To do this, I used a basic inputs.conf stanza:

[monitor://D:\LOGS\IIS\abc\W3SVC105105105\2016\u_ex*.log]
disabled = false
followTail = 0
index = abc_applogs
sourcetype = abc_iis_applogs

The data was indexed into Splunk; however, while validating it I noticed that only 204 of the 366 log files were actually indexed. The interesting part is that the missing data is random: a couple of days out of each month, plus the entire month of December, did not get indexed.
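
For reference, the validation was along these lines (a sketch; tstats lists every source file that made it into the index, and the row count is what I compared against 366):

| tstats count where index=abc_applogs sourcetype=abc_iis_applogs by source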

The files all have independent log file names and match the u_ex*.log path I specified in the monitor. Is there any known reason why Splunk didn't index all the data? There is no indication as to why logs were skipped.
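
So far the only place I have thought to look for such an indication is Splunk's own internal logging, since the tailing processor records what it decides about each file. Something like this is what I have in mind (just a sketch; component names can vary a bit between versions):

index=_internal sourcetype=splunkd (component=TailingProcessor OR component=WatchedFile) u_ex*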

1 Solution

CaptainHook
Communicator

I ended up changing the sourcetype to isolate these logs from the other IIS logs and similar events from the past year that were already indexed. After reloading the app, all the data indexed correctly. I'm still not 100% sure why this occurred, but I'm happy to see it is fixed.
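
For anyone hitting the same issue, the only line that changed was the sourcetype in the original stanza, roughly like this (the new sourcetype name below is illustrative, not the exact value I used):

[monitor://D:\LOGS\IIS\abc\W3SVC105105105\2016\u_ex*.log]
disabled = false
followTail = 0
index = abc_applogs
sourcetype = abc_iis_applogs_2016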


sloshburch
Splunk Employee

Another thing to keep an eye on is crcSalt, in case Splunk thinks some of the files are the same file that has simply been log-rolled.
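
As a sketch, that would look like this on the monitor stanza (crcSalt = <SOURCE> adds the full file path into the CRC calculation, so files whose first few hundred bytes happen to match are still tracked as distinct files):

[monitor://D:\LOGS\IIS\abc\W3SVC105105105\2016\u_ex*.log]
disabled = false
index = abc_applogs
sourcetype = abc_iis_applogs
crcSalt = <SOURCE>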


jtacy
Builder

It's not uncommon for data to be indexed but end up with incorrect Splunk timestamps. I would probably use the missing events in December as a test. Does this search over "All Time" return the results you were missing?

index="abc_applogs" sourcetype="abc_iis_applogs" source="D:\LOGS\IIS\abc\W3SVC105105105\2016\u_ex1612*.log"

I might also consider this search (again, over "All Time", and it could take some time to complete):

index="abc_applogs" sourcetype="abc_iis_applogs" "2016-12-*"

If these return events that you weren't seeing when searching by date, it would be interesting to see a sample event and the timestamp that Splunk is applying to it (there's a timestamp-inspection search at the end of this post). A couple of ways to avoid this problem:

  1. Use the "iis" sourcetype. This already contains some configuration that will help prevent incorrect timestamp and linebreak detection. It also happens to enable indexed field extraction, which may or may not be what you want since it can cost you more disk space on the Splunk side, but it should work fine.
  2. Create appropriate sourcetype configuration on the indexer in props.conf. I would probably just copy the "iis" sourcetype from etc/system/default/props.conf so you'd end up with something like this:

props.conf on indexer side:

[abc_iis_applogs]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
INDEXED_EXTRACTIONS = w3c
detect_trailing_nulls = auto

A couple of references that might be useful:
http://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Createsourcetypes
http://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Configuretimestamprecognition#Edit_timestamp_...

Note that any of these changes will require re-indexing the data to take effect.
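
Also, here is the timestamp-inspection search I alluded to above, as a sketch (it puts the timestamp Splunk parsed into _time next to the wall-clock time the event was actually indexed; a big gap on the December files would confirm a timestamp extraction problem):

index="abc_applogs" sourcetype="abc_iis_applogs" source="D:\LOGS\IIS\abc\W3SVC105105105\2016\u_ex1612*.log"
| eval event_time=strftime(_time, "%Y-%m-%d %H:%M:%S")
| eval index_time=strftime(_indextime, "%Y-%m-%d %H:%M:%S")
| table source event_time index_time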


CaptainHook
Communicator

Thank you jtacy for your response.
I was running the original search over "All Time" to make sure I was capturing anything indexed, which is why I am a little confused about how and why random events were missed. All the logs follow the same format, and there is no rhyme or reason for the data to be missed. Unfortunately, due to the amount of data, it is not reasonable to re-index it at this time.

I am contemplating grabbing just the missing files and adding them to a separate monitor, but that is quite tedious work.
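
If I do go that route, a single monitor stanza with a whitelist would at least avoid one stanza per file; something like this (a sketch, where the dates in the regex are placeholders for the actual missing days):

[monitor://D:\LOGS\IIS\abc\W3SVC105105105\2016]
whitelist = u_ex16(0105|0213|12\d\d)\.log$
index = abc_applogs
sourcetype = abc_iis_applogs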
