I am trying to index a somewhat long log file (about 38805 bytes according to the tailing processor).
This log file contains 417 lines, but Splunk only indexed 47 lines.
I thought it might be the TRUNCATE default of 10000 bytes, but looking at the logs I noticed that it successfully indexed all of the log files below 18181 bytes in size (except for one log file that is 4124 bytes, though I'm not sure if that's significant).
My log's inputs.conf is configured as such:
[monitor://\\path\to\our\internal\network]
whitelist = WhiteListPattern
initCrcLength = 2048
sourcetype = generic_single_line
disabled = false
Anyone have any idea what's going on here?
The issue here seems to be a combination of several things:
1. The events do not have timestamps.
2. One of the events contains the value 1319553808, which Splunk automatically used as a _time of 10/25/11 6:44:48.000 AM.
3. Once it made this conversion, all following events also received that same timestamp (I'm guessing because the following values didn't convert as nicely).
4. All our queries have an automatic window of 30 days, and as such these "old" events weren't picked up.
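The epoch interpretation in points 2 and 3 can be reproduced outside Splunk. A minimal sketch, only to demonstrate the conversion (it is not how Splunk's timestamp extractor is actually implemented) — a bare 10-digit number in an event is exactly the shape of UNIX epoch seconds:

```python
from datetime import datetime, timezone

suspect = 1319553808  # the value Splunk picked up from the event
parsed = datetime.fromtimestamp(suspect, tz=timezone.utc)
print(parsed.isoformat())  # 2011-10-25T14:43:28+00:00
```

The displayed wall-clock time depends on the indexer's timezone, which is why Splunk showed it as a morning local time; the date, October 2011, is what pushed the events outside the 30-day search window.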
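One possible mitigation, assuming these events genuinely carry no usable timestamps, is to tell Splunk not to search event text for one at all. A hedged props.conf sketch for this sourcetype (verify the setting against your Splunk version's documentation before deploying):

```ini
# props.conf -- sketch, not a tested production config
[generic_single_line]
# Stamp events with the current time instead of guessing from event contents
DATETIME_CONFIG = CURRENT
```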
You are ingesting a file using a UNC pathname. While that normally works, I suspect you may be having issues because of that, coupled with something in your environment. It could be the exact flavor of Windows, something in your Windows sharing setup, repeated very tiny network hiccups... the available ways for that to go wrong are limitless.
To test this, create a temporary index in Splunk and create a temporary folder on your Splunk Indexer. Copy a bunch of those files from their existing location and drop them into that temporary folder. Then create an input just like your existing one (obviously, you'll use a different path!) and see if those get ingested properly.
If there are no problems, I'd recommend installing the Universal Forwarder on the system involved and using that to read the files locally and forward them. I suspect you'll have no problems out of that method.
If there are still problems ingesting the entire file we can continue investigating, but at least we'll have ruled out quite a few things that might have been happening.
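A sketch of what that temporary input stanza might look like, mirroring the one in the question — the local path and index name here are placeholders, not values from the original post:

```ini
# inputs.conf on the indexer -- path and index name are hypothetical
[monitor://C:\temp\ingest_test]
whitelist = WhiteListPattern
initCrcLength = 2048
sourcetype = generic_single_line
index = ingest_test
disabled = false
```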
FYI, just this past week I ingested a 7 GB log file with no issues. I needed two or three lines out of it but was having problems opening and searching it in any editor, so I decided to ingest all 7 GB and search with Splunk, which worked perfectly.
I should note we also have other very large files on the same network share, and those haven't had any issues. It only seems to crop up with these specific files.
Compare Splunk events with the log file directly, find what the last thing is that it reads properly and then examine the next few lines - could be something in there.
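To make that comparison concrete, here is a small sketch; the helper name is illustrative, and in practice you would read the real log in binary mode (e.g. `open(path, "rb").read().splitlines()`) so any stray bytes show up:

```python
def window(lines, center, radius=2):
    """Return (1-based line number, line) pairs centered on a line number."""
    lo = max(center - 1 - radius, 0)
    hi = min(center - 1 + radius + 1, len(lines))
    return [(i + 1, lines[i]) for i in range(lo, hi)]

# Stand-in data for the sketch; use the real log file's lines instead.
sample = [f"event {n}".encode() for n in range(1, 61)]

# 47 = the number of lines the question reports as indexed, so the lines
# just after it are the ones worth inspecting.
for num, raw in window(sample, 47):
    print(num, repr(raw))  # repr() reveals control characters and odd bytes
```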
I've actually figured out the "what" if not the "why". I'll add it as an answer.
When you say long log file, do you mean a long event in a file? If TRUNCATE is the issue, you should see truncation messages in your _internal logs.
No, I mean a long file. The actual event lines are rather short.