Hi.
OK, this question is mostly theory, but I came here because of a practical issue with this kind of problem.
So, let's say I have an app that rewrites its log file every day at 00:00 and writes,
I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE.
12/02/2025 00:30:00 log #1
12/02/2025 00:30:01 log #2
12/02/2025 00:30:02 log #3
PREMISE: every day at 00:00 the log is completely rewritten from 0 bytes with the same headers.
So, on the first day the Splunk UF should take the first 256 bytes for the CRC, treat the log entries as new, and send them to the indexers.
The next day it should block the file, thinking it's the same as the day before, since the 256-byte CRC is the same, so in the UF log I should find something like,
File will not be read, is too small to match seekptr checksum [...]
and only yesterday's entries would be indexed.
Now I force initCrcLength = 1024, and indexing should start again, since the file now contains, for example,
I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE.
AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG.
12/02/2025 00:30:00 log #1
12/02/2025 00:30:01 log #2
12/02/2025 00:30:02 log #3
The next day, the CRC over the first 1024 bytes will again be the same, so the UF tags the log as already sent yesterday and blocks it again.
OK, now I force crcSalt = &lt;SOURCE&gt; on the input.
But within a few days, both
initCrcLength = 1024 [same header size]
crcSalt = &lt;SOURCE&gt; [same file path]
will calculate the same identical CRC, so the UF will think the log is still the same and it will be blocked! Am I wrong?
I know this is an old and much-discussed question. But it would be interesting to know if it's possible to tell the UF: "read the file without thinking about its CRC; even if I have already indexed it, read it anyway and send it to the indexers!" 😉😉😉
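For reference, a minimal sketch of the monitor stanza being described (the path is hypothetical; initCrcLength and crcSalt = &lt;SOURCE&gt; are the actual inputs.conf setting names):

```ini
[monitor:///path/to/app.log]
# Hash more than the default 256 bytes, to get past the identical header
initCrcLength = 1024
# Mix the file path into the CRC, to distinguish same-content files on different paths
crcSalt = <SOURCE>
```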
The logs posted in the main thread were just a test, to simplify the real problem.
The issue came up with a user who couldn't read his logs one day.
The problem was that, for 2 days in a row, at 00:00 the application started writing the same identical log errors on 2 nodes for about 4 KB, without any timestamp at all; not even a dot was different from day 1 to day 2 within the default 256-byte initCrcLength 😎 The first time it happened, months ago, I raised initCrcLength to 1024 and ingestion went fine, but the other day I discovered that the "header" errors ran to about 4 KB, so on the 3rd day the logs were stuck, since the UF thought it was the same identical file, already ingested! But it wasn't, since from about 10 KB onward the apps start writing JSON logs with correct timestamps. In this single case I raised initCrcLength above 4 KB and all was OK.
In the coming days I could get a 10 KB identical error header on the same logs. So... in this special case, if the issue comes up again, I'll raise initCrcLength to the maximum and tell the user to fix the problem with a simple workaround:
sed -i "1i$(date) $RANDOM" /path/to/log
And... that's all. Splunk can do much, but not all 😆
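The workaround above can be sketched end to end like this, assuming GNU sed (for -i and the 1i form) and bash (for $RANDOM); the log path and header text are just placeholders:

```shell
#!/bin/bash
# Sample log whose first bytes are an identical header every day.
LOG=/tmp/app.log
printf "I'M STARTING TO WRITE.\nI'M STARTING TO WRITE.\n" > "$LOG"

# Prepend a unique first line (current date plus a random number) so the
# first initCrcLength bytes differ from yesterday's copy and the UF
# computes a new CRC for the file.
sed -i "1i$(date) $RANDOM" "$LOG"

# The original header is still intact, just pushed down one line.
head -n 2 "$LOG"
```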
Anyway, it could be useful for future UF versions to have a new inputs.conf parameter in the stanza to mix the OS file timestamp into the CRC as a salt, something like,
crcSalt = &lt;FILE_TIMESTAMP&gt;
or a salt with the current date/time,
crcSalt = &lt;NOW&gt;
👍👍👍
What do you think about it?
I think you're having some serious problems on the source side, and you're trying to "fix" them with ugly hacks on the outside.
It's the main application that in some cases goes crazy and starts writing so many error headers ☹️ I keep telling the developers that this is the cause, and in the meantime I give them the manual how-to for unblocking the logs 🤧
There is a limit to everything 😂
Yes, with identical headers and identical file paths, the CRCs will keep matching even if you increase initCrcLength or set crcSalt.
So I think the best solution is to rotate logs with unique filenames per day, or add unique content to the header so the CRC changes.
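The rotation idea can be sketched like this (path and filenames hypothetical): renaming the old file to a date-stamped name before the 00:00 rewrite means the UF sees a brand-new source each day instead of a rewritten one.

```shell
#!/bin/bash
# Yesterday's log under its usual name.
LOG=/tmp/app.log
echo "yesterday's content" > "$LOG"

# At 00:00, move the old file out of the way with a date-stamped name,
# e.g. /tmp/app-2025-12-03.log.
mv "$LOG" "${LOG%.log}-$(date +%F).log"

# The app (or the rotation script) then starts a fresh, empty file
# under the original name.
: > "$LOG"
```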
Regards,
Prewin
🌟If this answer helped you, please consider marking it as the solution or giving a Karma. Thanks!
Hi @verbal_666
If the filename is the same each day and the first &lt;x&gt; chars of the log are the same each day, then you need to set initCrcLength to be greater than the number of non-unique characters between days.
E.g. if the header is 1000 chars and the next line is a date like 2025-12-03T12:34:56, then you probably want to include the length of the date string too.
All the crcSalt value of &lt;SOURCE&gt; does is append the filename to the CRC calculation, so it'd be your initCrcLength number of chars followed by the filename. Ultimately, if the initCrcLength value is too small, it won't detect the change; I don't think the salt value really makes any difference in this scenario.
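To see why the length matters, here's a sketch using POSIX cksum as a stand-in (the UF's actual CRC algorithm and salt handling differ): two files with an identical ~920-byte header look the same when only the first 256 bytes are hashed, but differ once the hashed prefix reaches the date line.

```shell
#!/bin/bash
# Two days' logs: an identical header of 40 repeated lines (~920 bytes),
# then one line whose date differs between days.
yes "I'M STARTING TO WRITE." | head -n 40 > /tmp/day1.log
cp /tmp/day1.log /tmp/day2.log
echo "2025-12-03T00:30:00 log #1" >> /tmp/day1.log
echo "2025-12-04T00:30:00 log #1" >> /tmp/day2.log

# Checksumming only the first 256 bytes never gets past the header,
# so both days produce the same checksum (mimics the 256-byte default).
head -c 256 /tmp/day1.log | cksum
head -c 256 /tmp/day2.log | cksum

# A prefix length past the header covers the date line, so the
# checksums differ and the file would be treated as new.
head -c 1024 /tmp/day1.log | cksum
head -c 1024 /tmp/day2.log | cksum
```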
🌟 Did this answer help you? If so, please consider leaving some feedback — it encourages the volunteers in this community to continue contributing.
I don't understand. What's happening between those writes? Is the file being deleted and recreated? But then why would new entries be appended to the old content?
Anyway, the only solution was to raise initCrcLength to a high value, above the header size 👍