Getting Data In

Let's think about a way to always re-read a log file

verbal_666
Builder

Hi.
OK, this question is mostly theoretical, but it came up for me as a practical issue.

So, suppose I have an app that creates a new log file every day at 00:00 and writes:

I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE.

12/02/2025 00:30:00 log #1
12/02/2025 00:30:01 log #2
12/02/2025 00:30:02 log #3

 

PREMISE: every day at 00:00 the log is completely rewritten from 0 bytes, with the same headers.

 

So, on the first day the Splunk UF should take the first 256 bytes for the CRC, see the file as new, read the log entries, and send them to the indexers.

The next day it should skip the file, thinking it's the same as the day before, since the 256-byte CRC is identical, so I should find something like this in the UF log:

File will not be read, is too small to match seekptr checksum [...]

 

And in Splunk I would find only yesterday's entries.
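This collision can be sketched outside Splunk. The following is not the UF's exact algorithm, just a simplified illustration using CRC32 over the first 256 bytes, with a made-up header like the one above:

```python
import zlib

# A header long enough to fill the default 256-byte CRC window
HEADER = b"I'M STARTING TO WRITE. " * 12   # 276 bytes, identical every day

day1 = HEADER + b"12/02/2025 00:30:00 log #1\n"
day2 = HEADER + b"13/02/2025 00:30:00 log #1\n"

# Simplified stand-in for the UF's seekptr checksum: CRC over the first 256 bytes
crc_day1 = zlib.crc32(day1[:256])
crc_day2 = zlib.crc32(day2[:256])

print(crc_day1 == crc_day2)  # True: both days look like the same file
```

Since the difference between the two days starts after byte 256, the checksums are guaranteed to match even though the files differ.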

Now I force "initCrcLength = 1024", and indexing should start again, since the file now contains, for example:

I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE. I'M STARTING TO WRITE.

AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG. AND NOW I START TO LOG.

12/02/2025 00:30:00 log #1
12/02/2025 00:30:01 log #2
12/02/2025 00:30:02 log #3

 

The next day, the first 1024 bytes will again be identical, so the UF tags the log as already sent yesterday and skips it again.

OK, now I also force "crcSalt = <SOURCE>" on the input.

But within a few days, both

initCrcLength = 1024 [same header size]
crcSalt = <SOURCE> [same file path]


will produce the same identical CRC, so the UF will think the log is still the same file and it will be skipped! Am I wrong?
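For reference, the two settings together would sit in an inputs.conf monitor stanza like this (the path is hypothetical):

```
[monitor:///var/log/myapp/app.log]
initCrcLength = 1024
crcSalt = <SOURCE>
```

Since both the first 1024 bytes and the source path repeat identically every day, the salted CRC repeats too, which is exactly the collision described here.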

I know this is an old and much-discussed question. But it would be interesting to know whether it's possible to tell the UF: "read the file without worrying about its CRC; even if I have already indexed it, read it anyway and send it to the indexers!" 😉😉😉

1 Solution

verbal_666
Builder

The logs posted in the main thread were just a test, to simplify the real problem.

The issue came up with a user who couldn't see his logs one day.

The problem was that, for 2 days in a row, at 00:00 the application started writing the same identical error lines on 2 nodes for about 4 KB, with no timestamp at all; not even a dot differed between day 1 and day 2 within the default 256-byte initCrcLength 😎

The first time it happened, months ago, I raised initCrcLength to 1024 and ingestion went OK. But recently I discovered that the "header" of errors was about 4 KB, so on the 3rd day the logs were stuck, since the UF thought it was the same identical file it had already ingested! But it wasn't, because from about 10 KB onward the apps start writing JSON logs with correct timestamps. In this single case I raised initCrcLength above 4 KB and all was OK.

In the coming days I could get a 10 KB identical error header on the same logs. So... in this special case, if the issue comes up again, I will raise initCrcLength to its maximum and tell the user to fix the problem with a simple workaround:

sed -i "1i$(date) $RANDOM" /path/to/log

And... that's all. Splunk can do a lot, but not everything 😆

Anyway, it could be useful for future UF versions to add a new inputs.conf stanza parameter that uses the OS file timestamp as the crcSalt, something like:

crcSalt = <FILE_TIMESTAMP>

or a salt based on the current date/time:

crcSalt = <NOW>

👍👍👍

What do you think about it?



PickleRick
SplunkTrust

I think you're having some serious problems on the source side and you're trying to "fix" them with some ugly hacks "outside".


verbal_666
Builder

It's the main application that in some cases goes crazy and starts writing so many error headers ☹️ I keep telling the developers that this is the cause, and in the meantime I give users the manual how-to to unblock the logs 🤧

There is a limit to everything 😂


PrewinThomas
Motivator

@verbal_666 

Yes, with identical headers and identical file paths, CRC collisions will happen even if you increase initCrcLength or set crcSalt.
So I think the best solution is to rotate logs with a unique filename per day, or to add unique content to the header so the CRC changes.
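That rotation idea can be sketched as a tiny shell snippet. The paths are hypothetical (the demo runs in a temp dir); in practice the application or logrotate would do this at rollover time:

```shell
#!/bin/sh
# Demo: give each day's log a unique, dated filename,
# so the forwarder always sees a brand-new source.
LOGDIR=$(mktemp -d)
LOG="$LOGDIR/app.log"

printf 'HEADER\nyesterday entries\n' > "$LOG"

# At 00:00: move the old log to a dated name, then start a fresh empty file
DATED="$LOGDIR/app-$(date +%F).log"
mv "$LOG" "$DATED"
: > "$LOG"

ls "$LOGDIR"   # the dated copy plus a fresh, empty app.log
```

Because the source path now changes every day, the CRC bookkeeping never confuses today's file with yesterday's, regardless of identical headers.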

Regards,
Prewin


livehybrid
SplunkTrust

Hi @verbal_666 

If the filename is the same each day and the first <x> characters of the log are the same each day, then you need to set initCrcLength to be greater than the number of non-unique characters between days.

e.g. if the header is 1000 characters and the next line starts with a date like 2025-12-03T12:34:56, then you probably want to include the length of the date string too.

All the crcSalt value of '<SOURCE>' does is append the file path to the CRC calculation, so it'd be your initCrcLength number of characters followed by the path. Ultimately, if the initCrcLength value is too small, the change won't be detected; I don't think the salt really makes any difference in this scenario.
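A quick sketch of that point (again a simplified CRC32 stand-in, not the UF's internal code): with a 1000-byte shared header, a 256-byte window collides, while a window that also covers the timestamp does not:

```python
import zlib

header = b"X" * 1000                       # identical 1000-byte header each day
day1 = header + b"2025-12-03T12:34:56 first event\n"
day2 = header + b"2025-12-04T12:34:56 first event\n"

crc_default = (zlib.crc32(day1[:256]),  zlib.crc32(day2[:256]))   # default length
crc_long    = (zlib.crc32(day1[:1024]), zlib.crc32(day2[:1024]))  # past the header

print(crc_default[0] == crc_default[1])  # True:  files look identical
print(crc_long[0] == crc_long[1])        # False: the date makes them differ
```

The two 1024-byte prefixes differ in exactly one byte (the day digit), and a CRC always distinguishes same-length inputs differing in a single byte, so the longer window reliably sees a new file.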



PickleRick
SplunkTrust

I don't understand. What's happening between those writes? Is the file getting deleted and recreated? And why are new entries appended to the old content?


verbal_666
Builder

Anyway, the only solution was to raise initCrcLength to a value higher than the header size 👍
