Getting Data In

How to avoid reindexing files after setting crcSalt=<SOURCE>

Path Finder

I've come across an issue where not all of my monitored files are indexed, and I learned that this is because they start with similar long headers. While researching, I found that setting crcSalt=<SOURCE> is suggested as a solution. However, setting it reindexes the contents of files that are already indexed.

The file I am indexing gets updated with new entries at different time intervals. How do I address this to avoid reindexing old entries of the file and only index new lines/events? I am seeing ignoreOlderThan and check_method to deal with this. Can anyone suggest the best way for my scenario? Thanks in advance.

Scenario:
system1.log - 5 events
system2.log - 5 events
system3.log - 5 events
system4.log - 5 events

Because of the similar headers, only the files below are indexed (I need to index all of them):
system1.log - 5 events
system3.log - 5 events

Incorporating crcSalt=<SOURCE> as a fix:
system1.log - 10 events
system2.log - 5 events
system3.log - 10 events
system4.log - 5 events

How do I avoid reindexing files that are already indexed, so that I get the result below, and continue to index new events as these files are updated?
system1.log - 5 events
system2.log - 5 events
system3.log - 5 events
system4.log - 5 events

Appreciate Responses.

Ultra Champion

-- I came to know that this is because they start with similar long headers. Upon research, I was introduced to setting crcSalt=<SOURCE> as a solution.

For long headers, initCrcLength is the solution, not crcSalt=<SOURCE>. crcSalt=<SOURCE> jeopardizes the entire Splunk algorithm and we need to be careful using it ; - )

Apparently, these two are quite often confused, and at our place things became pretty messy because of this confusion.
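
To make the distinction concrete, here is a minimal sketch of an inputs.conf monitor stanza that uses initCrcLength. The monitored path and the value 1024 are assumptions for illustration only; pick a length that is larger than the shared header in your own files (the default is 256 bytes).

```ini
# inputs.conf (sketch; the path and the value below are assumptions)
[monitor:///var/log/myapp/system*.log]
# Use more than the default 256 bytes for the initial CRC, so that
# files which share a long identical header are still told apart.
initCrcLength = 1024
```

Unlike crcSalt=<SOURCE>, this does not mix the file path into the checksum, so a file that is renamed or rolled is still recognized by its content. Changing the value on an input that has already read files may still cause those files to be read again, so it is worth trying in a non-production environment first.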

SplunkTrust

Wish I could upvote this three or four times.

Ultra Champion

you are too kind @dwaddle ; -) and you made my day.

Path Finder

Thank you for the response, @ddrillic. I will test this parameter in non-prod and update this thread. Will incorporating initCrcLength avoid reindexing events that are already indexed?

Path Finder

I have tried this, but it also reindexed files that were already indexed.

Ultra Champion

So, are you ok now?

SplunkTrust

You might have to use a combination of ignoreOlderThan and followTail.
Refer to answer: https://answers.splunk.com/answers/508545/how-to-avoid-indexing-events-twice-when-applying-c.html

Refer to Splunk Documentation: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf#MONITOR:

If your existing logs have already been indexed, you can also remove them prior to turning on crcSalt=<SOURCE>. Please test your inputs.conf changes in a non-production system first to ensure the data being indexed is not duplicated.
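
As a rough sketch of the combination mentioned above, the stanza might look like the following. The monitored path and the 7d window are assumptions for illustration, not values taken from this thread.

```ini
# inputs.conf (sketch; the path and the time window are assumptions)
[monitor:///var/log/myapp/system*.log]
# Add the full source path to the CRC so files with identical headers differ.
crcSalt = <SOURCE>
# Skip files whose modification time is older than this window.
ignoreOlderThan = 7d
# Start reading at the end of a file the first time it is seen,
# so content that is already present is not read again.
followTail = 1
```

followTail only affects the first time the input sees a file, and it is generally not recommended for rolling log files, so check the inputs.conf documentation against your rotation scheme before relying on it.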

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

Path Finder

Thank you @niketnilay.
I ran into a problem while testing, though.
All of the files are continuously updated over time. However, once ignoreOlderThan causes the files to be ignored, new events in them are not indexed unless the forwarders are restarted.

Are there any other ways to avoid reindexing files that are already indexed?

SplunkTrust

@arielpconsolacion, one crude way would be to make this crcSalt change during a maintenance window. First move the files that are already indexed to a different location (one not monitored by Splunk). Then apply the crcSalt change. I am not sure whether this is feasible in your case.


Path Finder

Thank you again for this suggestion, @niketnilay. However, the files are rolling log files, so I don't think this will be efficient.
