Hello,
My problem is simple to explain: I have an app that generates logs that are written whenever a new action is performed.
The problem is, when the session is over, the first line of that log is changed to include the close time of the session, which makes splunk REINDEX everything on the log.
Any ideas?
Thanks
Are you indexing binary logs and looking for changes to them? Give us some more details and maybe there is another way to go about this. Can you explain the entire scenario?
I'm curious whether you're looking to index changes to the files or just the file itself (and only once, even if it changes). Do you need to compare before and after, or just know whether a file was modified at all?
The scenario:
My app writes a log file for each user session. It writes every action of the user to a .txt file, so the file grows until the session is ended by the user. I want to index each action at the time it is written to the log file (something close to real-time indexing).
Everything is fine until the moment the session is ended by the user, because the app goes to the first line of the log file and writes the time and date the session ended there.
Splunk detects that as a change and reindexes the whole file again...
So no binary data, just records coming one after another and an indexer that should keep indexing them one after another... It is quite simple actually.
Yeah, it makes sense. The best option is going to be modifying the app so it writes the session end timestamp at the bottom of the log file.
Can you show us what the first line of the file looks like before and after the process is completed?
Maybe we can use props to delete events when they start with one or the other.
Open file first line: 2021 2016-04-24 18:28:54 0000-00-00 00:00:00 +0100 00000cf0 001 003f 0001 09
Closed file first line: 2021 2016-04-24 18:28:54 2016-04-24 18:29:31 +0100 00000cf0 001 003f 0001
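Following up on the props idea above: since the still-open variant of that first line is recognizable by its 0000-00-00 00:00:00 placeholder, one option would be to route that variant to the nullQueue at parse time. A minimal sketch, not tested against your data, and the sourcetype name session_log is an assumption:

```ini
# props.conf -- [session_log] is a hypothetical sourcetype name
[session_log]
TRANSFORMS-drop_open_header = drop_open_header

# transforms.conf
[drop_open_header]
# Drop the "open file" variant of the header line, which still
# carries the 0000-00-00 00:00:00 placeholder close time
REGEX = 0000-00-00 00:00:00
DEST_KEY = queue
FORMAT = nullQueue
```

This only removes the duplicate header event; the rest of the reindexed file would still be ingested twice, so it addresses the symptom rather than the reindexing itself.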
If the events are being written in near real time you could just use DATETIME_CONFIG=NONE in your props so that Splunk will use the time the file is written as the event date.
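A sketch of that props setting (the sourcetype name session_log is an assumption):

```ini
# props.conf -- [session_log] is a hypothetical sourcetype name
[session_log]
# Skip timestamp extraction entirely; Splunk falls back to the
# file's modification time (or current time) for each event
DATETIME_CONFIG = NONE
```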
I doubt that would work. From Splunk's point of view, a file is unchanged only if the start CRC, the end CRC, and the size don't change. Adding a session end time to the start would inevitably change the start of the file. If the file changes in any way other than appending to the end, Splunk will reindex it.
I agree with you. I feel my hands are tied on this, because I can't stop Splunk from checking the beginning of the file.
Timestamp extraction won't influence reindexing files that have their beginning changed later on.
Yeah, I was about to edit my answer. He could possibly try decreasing the initial CRC length to a lower value too.
From Splunk's point of view, that's intended behaviour. If you're tailing a log file and its start changes, it's considered to be a new log file.
Have your application move the file to a different place after completion, and monitor that different place with Splunk.
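For example, on the forwarder (paths and sourcetype name are illustrative):

```ini
# inputs.conf -- paths are illustrative
# Only the directory that receives completed session logs is
# monitored, so Splunk never sees the first-line rewrite happen
[monitor://D:\myapp\logs\completed\*.txt]
sourcetype = session_log
```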
Mkay, here's another option: Have the application append the session end time to the end of the log file.
I second this. And third it. All in favor?
Really, though, IMO this is inordinately strange behavior from a logger. There's obviously some business reason or some reason involving an inferior product being worked around that caused the logging to be created this way, and it could be at least worth an ask to see if that behavior can be fixed now. There are quite a few compelling reasons.
I've searched around and there is nothing I can do to stop the app from writing that session end date at the beginning of the log file.
Firstly, thanks for the suggestion. It would prevent real-time monitoring of the log content, though, because Splunk would only see the file once it was complete and closed... Not an option for our case.
Your hands are far from tied here, heck they aren't even dirty yet! 😉
martin_mueller's suggestion of moving this file's data to a different location for monitoring is the "best" option here without changing your app's log behavior and does not stop you from maintaining a "realtime" view of the data.
Your file is fundamentally not fit for direct monitoring by Splunk without dealing with reindexing of data and dedup of duplicate events (which is a completely viable option if license and file size allow). So now what?
Some ways to attack this:
I would begin with the option of tailing the writes to this file to another file. Depending on the specifics of your app, it isn't all that challenging to write a small script to maintain a copy of this file at all times. This allows you to control the behavior of the data while maintaining a "realtime" look at this file.
Heck, even a cron job that simply checks this file at a tiny interval, does a cat myUglyFile.log >> myBeautifulSplunkFormattedLog.log, then cleans up dupe lines before Splunk ingestion could work....
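The "maintain a copy of this file" idea above can be sketched as a small script. This is an illustration under assumptions, not a Splunk feature: file names and the sync_log helper are hypothetical, and it assumes only the first line of the source file is ever rewritten in place.

```python
import os

def sync_log(src, dst, skip_first_line=True):
    """Append to dst any lines of src that dst does not yet have.

    Skips src's first line, because the app rewrites it on session
    close; everything after it is append-only, so dst only ever
    grows and Splunk will never see a changed file start.
    """
    with open(src) as f:
        lines = f.readlines()
    if skip_first_line:
        lines = lines[1:]

    # Count how many lines we have already mirrored into dst
    copied = 0
    if os.path.exists(dst):
        with open(dst) as f:
            copied = sum(1 for _ in f)

    # Append only the lines that arrived since the last run
    new_lines = lines[copied:]
    with open(dst, "a") as f:
        f.writelines(new_lines)
    return len(new_lines)
```

Run it from cron (or Windows Task Scheduler) every few seconds per file and point the Splunk monitor input at the destination file, which is strictly append-only.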
It will take some solutioneering, but this is totally doable... many data sources require a bit of massaging to help Splunk spend its time doing what it does best.
Hi mmodestino,
Of course I can use scripts and lots of other things to get around this. The objective, though, is to keep everything on the Splunk side! And that is the point here: it looks like, for now, I have to change things other than Splunk conf files.
So I am using a Splunk HF on Windows to process these files and send them to a Splunk indexer on CentOS. I can explore administrative and scripting approaches to solve this.
The files can be opened for hours, can be written every second or be left unchanged for minutes.
Everything can be done, right? This is just something I would like to do ONLY in Splunk, without scripts!
Thank you for your suggestions
P.S.: copying thousands of files totalling many GB is not a viable option. Neither is cutting them.
Fair enough...I have heard many people say the same thing...eventually every admin figures out the difference between what you CAN do in splunk and what you SHOULD do...
Before you look to bend Splunk to the behaviour of some other app, I would recommend studying how Splunk monitors a file:
http://docs.splunk.com/Documentation/Splunk/6.4.0/Data/Howlogfilerotationishandled
And then I would suggest seeing if you can shrink the CRC check on this input to a length that keeps Splunk's ability to identify unique files, but doesn't reach far enough to see what your app writes at the top of the file on session close.
Can you share the first 256 bytes of a new file and of a closed file?
At the end of the day this is not a typical log file, and saying "I want to do it on the Splunk side" about something Splunk doesn't natively do is going to require more than some conf file tweaks.
I'm definitely going to check that link.
As you mentioned, the minimum length Splunk uses to check whether a file has already been indexed is 256 bytes. The problem is that my app writes the close date on the first line of the log, at the moment it actually closes a session.
Open file first line: 2021 2016-04-24 18:28:54 0000-00-00 00:00:00 +0100 00000cf0 001 003f 0001 09
Closed file first line: 2021 2016-04-24 18:28:54 2016-04-24 18:29:31 +0100 00000cf0 001 003f 0001