Getting Data In

Splunk Reindexes File that gets a new first line when closed

TiagoMatos
Path Finder

Hello,

My problem is simple to explain: I have an app that generates logs that are written whenever a new action is performed.

The problem is that when the session is over, the first line of the log is changed to include the session's close time, which makes Splunk REINDEX everything in the log.

Any ideas?

Thanks

jkat54
SplunkTrust

Are you indexing binary logs and looking for changes to them? Give us some more details and maybe there is another way to go about this. Can you explain the entire scenario?

jkat54
SplunkTrust

I'm curious whether you want to index changes to the files or just the file itself (and only once, even if it changes). Do you need to compare before and after? Do you need to know whether it was modified at all?

TiagoMatos
Path Finder

The scenario:

My app writes a log file for each user session. It writes every user action to a .txt file, so the file grows until the user ends the session. I want each action indexed at the time it is written to the log file (something close to real-time indexing).

Everything is fine until the user ends the session, because the app then goes to the first line of the log file and writes the date and time at which the session ended.

Splunk detects that as a change and reindexes the whole file again...

So no binary data, just records coming one after another and an indexer that should keep indexing records one after another... It is quite simple, actually.

jkat54
SplunkTrust

Yeah, it makes sense. The best option is going to be to modify the app so it writes the end-of-session timestamp at the bottom of the log file.

jkat54
SplunkTrust

Can you show us what the first line of the file looks like before and after the process is completed?
Maybe we can use props to delete events when they start with one or the other.

TiagoMatos
Path Finder

Open file first line: 2021 2016-04-24 18:28:54 0000-00-00 00:00:00 +0100 00000cf0 001 003f 0001 09

Closed file first line: 2021 2016-04-24 18:28:54 2016-04-24 18:29:31 +0100 00000cf0 001 003f 0001

bentleymi
Engager

If the events are being written in near real time you could just use DATETIME_CONFIG=NONE in your props so that Splunk will use the time the file is written as the event date.

martin_mueller
SplunkTrust

I doubt that would work. From Splunk's point of view, a file is unchanged if the start CRC, the end CRC, and the size don't change. Adding a session end to the start would inevitably change the size. If the size changes in any context other than appending to the end, Splunk will reindex.
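
As a rough illustration of why a rewritten first line looks like a new file, here is a minimal Python sketch. It is not Splunk's actual implementation: it assumes a CRC32 over the first 256 bytes stands in for the "start CRC", and the action lines after the header are made up. Only the two first lines are taken from this thread.

```python
import zlib

HEAD_BYTES = 256  # assumed size of the leading block that gets checksummed


def head_crc(data: bytes, length: int = HEAD_BYTES) -> int:
    """CRC32 of the first `length` bytes; a stand-in for a 'start CRC'."""
    return zlib.crc32(data[:length])


# First lines as quoted in this thread.
open_first = b"2021 2016-04-24 18:28:54 0000-00-00 00:00:00 +0100 00000cf0 001 003f 0001 09\n"
closed_first = b"2021 2016-04-24 18:28:54 2016-04-24 18:29:31 +0100 00000cf0 001 003f 0001\n"

# Hypothetical action records; the real log body is not shown in the thread.
actions = b"".join(b"action %d\n" % i for i in range(50))

before = open_first + actions   # file while the session is open
after = closed_first + actions  # file after the header is rewritten

# Appending to the end leaves the leading block untouched.
append_ok = head_crc(before) == head_crc(before + b"action 50\n")

# Rewriting the first line changes the leading block.
same_head = head_crc(before) == head_crc(after)

print("head unchanged after append:", append_ok)
print("head unchanged after header rewrite:", same_head)
```

In this simplified model, an append keeps the checksum of the leading block stable, while the header rewrite changes it, which matches the behaviour described above.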

TiagoMatos
Path Finder

I agree with you. I feel my hands are tied on this because I can't stop it from checking the beginning of the file.

martin_mueller
SplunkTrust

Timestamp extraction won't influence reindexing files that have their beginning changed later on.

jkat54
SplunkTrust

Yeah, I'm trying to edit. He could possibly also try decreasing the initial CRC length (initCrcLength) to a lower value.

martin_mueller
SplunkTrust

From Splunk's point of view, that's intended behaviour. If you're tailing a log file and its start changes, it's considered to be a new log file.

Have your application move the file to a different place after completion, and monitor that different place with Splunk.

martin_mueller
SplunkTrust

Mkay, here's another option: Have the application append the session end time to the end of the log file.

Richfez
SplunkTrust

I second this. And third it. All in favor?

Really, though, IMO this is inordinately strange behavior from a logger. There's obviously some business reason or some reason involving an inferior product being worked around that caused the logging to be created this way, and it could be at least worth an ask to see if that behavior can be fixed now. There are quite a few compelling reasons.

TiagoMatos
Path Finder

I've searched around and there is nothing I can do to stop the app from writing that session end date at the beginning of the log file.

TiagoMatos
Path Finder

Firstly, thanks for the suggestion. It would rule out real-time monitoring of the log content, though, because Splunk would only see the file once it is complete and closed. Not an option for our case.

mattymo
Splunk Employee

Your hands are far from tied here, heck they aren't even dirty yet! 😉

martin_mueller's suggestion of moving this file's data to a different location for monitoring is the "best" option here without changing your app's log behavior and does not stop you from maintaining a "realtime" view of the data.

Your file is fundamentally not fit for direct monitor by splunk without dealing with re-index of data/dedup of duplicate events (which is a completely viable option if license/filesize allow). So now what???

Some factors on how to attack this include:

  • What operating system are you dealing with? *nix? windows?
  • What Admin/scripting skills are available to you?
  • How long are these files open and how many do you have at any one time? How big are they?
  • How often are events written to this file?

I would begin with the option of tailing the writes to this file to another file. Depending on the specifics of your app, it isn't all that challenging to write a small script to maintain a copy of this file at all times. This allows you to control the behavior of the data while maintaining a "realtime" look at this file.

Heck, even a cron job that simply checks this file at a tiny interval and does a cat myUglyFile.log >> myBeautifulSplunkFormattedLog.log then cleans up dupe lines before Splunk ingestion could work....

It will take some solutioneering, but this is totally doable. Many data sources require a bit of massaging to help Splunk spend its time doing what it does best.
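
For what it's worth, the "tail the writes to another file" idea could be sketched roughly like this in Python. This is only a sketch under stated assumptions: `mirror_appended` and `follow` are hypothetical helper names, the file paths are placeholders, and it assumes the app rewrites the first line in place without shifting the bytes that follow it. Splunk would then monitor the mirror file, which is only ever appended to.

```python
import time


def mirror_appended(src: str, dst: str, state: dict) -> int:
    """Append bytes written to `src` since the last call onto `dst`.

    `state["offset"]` remembers how far into `src` has been copied.
    Only complete lines are copied, so a half-written line waits for
    the next pass. Returns the number of bytes copied.
    """
    offset = state.get("offset", 0)
    with open(src, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n") + 1  # 0 when no complete line is available
    if end == 0:
        return 0
    with open(dst, "ab") as out:
        out.write(chunk[:end])
    state["offset"] = offset + end
    return end


def follow(src: str, dst: str, interval: float = 1.0) -> None:
    """Poll `src` forever and keep `dst` in sync (assumed polling interval)."""
    state: dict = {}
    while True:
        mirror_appended(src, dst, state)
        time.sleep(interval)
```

Because the copy only ever moves forward from the saved offset, a later rewrite of the first line is never copied again. The trade-offs are that the mirror keeps the original "open" header (the session end time is not captured) and that it uses extra disk space for the copy.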

- MattyMo

TiagoMatos
Path Finder

Hi mmodestino,

Of course I can use scripts and lots of other stuff to get around this. The objective though is to keep everything on splunk side! And this is the point here, it looks like for now I have to change things other than splunk conf files.

I am using a Splunk heavy forwarder (HF) on Windows to process these files and send them to a Splunk indexer on CentOS. I can draw on administration and scripting skills to solve this.

The files can stay open for hours, and can be written to every second or be left unchanged for minutes.

Everything can be done right? This is just something I would like to do ONLY on Splunk, without scripts!

Thank you for your suggestions

P.S.: copying thousands of files totalling many GB is not a viable option. Neither is cutting them.

mattymo
Splunk Employee

Fair enough...I have heard many people say the same thing...eventually every admin figures out the difference between what you CAN do in splunk and what you SHOULD do...

Before you look to bend Splunk to the behaviour of some other app, I would recommend studying how Splunk monitors a file:

http://docs.splunk.com/Documentation/Splunk/6.4.0/Data/Howlogfilerotationishandled

And then I would suggest seeing if you can shrink the CRC check on this input to a length that would keep Splunk's ability to check for unique files, but not check far enough to see what your app writes to the top of the file on session close.

Can you share the first 256 bytes of a new file and a closed file?

At the end of the day, this is not a log file, and saying "I want to do it on the Splunk side" when talking about something Splunk doesn't natively do is going to require more than some conf file tweaks.

- MattyMo

TiagoMatos
Path Finder

I'm definitely going to check that link.

As you mentioned, the minimum that splunk uses to check if a file has been indexed or not is 256 bytes. The problem is that my app writes a close date on the first line of the log, at the time it actually closes an app session.

Open file first line: 2021 2016-04-24 18:28:54 0000-00-00 00:00:00 +0100 00000cf0 001 003f 0001 09

Closed file first line: 2021 2016-04-24 18:28:54 2016-04-24 18:29:31 +0100 00000cf0 001 003f 0001
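
To see how early the two first lines diverge, a small Python check can compare them byte by byte. This is only an illustration: `first_difference` is a hypothetical helper, not a Splunk tool, and the 256-byte figure is the minimum mentioned above.

```python
def first_difference(a: bytes, b: bytes) -> int:
    """Index of the first differing byte, or -1 if the inputs are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No difference in the shared prefix: identical, or one is a prefix of the other.
    return -1 if len(a) == len(b) else min(len(a), len(b))


# First lines as quoted above.
open_first = b"2021 2016-04-24 18:28:54 0000-00-00 00:00:00 +0100 00000cf0 001 003f 0001 09"
closed_first = b"2021 2016-04-24 18:28:54 2016-04-24 18:29:31 +0100 00000cf0 001 003f 0001"

offset = first_difference(open_first, closed_first)
print("first differing byte:", offset)            # 25
print("inside first 256 bytes:", 0 <= offset < 256)  # True
```

The lines diverge at byte 25, where the close date replaces the zeroed placeholder, so the change falls well inside the first 256 bytes.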
