
Has anyone come across monitoring an XML logfile that gets completely re-indexed when an event is added? How do you handle this?

wardallen
Path Finder

I need to monitor an application logfile, and I have a problem with the default way Splunk "tails" a file. This particular log file doesn't append new rows to the end of the file; it inserts them before the closing root tag.

< root >
< a >
< b >
< c >
< /root >
becomes
< root >
< a >
< b >
< c >
< d >
< /root >
This causes Splunk to consider it a completely new file, and it reindexes the whole thing, when I only want < d >. Has anyone come across this before and solved it? If so, how?

1 Solution

lguinn2
Legend

AFAIK, you can't solve this by using Splunk input settings. In Splunk, you have only 2 choices: re-index the whole file each time it changes, or index new data that is appended to the end of the file. There is no way to pick up a new event that appears in the middle of the file.

I think the easiest way to handle this is to write a script that compares the last version of the file with the current version and then outputs the new events to stdout. Use this as a scripted input.
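A minimal sketch of such a script, in Python. It is not a tested production input: the file paths, the one-event-per-line layout, and the `<root>` wrapper names are assumptions for illustration. It remembers a fingerprint of every event line it has already emitted and prints only lines it has not seen before, so an event inserted in the middle of the file is emitted exactly once.

```python
#!/usr/bin/env python3
"""Sketch of a scripted input that prints only events not emitted before."""
import hashlib
from pathlib import Path

# Hypothetical paths: adjust for the real application log and state file.
LOG_PATH = Path("/var/log/app/events.xml")
STATE_PATH = Path("/var/tmp/events.xml.seen")

# Wrapper lines that are not events (assumes the layout from the question).
WRAPPER_LINES = {"<root>", "</root>"}


def event_key(line):
    """Return a stable fingerprint for one event line."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()


def new_events(lines, seen):
    """Return event lines whose fingerprint is not in `seen`; updates `seen`."""
    fresh = []
    for raw in lines:
        line = raw.strip()
        if not line or line in WRAPPER_LINES:
            continue
        key = event_key(line)
        if key not in seen:
            seen.add(key)
            fresh.append(line)
    return fresh


def main():
    """Read the log, print unseen events to stdout, and persist the state."""
    if not LOG_PATH.exists():
        return
    seen = set(STATE_PATH.read_text().split()) if STATE_PATH.exists() else set()
    lines = LOG_PATH.read_text(encoding="utf-8", errors="replace").splitlines()
    for line in new_events(lines, seen):
        print(line)
    STATE_PATH.write_text("\n".join(sorted(seen)))


if __name__ == "__main__":
    main()
```

One design caveat: because events are tracked by content, two byte-identical event lines are treated as the same event and emitted only once. If the application can legitimately write identical events, the fingerprint would need to include something that distinguishes them.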


anewell
Path Finder

Noted for posterity and the search spiders: I have this problem with the XML log files generated by Ipswitch WS_FTP.

The problem presents as a huge number of duplicate events that are collected, counted against the license, and shown in search (millions, in my case), whereas inspecting the actual log files reveals only hundreds or thousands of events. Piping the events through "... | dedup " reveals the actual number of unique events.

Confirm the issue by running a search that shows the growing gap between each event's own timestamp and the time it was (re)indexed:

... | eval delta=(_time - _indextime)  | timechart avg(delta) span=15m


martin_mueller
SplunkTrust

I've solved a similar situation by not "logging" that way. The idea of a logfile is to add stuff to the end, not to the middle. Splunk checks the start of a file and the region just before the point where it left off reading; if either check fails, the file is (correctly) presumed to have been altered rather than appended to.
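To illustrate why an insert before the closing tag trips that check, here is a simplified model in Python. It is not Splunk's actual implementation: the window size `CHECK_LEN`, the use of MD5, and the function names are all assumptions made for the demo. It fingerprints the first few bytes of the file and the few bytes that end at the old read position, then compares both against the new version of the file.

```python
import hashlib

CHECK_LEN = 16  # hypothetical window size, kept small for the demo


def fingerprint(data, offset):
    """Hash the first CHECK_LEN bytes and the CHECK_LEN bytes ending at offset."""
    head = hashlib.md5(data[:CHECK_LEN]).hexdigest()
    tail = hashlib.md5(data[max(0, offset - CHECK_LEN):offset]).hexdigest()
    return head, tail


def classify(old, new):
    """Return 'appended' if new still matches old up to the old end, else 'altered'."""
    offset = len(old)  # where reading stopped last time
    if len(new) < offset:
        return "altered"
    if fingerprint(old, offset) == fingerprint(new, offset):
        return "appended"
    return "altered"


before = b"<root>\n<a>\n<b>\n<c>\n</root>\n"
inserted = b"<root>\n<a>\n<b>\n<c>\n<d>\n</root>\n"  # <d> pushed in before </root>
appended = before + b"<d>\n"                          # <d> added at the very end

print(classify(before, inserted))  # altered
print(classify(before, appended))  # appended
```

In the inserted case the bytes just before the old read position used to contain the closing root tag and now contain the new node, so the tail fingerprint no longer matches and the whole file looks new.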

One way would be to log self-contained XML documents, each with its own root element, if you absolutely have to log XML.
Another way would be to log key-value lines rather than XML nodes inserted in the middle of an XML tree.
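Both alternatives can be sketched in a few lines of Python, assuming you control how the application writes its log. The field names (`action`, `user`), the `event` element name, and the helper functions are hypothetical; the point is only that each event is a complete line appended to the end of the file, which is what a file monitor expects.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def kv_line(fields, when=None):
    """Format one event as a timestamp followed by key="value" pairs."""
    when = when or datetime.now(timezone.utc)
    pairs = " ".join(
        '{}="{}"'.format(key, str(value).replace('"', '\\"'))
        for key, value in fields.items()
    )
    return "{} {}".format(when.isoformat(), pairs)


def xml_line(fields, when=None):
    """Format one event as a self-contained XML document with its own root."""
    when = when or datetime.now(timezone.utc)
    event = ET.Element("event", time=when.isoformat())
    for key, value in fields.items():
        ET.SubElement(event, key).text = str(value)
    return ET.tostring(event, encoding="unicode")


def append_event(path, line):
    """Append one event line to the end of the log file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

For example, `append_event("app.log", kv_line({"action": "upload", "user": "bob"}))` adds a single line to the end of `app.log` and never touches earlier content.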
