Solved: Duplicate, Triplicate, Quadrupled logs in Splunk

yamini_37 · 09-30-2020

Hi All,

Recently I integrated one zipped log file. Daily, at a particular time, the log gets updated with a few additional log lines. But in Splunk, the logs are being ingested very strangely.

For example, on 25/9/20 the log had 10 lines, and in Splunk I saw the same count.
On 26/9/20, 5 additional lines were appended and the log file got updated. Ideally, only these 5 new lines should be ingested into Splunk. But that is not what actually happens.

REAL BEHAVIOR OF SPLUNK UF AGENT:
On 25/9/2020, the log line count on the server is 10 and the Splunk count is 10.

On 26/9/20, 5 lines were added, so the log line count on the server is 15. But Splunk reindexed the whole file and indexed all 15 log lines again. Now the total count since 25/9/2020 is 15+10=25 (ideally, it should contain only 15 logs).

On 27/9/20, 5 more lines were added, so the log line count on the server is 20. Again Splunk reindexed the whole file and indexed all 20 log lines. Now the total count since 25/9/2020 is 20+(15+10)=45 (ideally, it should contain only 20 logs).

So the first indexed lines are seen three times and the latest indexed lines are seen once. Likewise, the ingestion keeps multiplying every day.

Here, the log file is not rotated; all the new log lines are appended to the same log file, which is available in zipped format on the server.

Can someone please help me fix this ingestion behaviour? Please!!!

Richfez · 09-30-2020

That is, unfortunately for your use case, how it works.

https://docs.splunk.com/Documentation/Splunk/latest/Data/Monitorfilesanddirectories#How_Splunk_Enter...

So, I'd suggest the easiest solution might be to rotate the log file, but I'm not sure that will work in your case (timing is everything!)

If the file-writing application isn't bothered by a missing log file (e.g. if it finds the file missing, it just creates it, writes a few lines and moves on with life) and if other business requirements are fine with this, you could set it up as a batch input with a move_policy of sinkhole. That would delete the file each time it is read, so it won't re-read old stuff.

Another option, probably the best one, would be to have a process/script unzip the file into a new location on a periodic basis, and have Splunk tail that unzipped version. You'll have to test a bit, but I believe this should work for you.

Otherwise, sorry, it's just how it works - when a zip file is indexed, its entire contents are indexed.

Happy Splunking!

-Rich

yamini_37 · 10-02-2020

@Richfez Thanks for the advice. I tried the batch input method but no luck. I am getting an error in splunkd.log for those batch input paths: splunkd complains that the path is not an absolute path. Is there any other way to stop this duplication? Please advise.

Richfez · 10-03-2020

Yes, there are a couple of ways.

The core problem here is very simple, so maybe mentioning it again will bring to mind other ways to fix this. The problem is that zip files get read and ingested *in their entirety* every time they get read and ingested. That's why you have duplicated data. Say the zip file has 30 rows in it now: when the device saves a new copy with another 5 rows in it, all 35 rows get read again and ingested. Because it's a zip file, and that's how it works.

So, you need a solution that makes this zip-file-append behavior a non-issue. What's right or easiest for your environment isn't something I can tell you, but here are some options.
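To put numbers on that, here is a tiny Python sketch using the line counts from the original question (10, 15 and 20 lines on three consecutive days). It is only an illustration of the arithmetic, not anything Splunk itself runs, and the function name `indexed_totals` is made up for this example.

```python
def indexed_totals(daily_line_counts):
    """Compare event totals for a file that grows by appending.

    daily_line_counts: size of the file (in lines) each time it is read.
    Returns (reread_total, tailed_total):
      reread_total - events indexed if the whole file is re-ingested on
                     every read (what happens with the zipped file)
      tailed_total - events indexed if only new lines are picked up
                     (what you want from a normal monitor input)
    """
    reread_total = sum(daily_line_counts)
    tailed_total = daily_line_counts[-1] if daily_line_counts else 0
    return reread_total, tailed_total


# Line counts from the question: 25/9, 26/9 and 27/9.
print(indexed_totals([10, 15, 20]))  # (45, 20)
```

The 45 matches the total reported in the question, while 20 is what the index would hold if only the appended lines were picked up.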

1) You could unzip the file via some other method (batch file/script) into another folder and have Splunk monitor the unzipped file in that folder. Then a standard monitor input will work on that new location/file, because Splunk can generally handle appended data there. Though you might have to play with crcSalt and other similar settings a bit.

2) Or maybe you could try the batch/sinkhole input again, only this time use an absolute path. Then the batch/sinkhole input will delete the file each time and there will be no re-reading of its old contents because they won't be there (as far as I know the path requirements for monitor:// and batch:// inputs are identical and you literally shouldn't need to change anything except "monitor" to "batch" and add move_policy=sinkhole to it, but if it's complaining about absolute paths, well, change it to an absolute path then.)

3) Reconfigure the sending/saving device to not zip the file it's saving and instead leave it plaintext. If you can do this, then a standard monitor input will probably work on the non-zipped file because monitor inputs are really smart about appended data on "regular" files.

4) Reconfigure the sending/saving device to not append to an existing file when it builds a zip file. Then a standard monitor input will work, because it won't be an appended-to file, but instead "new from empty".
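For option 1, a rough Python sketch of the "unzip into another folder" step could look like the following. Treat it as a starting point under a few assumptions of mine, not a tested solution for your environment: the archive is a real .zip (not .gz), it holds a single log member, the log only ever grows by appending, and it is small enough to read into memory. The function name `sync_unzipped` and the paths in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def sync_unzipped(zip_path, member, dest_path):
    """Bring a plain-text copy of a zipped log up to date.

    Appends to dest_path only the bytes of `member` that the plain
    copy does not have yet, so a tailing reader sees ordinary appends.
    Returns the number of bytes written.
    """
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)

    # Read the full current contents of the log inside the archive.
    with zipfile.ZipFile(zip_path) as archive:
        data = archive.read(member)

    existing = dest.read_bytes() if dest.exists() else b""

    if data.startswith(existing):
        # Normal case: the log only grew, so append just the new tail.
        new_bytes = data[len(existing):]
        if new_bytes:
            with dest.open("ab") as out:
                out.write(new_bytes)
        return len(new_bytes)

    # The zipped log no longer starts with what was extracted earlier
    # (it was truncated or replaced), so rewrite the plain copy.
    dest.write_bytes(data)
    return len(data)
```

You would run something like `sync_unzipped("/data/in/app.zip", "app.log", "/data/unzipped/app.log")` from cron or a scheduled task shortly after the daily update, and point a standard monitor input at the unzipped file instead of the zip.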

I'm sure there are probably other solutions too.

If you continue to have problems with the batch/sinkhole input, please post your input stanza here so we can take a look at it. It's probably some other typo not letting you do this.

In any case, do let us know what ended up working!

-Rich
