Getting Data In

Remove Multiline Log Duplicates

jerrad
Path Finder

I am trying to figure out an approach to a multiline log file problem. The device that generates the file writes it like a regular running log file, but once the file reaches 10MB it becomes FIFO: the oldest entries are dropped as new ones are written. The only way I can get this file is via FTP, so I have Splunk monitor the download path. I have multiline event breaking working correctly for the most part, apart from a few stray events that are truncated at the source, but I can live with that. The issue is that if I simply overwrite the file with a newly downloaded copy, many events get duplicated, because the first 256 bytes of the file now have a different CRC than before, and so do the last 256 bytes. It's really only the middle portion of the file that may be the same.
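A rough illustration of the mechanism described above: the monitor recognizes a file by a checksum of its first bytes. When the FIFO rollover changes the head of the file, the freshly downloaded copy looks like a brand-new file and is read again from the start. This sketch uses `zlib.crc32` as a stand-in for Splunk's internal checksum, which is an assumption. Only the 256-byte window comes from the post, and the helper names are hypothetical.

```python
import zlib

HEAD_BYTES = 256  # size of the checksummed window at the start of the file (from the post)


def head_crc(data: bytes, length: int = HEAD_BYTES) -> int:
    """Checksum of the first `length` bytes; zlib.crc32 stands in for Splunk's own CRC."""
    return zlib.crc32(data[:length])


def looks_like_new_file(old: bytes, new: bytes) -> bool:
    """True when the head checksum changed, so a tailer would re-read from byte 0."""
    return head_crc(old) != head_crc(new)


first = b"MSCi      MSS01                     2010-12-08  09:43:09\n40+ random lines\n"
second = b"MSCi      MSS01                     2010-12-08  10:43:09\n40+ random lines\n"
print(looks_like_new_file(first, second))  # True: the newest report now sits at the head
```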

Is there any tweak or method anyone can suggest to deal with this situation, with the goal of not indexing any duplicate events?

Example: first FTP of the file

MSCi      MSS01                     2010-12-08  09:43:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:44:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:45:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:46:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:47:09
40+ random lines
END OF REPORT

Example: second FTP of the file, about 1 hour later

MSCi      MSS01                     2010-12-08  10:43:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:47:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:46:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:45:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:44:09

Thanks

Jerrad


BenAveling
Path Finder

Rather than having Splunk index the FTP'd file directly, you could have a script run after each FTP that extracts only the new, unique events into a separate file, and have Splunk monitor that file instead.
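One way this could look, as a minimal sketch only. It assumes each report starts with a line beginning "MSCi" and ends with a line reading "END OF REPORT", as in the samples in the question. A report missing its footer is treated as truncated by the FIFO rollover and skipped. A SHA-256 hash of every report already written is kept in a state file. The script name `dedup_reports.py`, the function names and the three file paths are all hypothetical.

```python
import hashlib
import sys
from pathlib import Path

HEADER_PREFIX = "MSCi"    # assumption: every report starts with a line beginning "MSCi"
FOOTER = "END OF REPORT"  # assumption: every complete report ends with this line


def split_events(text):
    """Yield complete multiline reports; partial ones (no header or no footer) are dropped."""
    current = None
    for line in text.splitlines():
        if line.startswith(HEADER_PREFIX):
            current = [line]  # a header starts a new report, discarding any unfinished one
        elif current is not None:
            current.append(line)
            if line.strip() == FOOTER:
                yield "\n".join(current)
                current = None
    # a report still open here never reached its footer, so it is treated as truncated


def filter_new(text, seen):
    """Return reports whose hash is not in `seen`, adding their hashes to `seen`."""
    fresh = []
    for event in split_events(text):
        digest = hashlib.sha256(event.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(event)
    return fresh


def run(download, output, seen_db):
    """Append unseen reports from `download` to `output`, which Splunk monitors."""
    seen = set(seen_db.read_text().split()) if seen_db.exists() else set()
    fresh = filter_new(download.read_text(errors="replace"), seen)
    if fresh:
        with output.open("a", encoding="utf-8") as out:
            for event in fresh:
                out.write(event + "\n\n")
        seen_db.write_text("\n".join(sorted(seen)) + "\n")
    return len(fresh)


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python dedup_reports.py <downloaded file> <file Splunk monitors> <state file>
    download, output, seen_db = (Path(p) for p in sys.argv[1:4])
    print(f"appended {run(download, output, seen_db)} new report(s)")
```

Because the monitored output file is only ever appended to, Splunk should just pick up the new bytes on each run. Hashing the whole report rather than only the header timestamp avoids dropping two different reports that share a timestamp. The state file grows without bound, so in practice you might prune hashes older than what the device's 10MB window can still contain.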
