
Remove Multiline Log Duplicates

jerrad
Path Finder

I am trying to figure out an approach to a multiline log file problem. The device that generates the file writes it like a regular running log, but once the file reaches 10MB it becomes FIFO. The only way I can get this file is via FTP, with Splunk monitoring the download path. I have my multiline event breaking working correctly for the most part, aside from a few stray events that are truncated at the source, but I can live with that. The issue is that if I simply overwrite the file with a newly downloaded copy, Splunk duplicates many events, because the first 256 bytes of the file have a different CRC than before, and so do the last 256 bytes. It is mostly the middle portion of the file that is potentially the same.

Is there any tweak or method anyone can suggest to deal with this situation with the goal of not indexing any duplicate events?

First FTP of File Example

MSCi      MSS01                     2010-12-08  09:43:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:44:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:45:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:46:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:47:09
40+ random lines
END OF REPORT

Second FTP of File ~1 hour later

MSCi      MSS01                     2010-12-08  10:43:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:47:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:46:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:45:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:44:09

Thanks

Jerrad


BenAveling
Path Finder

Rather than have Splunk index the FTP'd file directly, you could run a script after each FTP transfer that extracts only the events not seen before into a new file, and have Splunk monitor that file instead.
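A minimal sketch of such a script in Python, not a tested solution for this device. It assumes, based only on the sample in the question, that every report starts with a header line ending in a date and time and finishes with an END OF REPORT line. The names split_reports and unseen_reports are made up for illustration. Reports that are cut off (no END OF REPORT) are dropped rather than guessed at.

```python
import hashlib
import re

# Assumed from the sample in the question: a report starts with a header line
# such as "MSCi      MSS01      2010-12-08  09:43:09" and ends with "END OF REPORT".
HEADER = re.compile(r"^\S+\s+\S+\s+\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s*$")
FOOTER = "END OF REPORT"


def split_reports(text):
    """Return the complete reports found in text; truncated ones are dropped."""
    reports, current = [], None
    for line in text.splitlines():
        if HEADER.match(line):
            # A new header discards any report that never reached its footer.
            current = [line]
        elif current is not None:
            current.append(line)
            if line.strip() == FOOTER:
                reports.append("\n".join(current))
                current = None
    return reports


def unseen_reports(text, seen):
    """Return reports whose digest is not yet in seen, and record them in seen."""
    fresh = []
    for report in split_reports(text):
        digest = hashlib.sha1(report.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(report)
    return fresh
```

After each download you would call unseen_reports on the file contents, append the returned reports to a separate file that Splunk monitors, and save the seen digests somewhere (a plain text file would do) so they survive between runs. Hashing the whole report, header included, means two reports only count as duplicates if every line matches.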
