Getting Data In

Remove Multiline Log Duplicates

jerrad
Path Finder

I am trying to work out an approach to a multiline log file problem. The device that generates the file writes it like a regular running log, but once the file reaches 10MB it behaves as a FIFO. The only way I can get the file is via FTP, with Splunk monitoring the download path. I have multiline event breaking working correctly for the most part, aside from a few stray events that are truncated at the source, which I can live with.

The issue is that if I simply overwrite the file with a newly downloaded copy, many events are duplicated, because the first 256 bytes of the file have a different CRC than before, and so do the last 256 bytes. It is mostly the middle portion of the file that is potentially the same.

Is there any tweak or method anyone can suggest to deal with this situation with the goal of not indexing any duplicate events?

First FTP of File Example

MSCi      MSS01                     2010-12-08  09:43:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:44:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:45:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:46:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:47:09
40+ random lines
END OF REPORT

Second FTP of File ~1 hour later

MSCi      MSS01                     2010-12-08  10:43:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:47:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:46:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:45:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:44:09

Thanks

Jerrad


BenAveling
Path Finder

Rather than have Splunk index the FTP'd file directly, you could have a script run after each FTP transfer that extracts just the unique events into a new file, and have Splunk monitor that file instead.
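Something along these lines might work as a starting point. It is only a rough sketch and makes some assumptions based on the sample in the question: every event starts with a header line ending in a `YYYY-MM-DD  HH:MM:SS` timestamp, ends with `END OF REPORT`, and no line inside an event ends with a timestamp like that. The function names, the paths passed to `run`, and the idea of keeping SHA-1 hashes of already-seen events in a small state file are all my own choices, not anything Splunk provides.

```python
import hashlib
import re
from pathlib import Path

# Assumed event layout, taken from the sample in the question.
HEADER_RE = re.compile(r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s*$")
END_MARKER = "END OF REPORT"


def split_events(text):
    """Return only complete events (header line through END OF REPORT)."""
    events, current = [], None
    for line in text.splitlines():
        if HEADER_RE.search(line):
            # Start a new event; an unfinished previous one is discarded.
            current = [line.rstrip()]
        elif current is not None:
            current.append(line.rstrip())
            if line.strip() == END_MARKER:
                events.append("\n".join(current))
                current = None
    # A truncated event at the start or end of the file is never emitted.
    return events


def new_events(text, seen):
    """Return events whose hash is not in `seen`, and add them to `seen`."""
    fresh = []
    for event in split_events(text):
        digest = hashlib.sha1(event.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(event)
    return fresh


def run(download_path, output_path, state_path):
    """Append unseen events from the download to the file Splunk monitors."""
    state = Path(state_path)
    seen = set(state.read_text().split()) if state.exists() else set()
    text = Path(download_path).read_text(errors="replace")
    fresh = new_events(text, seen)
    with open(output_path, "a") as out:
        for event in fresh:
            out.write(event + "\n\n")
    state.write_text("\n".join(sorted(seen)))
    return len(fresh)
```

You would call `run()` after each FTP transfer with the downloaded file, the output file that Splunk monitors, and a state file, and point the Splunk monitor input at the output file only. Because the output file is only ever appended to, Splunk should treat it as a normal growing log. The state file grows without bound as written, so you may want to prune old hashes periodically.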
