Hi, I'm facing an issue where the same data gets indexed multiple times every time a JSON file is pulled from the FTP server. Each time the file is retrieved and placed on my local Splunk server, it overwrites the existing file. I don't have control over the content being placed on the FTP server; it could be either an entirely new entry or an existing entry with new data appended, as shown below. I'm monitoring a specific file, since its name, type, and path remain consistent. From what I can observe, every time the file contains new entries alongside previously indexed data, the whole file is re-indexed, causing duplication.

Example:

file.json
2024-04-21 14:00 - row 1
2024-04-21 14:10 - row 2

overwritten file.json
2024-04-21 14:00 - row 1
2024-04-21 14:10 - row 2
2024-04-21 14:20 - row 3

Additionally, I checked the sha256sum of the JSON file after it's pulled onto my local Splunk server. The hash value changes before and after the file is overwritten.

file.json:
2217ee097b7d77ed4b2eabc695b89e5f30d4e8b85c8cbd261613ce65cda0b851 /home/ws/logs/###.json

overwritten file.json:
45b01fabce6f2a75742c192143055d33e5aa28be3d2c3ad324dd2e0af5adf8dd /home/ws/logs//###.json

I've tried using initCrcLength, crcSalt, and followTail, but they don't seem to prevent the duplication; Splunk still indexes the whole file as new data. Any assistance would be appreciated, as I can't seem to stop the duplicate indexing.
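For context, a minimal version of the monitor stanza I've been experimenting with looks roughly like this (the path and sourcetype here are placeholders, not my exact config):

```
# inputs.conf (sketch) -- path and sourcetype are placeholders
[monitor:///home/ws/logs/file.json]
sourcetype = _json

# Tried widening the CRC window so the header match is more specific:
initCrcLength = 1024

# Tried salting the CRC with the source path:
crcSalt = <SOURCE>

# Tried starting reads from the end of the file:
followTail = 1
```

None of these combinations (individually or together) stopped the previously indexed rows from being indexed again after the file is overwritten.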