Getting Data In

How can we monitor a CSV file?

ajitshukla61116
Path Finder

Hello Splunkers,

We have a test case in which we have to monitor one CSV file (1K records) for any change. If we add a row or update anything, any number of times, the file still needs to be ingested into the Splunk index. Please help me find a solution for this test case.

0 Karma

PavelP
Motivator

Hello @ajitshukla61116 ,

I know a partial solution for your question: use initCrcLength = 1048576. Splunk will calculate the CRC sum based on the first 1 MB and reindex the whole CSV file if anything changes in the first 1048576 bytes of the file.

[monitor:///tmp/file.csv]
disabled = false
....
initCrcLength = 1048576

here is an excerpt from the documentation:

initCrcLength = <integer>
* How much of a file, in bytes, that the input reads before trying to
  identify whether it is a file that has already been seen. You might want to
  adjust this if you have many files with common headers (comment headers,
  long CSV headers, etc) and recurring filenames.
* Cannot be less than 256 or more than 1048576.
* CAUTION: Improper use of this setting causes data to be re-indexed. You
  might want to consult with Splunk Support before adjusting this value - the
  default is fine for most installations.
* Default: 256 (bytes).
0 Karma

ajitshukla61116
Path Finder

@PavelP thanks for this solution. One problem with this approach is that Splunk will re-ingest the complete file rather than just the changed row. Is there any way to stop re-ingesting the complete file?
Looking forward to your comments on this.

0 Karma

PavelP
Motivator

@ajitshukla61116 ,

one of your requirements is "update any thing" - does that also mean any line anywhere in the file, including at the beginning? If yes, then you have to reindex the whole file.

0 Karma

ajitshukla61116
Path Finder

@PavelP yes, "update any thing" means updating any row or even any field value anywhere in the file, not only at the beginning or end.
One more query: is there any way we can delete the previous records before re-ingesting the updated file? Every time the file is updated, the number of records in the Splunk index grows, which does not look like an optimal solution.

0 Karma

PavelP
Motivator
  • you can use a script which compares the old and new versions of the file and writes the diff to a log, then let Splunk monitor this log file.
  • another option is to use a database instead of a CSV file and let Splunk monitor it for updates.
  • you have to consider that Splunk will not see when a record in the CSV file is deleted.
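The first bullet could look something like the sketch below: a script keeps a snapshot of the CSV from its last run, appends any new or changed rows to a separate log file, and you point a Splunk [monitor://] stanza at that log instead of the CSV itself. All paths and the function name are placeholders, not anything Splunk ships; note that, as mentioned above, deleted rows are still invisible to this approach.

```python
import os
import shutil
import tempfile

def sync_changes(csv_path, snapshot_path, diff_log):
    """Append rows of csv_path that are new or changed since the last
    snapshot to diff_log, then refresh the snapshot. Returns the rows
    that were appended."""
    # Rows seen on the previous run (empty set on the first run)
    old_rows = set()
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            old_rows = set(f.read().splitlines())

    with open(csv_path) as f:
        new_rows = f.read().splitlines()

    # Append only rows not present in the previous snapshot
    added = []
    with open(diff_log, "a") as log:
        for row in new_rows:
            if row not in old_rows:
                log.write(row + "\n")
                added.append(row)

    # Keep the current version as the snapshot for the next run
    shutil.copyfile(csv_path, snapshot_path)
    return added

# Demo with a temporary directory instead of real paths
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "file.csv")
snap = os.path.join(workdir, "file.csv.prev")
log = os.path.join(workdir, "file_changes.log")

with open(csv_path, "w") as f:
    f.write("id,value\n1,a\n2,b\n")
print(sync_changes(csv_path, snap, log))  # first run: every row is new

with open(csv_path, "w") as f:
    f.write("id,value\n1,a\n2,B\n3,c\n")  # row 2 changed, row 3 added
print(sync_changes(csv_path, snap, log))  # only the changed/new rows
```

Run it from cron (or a scripted input) between updates, and Splunk only ingests the delta lines written to the log file.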

regarding your question "is there any way we can delete previous records before re-ingesting the updated file": there is no easy way to delete events from Splunk once they are indexed.

  • do the records in the CSV file carry a timestamp?
  • can you output records sorted by time, so that new events are appended to the CSV file instead of overwriting it?
0 Karma