
Index files in order of file timestamp, or record the file timestamp as a field

leecaf
Explorer

I'm indexing a bunch of CSV files provided by an external vendor over FTP (mapped or synced to my local drive). There may be duplicate rows across different files, and the requirement is to keep the row from the file with the latest timestamp. I can see three ways to achieve this:

a) Ensure that Splunk indexes my files in the same order as their file timestamps. Can someone suggest how I can do this without rewriting, in a script, the entire 'scan directory for updated files' logic that Splunk nicely provides?

b) Add an extra field, 'fileTimeStamp'. Is this possible, and how would I specify it in my props.conf?

c) Look up the file timestamps with a lookup at search time. However, if a file has been updated but not yet indexed when I search, I may see misleading results.
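To make the requirement concrete, here is a minimal Python sketch (not Splunk code) of the rule "keep each row from the file with the latest timestamp". It assumes every row can be identified by a key column and that each file's modification time is known; `VendorFile` and `latest_rows` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VendorFile:
    """One vendor CSV: file name, modification time (epoch seconds), rows by key."""
    name: str
    mtime: float
    rows: dict[str, str]  # hypothetical row key -> full CSV line


def latest_rows(files: list[VendorFile]) -> dict[str, str]:
    """Keep, for every row key, the version from the most recently modified file."""
    best: dict[str, tuple[float, str]] = {}
    for f in files:  # processing order does not matter
        for key, line in f.rows.items():
            current = best.get(key)
            # Replace the stored version only if this file is strictly newer.
            if current is None or f.mtime > current[0]:
                best[key] = (f.mtime, line)
    return {key: line for key, (_, line) in best.items()}


files = [
    VendorFile("a.csv", 100.0, {"id1": "id1,old", "id2": "id2,old"}),
    VendorFile("b.csv", 200.0, {"id2": "id2,new", "id3": "id3,new"}),
]
print(latest_rows(files))
# {'id1': 'id1,old', 'id2': 'id2,new', 'id3': 'id3,new'}
```

If the file timestamp travels with each row, as in option b, the indexing order stops mattering: the same result comes out whichever file is processed first.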

Any suggestions?


mataharry
Communicator

No, you cannot ask Splunk to monitor only part of a file, or to control the order in which files are indexed.

A) The simplest solution is to dedup the events at search time:
source=mypath/to/my/folder/* | dedup _raw

See http://docs.splunk.com/Documentation/Splunk/5.0.3/SearchReference/dedup
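As a rough illustration of which copy `dedup _raw` keeps, here is a small Python model, not Splunk code. It assumes a historical search that returns events newest first by `_time`, with dedup keeping the first result it sees for each `_raw` value; the event dicts and `dedup_raw` are hypothetical. Note that the survivor is the copy with the latest event timestamp, which is not necessarily the copy from the newest file.

```python
def dedup_raw(events):
    """Rough model of `... | dedup _raw` on a historical search."""
    # Historical search results arrive newest first by _time.
    ordered = sorted(events, key=lambda e: e["_time"], reverse=True)
    seen = set()
    kept = []
    for event in ordered:
        # dedup keeps the first result it sees for each _raw value.
        if event["_raw"] not in seen:
            seen.add(event["_raw"])
            kept.append(event)
    return kept


events = [
    {"source": "a.csv", "_time": 100, "_raw": "id1,x"},
    {"source": "b.csv", "_time": 200, "_raw": "id1,x"},
    {"source": "a.csv", "_time": 150, "_raw": "id2,y"},
]
print([(e["source"], e["_raw"]) for e in dedup_raw(events)])
# [('b.csv', 'id1,x'), ('a.csv', 'id2,y')]
```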

B) No. The modification time of the file is not indexed. The closest you have is _indextime (the time the event was received at the indexer).

One solution is to index everything and use the timestamp of the events:

source=mypath/to/my/folder/* | stats latest(_raw) AS _raw by source
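A minimal Python model of what `stats latest(_raw) AS _raw by source` returns, assuming `latest()` picks the value from the event with the greatest `_time` within each group. The event dicts and `latest_raw_by_source` are hypothetical. Note that grouping is per `source`, so the result has one row per file; collapsing duplicate rows across files would need a `by` clause on a field that identifies a CSV row.

```python
def latest_raw_by_source(events):
    """Rough model of `... | stats latest(_raw) AS _raw by source`."""
    newest = {}
    for e in events:
        current = newest.get(e["source"])
        # latest() picks the value from the event with the greatest _time.
        if current is None or e["_time"] > current["_time"]:
            newest[e["source"]] = e
    return {source: e["_raw"] for source, e in newest.items()}


events = [
    {"source": "a.csv", "_time": 100, "_raw": "id1,x"},
    {"source": "a.csv", "_time": 150, "_raw": "id2,y"},
    {"source": "b.csv", "_time": 200, "_raw": "id1,x"},
]
print(latest_raw_by_source(events))
# {'a.csv': 'id2,y', 'b.csv': 'id1,x'}
```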

or, using the index time instead:

source=mypath/to/my/folder/* | eval oldtime=_time | eval _time=_indextime | stats latest(oldtime) AS oldtime latest(_raw) AS _raw by source
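The trick in this search is that overwriting `_time` with `_indextime` makes `latest()` compare index times, while `oldtime` preserves the original event timestamp. A small Python model of that behaviour (hypothetical event dicts and function name, not Splunk code):

```python
def latest_by_indextime(events):
    """Rough model of:
    eval oldtime=_time | eval _time=_indextime
    | stats latest(oldtime) AS oldtime latest(_raw) AS _raw by source
    """
    newest = {}
    for e in events:
        current = newest.get(e["source"])
        # After `eval _time=_indextime`, latest() effectively compares index times.
        if current is None or e["_indextime"] > current["_indextime"]:
            newest[e["source"]] = e
    # oldtime preserves the original event timestamp of the chosen event.
    return {
        source: {"oldtime": e["_time"], "_raw": e["_raw"]}
        for source, e in newest.items()
    }


events = [
    {"source": "a.csv", "_time": 300, "_indextime": 1000, "_raw": "r1"},
    {"source": "a.csv", "_time": 100, "_indextime": 2000, "_raw": "r2"},
]
print(latest_by_indextime(events))
# {'a.csv': {'oldtime': 100, '_raw': 'r2'}}
```

Here the event indexed last wins even though its own timestamp is older, which is the behaviour you want if the most recently received data should take precedence.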

C) Use _indextime for the same purpose.
