Getting Data In

Index files in order of timestamp or record file timestamp as a field

leecaf
Explorer

I'm indexing a bunch of CSV files provided by an external vendor over FTP (mapped or synced to my local drive). There may be duplicate rows across different files. The requirement is to take the row from the file with the latest timestamp. I can achieve this by either:

a) Ensuring that Splunk indexes my data in the same order as the file timestamps. Can someone suggest how I can do this without having to rewrite, in a script, the entire 'scan directory for updated files' logic that Splunk nicely provides?

b) Can I add an extra field 'fileTimeStamp'? How would I specify this in my props.conf?

c) Look up the file timestamps with a lookup at search time. But if a file has been newly updated at search time and has not been indexed yet, I may see misleading results.

Suggestions, please?


mataharry
Communicator

No, you cannot ask Splunk to monitor only part of a file, or to control the order in which files are indexed.

A) The simple solution is to dedup the events:
source=mypath/to/my/folder/* | dedup _raw

See http://docs.splunk.com/Documentation/Splunk/5.0.3/SearchReference/dedup

B) No, the modification time of the file is not indexed. The closest you have is _indextime (when the event is received by the indexer).

A solution is to index everything and use the timestamp of the events:

source=mypath/to/my/folder/* | stats latest(_raw) AS _raw by source

or the index time:

source=mypath/to/my/folder/* | eval oldtime=_time | eval _time=_indextime | stats latest(oldtime) AS oldtime latest(_raw) AS _raw by source

C) Use _indextime for the same purpose.
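
To make the "use the index time" idea concrete, here is a minimal Python sketch of the logic. It is not Splunk code, only a model of what the search is meant to achieve: among events with identical raw text, keep the copy that was indexed last. The field names raw, source and indextime, the function name keep_latest_indexed, and the sample rows are all made up for illustration. Note that index time is only a stand-in for the file's modification time, since Splunk does not index files in any guaranteed order.

```python
from typing import Iterable


def keep_latest_indexed(events: Iterable[dict]) -> list[dict]:
    """For each distinct raw row, keep the copy with the greatest index time."""
    latest: dict[str, dict] = {}
    for event in events:
        current = latest.get(event["raw"])
        # Replace the stored copy only if this one was indexed later.
        if current is None or event["indextime"] > current["indextime"]:
            latest[event["raw"]] = event
    return list(latest.values())


# Hypothetical sample: the same row appears in two vendor files.
events = [
    {"raw": "42,widget,9.99", "source": "vendor_a.csv", "indextime": 100},
    {"raw": "42,widget,9.99", "source": "vendor_b.csv", "indextime": 250},
    {"raw": "43,gadget,1.50", "source": "vendor_a.csv", "indextime": 100},
]
result = keep_latest_indexed(events)
```

In this sample the duplicated row survives once, attributed to vendor_b.csv because that copy has the larger index time.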
