Splunk Search

How to extract a timestamp from line 2 of a CSV and apply it to each line of data in the remainder of file?


We have a particular file of the format:

Field1, Field2, Timestamp field, Field4, Field5, Number of records, Field7
xx, yy, 2015-09-12 14:55:00.666, zz, aa, 2, bb
DataField1, DataField2, DataField3, DataField4
ssss, yyyy, pppp, ffff
ababa, dfdfdf, ghghg, hhhhh

The data we want is from line 3 onward with line 3 being the field names and line 4 onwards being the values. That part is easy. I simply state the header line start on line 3 and it all works (mostly). The catch is the data lines don't contain a timestamp.

The first 2 lines are metadata field headers and metadata values. You will notice the timestamp for the file is contained in the metadata. What I am trying to do is get the timestamp from the metadata line 2 and tack the timestamp onto each data line before indexing.

Any ideas?


If the illustrated file is raw data, at index time, you should try to get Timestamp_field extracted, then extract the rest at search time.  There are a number of ways to achieve that; needless to say, this is a terrible format to put real data.

At search time, this can be used to separate meta data from real data and extract:


| eval data = split(_raw, "
| eval _raw = mvjoin(mvindex(data, 0,1), "
| eval data = mvjoin(mvindex(data, 2, -1), "
| multikv
| rename data AS _raw
| multikv forceheader=1
| fields - _raw linecount


The illustrated sample file will give

ssssyyyyppppffffxxyyzzaabb22015-09-12 14:55:00.666
ababadfdfdfghghghhhhhxxyyzzaabb22015-09-12 14:55:00.666
0 Karma


Have you tried do modify the props.conf?



0 Karma


Yes, I thought of that however there is no timestamp is the data lines (line 4-x). That would be fine if the writing to the log was near realtime as we could just rely of the time of the indexer. However in our case the file will be written to on a server we don't have control over and then passed via secure connection to our server. There is no guarantee the delay between the last event being written and when we receive the file.

Line 2 in each file contains the date and time the file was first written and that would more accurately reflect the timestamp we want to put on each event.

One thing I thought would have worked was to use the PREAMBLE_REGEX= setting to ignore lines 1 and 2 but put a "capture field" into the regex to capture the timestamp into a field ans somhow insert that into each event line. No luck so far

0 Karma

New Member

were you able to figure it out? i'm dealing with a similar issue.

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In the last month, the Splunk Threat Research Team (STRT) has had 2 releases of new security content via the ...

Announcing the 1st Round Champion’s Tribute Winners of the Great Resilience Quest

We are happy to announce the 20 lucky questers who are selected to be the first round of Champion's Tribute ...

We’ve Got Education Validation!

Are you feeling it? All the career-boosting benefits of up-skilling with Splunk? It’s not just a feeling, it's ...