How to extract a timestamp from line 2 of a CSV an...

andyfromoz · ‎10-01-2015

We have a particular file of the format:

Field1, Field2, Timestamp field, Field4, Field5, Number of records, Field7
xx, yy, 2015-09-12 14:55:00.666, zz, aa, 2, bb
DataField1, DataField2, DataField3, DataField4
ssss, yyyy, pppp, ffff
ababa, dfdfdf, ghghg, hhhhh

The data we want is from line 3 onward with line 3 being the field names and line 4 onwards being the values. That part is easy. I simply state the header line start on line 3 and it all works (mostly). The catch is the data lines don't contain a timestamp.

The first 2 lines are metadata field headers and metadata values. You will notice the timestamp for the file is contained in the metadata. What I am trying to do is get the timestamp from the metadata line 2 and tack the timestamp onto each data line before indexing.

Any ideas?

yuanliu · ‎01-28-2023

If the illustrated file is raw data, at index time, you should try to get Timestamp_field extracted, then extract the rest at search time. There are a number of ways to achieve that; needless to say, this is a terrible format to put real data.

At search time, this can be used to separate meta data from real data and extract:

| eval data = split(_raw, "
")
| eval _raw = mvjoin(mvindex(data, 0,1), "
")
| eval data = mvjoin(mvindex(data, 2, -1), "
")
| multikv
| rename data AS _raw
| multikv forceheader=1
| fields - _raw linecount

The illustrated sample file will give

DataField1	DataField2	DataField3	DataField4	Field1	Field2	Field4	Field5	Field7	Number_of_records	Timestamp_field
ssss	yyyy	pppp	ffff	xx	yy	zz	aa	bb	2	2015-09-12 14:55:00.666
ababa	dfdfdf	ghghg	hhhhh	xx	yy	zz	aa	bb	2	2015-09-12 14:55:00.666

lukas_loder · ‎10-02-2015

Have you tried do modify the props.conf?

HEADER_FIELD_LINE_NUMBER=3

http://docs.splunk.com/Documentation/Splunk/6.0/Data/Extractfieldsfromfileheadersatindextime

andyfromoz · ‎10-05-2015

Yes, I thought of that however there is no timestamp is the data lines (line 4-x). That would be fine if the writing to the log was near realtime as we could just rely of the time of the indexer. However in our case the file will be written to on a server we don't have control over and then passed via secure connection to our server. There is no guarantee the delay between the last event being written and when we receive the file.

Line 2 in each file contains the date and time the file was first written and that would more accurately reflect the timestamp we want to put on each event.

One thing I thought would have worked was to use the PREAMBLE_REGEX= setting to ignore lines 1 and 2 but put a "capture field" into the regex to capture the timestamp into a field ans somhow insert that into each event line. No luck so far

shivangisaini · ‎01-27-2023

were you able to figure it out? i'm dealing with a similar issue.

How to extract a timestamp from line 2 of a CSV and apply it to each line of data in the remainder of file?

.conf24 | Registration Open!

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!