Splunk Search

How to extract a timestamp from line 2 of a CSV and apply it to each line of data in the remainder of file?

andyfromoz
Explorer

We have a particular file of the format:

Field1, Field2, Timestamp field, Field4, Field5, Number of records, Field7
xx, yy, 2015-09-12 14:55:00.666, zz, aa, 2, bb
DataField1, DataField2, DataField3, DataField4
ssss, yyyy, pppp, ffff
ababa, dfdfdf, ghghg, hhhhh

The data we want is from line 3 onward with line 3 being the field names and line 4 onwards being the values. That part is easy. I simply state the header line start on line 3 and it all works (mostly). The catch is the data lines don't contain a timestamp.

The first 2 lines are metadata field headers and metadata values. You will notice the timestamp for the file is contained in the metadata. What I am trying to do is get the timestamp from the metadata line 2 and tack the timestamp onto each data line before indexing.

Any ideas?

yuanliu
SplunkTrust
SplunkTrust

If the illustrated file is raw data, at index time, you should try to get Timestamp_field extracted, then extract the rest at search time.  There are a number of ways to achieve that; needless to say, this is a terrible format to put real data.

At search time, this can be used to separate meta data from real data and extract:

 

| eval data = split(_raw, "
")
| eval _raw = mvjoin(mvindex(data, 0,1), "
")
| eval data = mvjoin(mvindex(data, 2, -1), "
")
| multikv
| rename data AS _raw
| multikv forceheader=1
| fields - _raw linecount

 

The illustrated sample file will give

DataField1DataField2DataField3DataField4Field1Field2Field4Field5Field7Number_of_recordsTimestamp_field
ssssyyyyppppffffxxyyzzaabb22015-09-12 14:55:00.666
ababadfdfdfghghghhhhhxxyyzzaabb22015-09-12 14:55:00.666
0 Karma

lukas_loder
Communicator

Have you tried do modify the props.conf?

HEADER_FIELD_LINE_NUMBER=3

http://docs.splunk.com/Documentation/Splunk/6.0/Data/Extractfieldsfromfileheadersatindextime

0 Karma

andyfromoz
Explorer

Yes, I thought of that however there is no timestamp is the data lines (line 4-x). That would be fine if the writing to the log was near realtime as we could just rely of the time of the indexer. However in our case the file will be written to on a server we don't have control over and then passed via secure connection to our server. There is no guarantee the delay between the last event being written and when we receive the file.

Line 2 in each file contains the date and time the file was first written and that would more accurately reflect the timestamp we want to put on each event.

One thing I thought would have worked was to use the PREAMBLE_REGEX= setting to ignore lines 1 and 2 but put a "capture field" into the regex to capture the timestamp into a field ans somhow insert that into each event line. No luck so far

0 Karma

shivangisaini
New Member

were you able to figure it out? i'm dealing with a similar issue.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...