We have a particular file of the format:
Field1, Field2, Timestamp field, Field4, Field5, Number of records, Field7
xx, yy, 2015-09-12 14:55:00.666, zz, aa, 2, bb
DataField1, DataField2, DataField3, DataField4
ssss, yyyy, pppp, ffff
ababa, dfdfdf, ghghg, hhhhh
The data we want is from line 3 onward with line 3 being the field names and line 4 onwards being the values. That part is easy. I simply state the header line start on line 3 and it all works (mostly). The catch is the data lines don't contain a timestamp.
The first 2 lines are metadata field headers and metadata values. You will notice the timestamp for the file is contained in the metadata. What I am trying to do is get the timestamp from the metadata line 2 and tack the timestamp onto each data line before indexing.
Any ideas?
If the illustrated file is raw data, at index time, you should try to get Timestamp_field extracted, then extract the rest at search time. There are a number of ways to achieve that; needless to say, this is a terrible format to put real data.
At search time, this can be used to separate meta data from real data and extract:
| eval data = split(_raw, "
")
| eval _raw = mvjoin(mvindex(data, 0,1), "
")
| eval data = mvjoin(mvindex(data, 2, -1), "
")
| multikv
| rename data AS _raw
| multikv forceheader=1
| fields - _raw linecount
The illustrated sample file will give
DataField1 | DataField2 | DataField3 | DataField4 | Field1 | Field2 | Field4 | Field5 | Field7 | Number_of_records | Timestamp_field |
ssss | yyyy | pppp | ffff | xx | yy | zz | aa | bb | 2 | 2015-09-12 14:55:00.666 |
ababa | dfdfdf | ghghg | hhhhh | xx | yy | zz | aa | bb | 2 | 2015-09-12 14:55:00.666 |
Have you tried do modify the props.conf?
HEADER_FIELD_LINE_NUMBER=3
http://docs.splunk.com/Documentation/Splunk/6.0/Data/Extractfieldsfromfileheadersatindextime
Yes, I thought of that however there is no timestamp is the data lines (line 4-x). That would be fine if the writing to the log was near realtime as we could just rely of the time of the indexer. However in our case the file will be written to on a server we don't have control over and then passed via secure connection to our server. There is no guarantee the delay between the last event being written and when we receive the file.
Line 2 in each file contains the date and time the file was first written and that would more accurately reflect the timestamp we want to put on each event.
One thing I thought would have worked was to use the PREAMBLE_REGEX= setting to ignore lines 1 and 2 but put a "capture field" into the regex to capture the timestamp into a field ans somhow insert that into each event line. No luck so far
were you able to figure it out? i'm dealing with a similar issue.