My csv data contains a number of timestamps. I want the timestamp field to be conditional on the result of another field, say field 12. So if I compare field 12 in each row for "X" then I want to use the timestamp from field 15th and if field 12 contains "Y", I want to use field 10 for the timestamp.
Currently I am splitting my log file into two based on the field 12 value and then defining 2 different sourcetypes, each with a different TIME_PREFIX value in props.conf, but wondered if I could avoid that step and have a single sourcetype.
Following Ayn and Kristian Kolb's comments, here's more detail
These are the two standard lines of data
Course A,_scorm12_ar123,Custom Asset,username1,First,Last,Branch Sales,Group Id1,Group Path,0,2013-11-07,2013-11-07,2,0,,,In Progress,,,,35,14,Direct sales
Course B,_scorm12_cf411,Custom Asset,username2,First,Last,Group name,Group Id2,Group Path2,0,2013-11-16,2013-11-20,3,0,,2013-11-17,Completed,,100.00,100.00,35,62,Indirect sales
The first is a course in progress (field 17) and started on 2013-11-07 (field 11)
The second is a course completed (field 17) and started on 2013-11-16 (field 11) and completed on 2013-11-17 (field 16).
I tried two TIME_PREFIX settings
TIME_PREFIX=^(((\"[^\"]*\",)|([^,]*,)){15}|((\"[^\"]*\",)|([^,]*,)){10})
TIME_PREFIX=^(((\"[^\"]*\",)|([^,]*,)){10}|((\"[^\"]*\",)|([^,]*,)){15})
In the first case, _time is set to the completion date or the 'previous' record's time if the course in In Progress. In the second case, the _time is always set to the start date.
I don't suppose I can look ahead BEYOND the wanted time field to check for the value of field 17 to then extract the time from a field BEFORE...??
I guess if the Completed|In Progress text was before the time field I could include that in the pattern, but the docs for TIME_PREFIX say the time is extracted from the text following the regex match.
Not too sure how Splunk would treat this, but it might work. The (untested) example below assumes that you have space-separated fields, and that the fields themselves do NOT contain any spaces.
TIME_PREFIX = ^((\S+\s+){11}X\s+\S+\s+\S+\s+|(\S+\s+){9}(?=\S+\s+\S+\s+Y))
Also, depending on the size of the events, you may need to increase the MAX_TIMESTAMP_LOOKAHED
.
If this approach does not work, it may be better to use your current strategy, with two different sourcetypes.
For more qualified help, you should consider posting a few sample events.
UPDATE:
Course A,_scorm12_ar123,Custom Asset,username1,First,Last,Branch Sales,Group Id1,Group Path,0,
2013-11-07,2013-11-07,2,0,,,
In Progress,,,,35,14,Direct sales
Course B,_scorm12_cf411,Custom Asset,username2,First,Last,Group name,Group Id2,Group Path2,0,2013-11-16,2013-11-20,3,0,,
2013-11-17,Completed,,100.00,100.00,35,62,Indirect sales
Given that I've understood you correctly, the above date fields are the ones you want, based on the Completed/In Progress status.
I would try this with the following regex in TIME_PREFIX (but note that it may not work at all, I haven't tested it);
TIME_PREFIX = ^(([^,]*,){10}(?=([^,]*,){6}In Progress)|([^,]*,){15}(?=[^,]*,Completed))
Good luck,
/K
EDIT: typo
EDIT2: picked the wrong field in the first event - corrected now.
Did you try the updated TIME_PREFIX reges? It'd be fun to know if it worked or not.
/k
Well, to answer your question on wheter you can 'look beyond' the timestamp for a field value, yes you can. It's called a positive lookahead, and is seen in my example above.
The second regex (\S+\s+){9}(?=\S+\s+\S+\s+Y)
will match a position after 9 sequences (i.e. just before field 10) of non-space-characters-followed-by-space. But only if this is followed by a Y
in field 12.
Btw, I don't really understand your regexes, i.e. they dont make sense to me. And I'm quite sure that you can only specify TIME_PREFIX
once per stanza in props.conf. One of them will be discarded.
/k
Thanks Kristian, see my updated description. I guess I'll have to use my existing strategy, but thanks for the great idea.
good luck.
Thanks, I should be able to map that to my data, so I'll try it in the morning!
If you can specify a TIME_PREFIX
that matches one at a time you can do this. Otherwise the answer is most likely no. But something like
TIME_PREFIX = ...,Y,...,|...,N,...
should work, where you give one regex that matches as far as you want in case the field is "Y" and another one for when the field is "N".
Thanks Ayn, see my updated description. I guess I'll have to use my existing strategy, but thanks for the great idea.
Thanks, I'll try this out in the morning, late here!
I'm too slow by a mile, I see. 🙂