I have an input that writes timestamps as the number of milliseconds elapsed since January 1st 1601. Sadly, the format cannot be changed to either a human-readable or a Unix timestamp.
For example, 12995561169293 corresponds to October 24th 2012, 14:06:09. Splunk interprets this as a Unix timestamp, treating the last four digits as fractional seconds (milliseconds plus a 100-microsecond digit): 1299556116.929(3), which corresponds to March 8th 2011, 04:48:36.929.
I can convert "my" timestamp into a Unix timestamp by subtracting a constant with an external preprocessing application before loading a file into Splunk. However, I'd prefer it if I could teach Splunk to understand it directly.
The usual sed/regex transformations at index time cannot do the maths to subtract the offset. Is there any other way to do the conversion within Splunk?
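For reference, the arithmetic behind the constant can be sketched in a few lines of Python. This is only an illustration of the subtraction, not anything Splunk-specific; the function name `ms1601_to_unix_ms` is made up for the example, and the offset is the 134774 days between 1601-01-01 and 1970-01-01, assuming both epochs are UTC.

```python
from datetime import datetime, timezone

# Milliseconds between 1601-01-01 and 1970-01-01 (UTC):
# 134774 days * 86400 s/day * 1000 ms/s.
EPOCH_DIFF_MS = 134_774 * 86_400 * 1_000  # 11_644_473_600_000


def ms1601_to_unix_ms(ms_since_1601: int) -> int:
    """Convert milliseconds since 1601-01-01 to milliseconds since 1970-01-01."""
    return ms_since_1601 - EPOCH_DIFF_MS


unix_ms = ms1601_to_unix_ms(12995561169293)
seconds, millis = divmod(unix_ms, 1000)
dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)

print(unix_ms)         # 1351087569293
print(dt.isoformat())  # 2012-10-24T14:06:09.293000+00:00
```

Integer arithmetic is used on purpose so the millisecond part is not affected by floating-point rounding.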
A regex will not be able to do subtractions for you.
It seems that the only method is to use a scripted input that will parse the events before indexing.
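A minimal sketch of what such a pre-indexing step might look like, in Python. It assumes each event starts with the 1601-based millisecond count, which may not match your data; the names `rewrite_line` and `rewrite_lines` are hypothetical, and nothing here deals with log rotation or resuming after a restart.

```python
import re

# Milliseconds between 1601-01-01 and 1970-01-01 (UTC).
EPOCH_DIFF_MS = 11_644_473_600_000

# Assumption: the timestamp is the run of digits at the very start of the line.
LEADING_TS = re.compile(r"^(\d+)")


def rewrite_line(line: str) -> str:
    """Replace a leading 1601-based ms timestamp with Unix 'seconds.millis'."""
    def to_unix(match: re.Match) -> str:
        seconds, millis = divmod(int(match.group(1)) - EPOCH_DIFF_MS, 1000)
        return f"{seconds}.{millis:03d}"

    return LEADING_TS.sub(to_unix, line, count=1)


def rewrite_lines(lines):
    """Lazily rewrite every line of an iterable (a list, a file, a stream)."""
    for line in lines:
        yield rewrite_line(line)


sample = ["12995561169293 user logged in\n", "no timestamp here\n"]
for out in rewrite_lines(sample):
    print(out, end="")
```

In a real scripted input, `sample` would be replaced by `sys.stdin` or an open file handle; lines without a leading number pass through unchanged.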
You can set TZ=+NumberOfHoursToAddHere:NumberOfMinutesToAddHere in props.conf.
You can also look at a solution using Cribl:
https://www.cribl.io/
Do you have a working example using TZ?
Just under six years later, 7.2 promises a fix \o/
http://docs.splunk.com/Documentation/Splunk/7.2.0/Admin/transformsconf
INGEST_EVAL = <comma-separated list of evaluator expressions>
* NOTE: This setting is only valid for index-time field extractions.
* Optional. When you set INGEST_EVAL, this setting overrides all of the other
index-time settings (such as REGEX, DEST_KEY, etc) and declares the
index-time extraction to be evaluator-based.
* The expression takes a similar format to the search-time "|eval" command.
For example "a=b+c*d" Just like the search-time operator, you can
string multiple expressions together, separated by commas like
"len=length(_raw), length_category=floor(log(len,2))".
* Keys which are commonly used with DEST_KEY or SOURCE_KEY (like
"_raw", "queue", etc) can be used directly in the expression.
Also available are values which would be populated by default when
this event is searched ("source", "sourcetype", "host", "splunk_server",
"linecount", "index"). Search-time calculated fields (the "EVAL-" settings
in props.conf) are NOT available.
* When INGEST_EVAL accesses the "_time" variable, subsecond information is
included. This is unlike regular-expression-based index-time extractions,
where "_time" values are limited to whole seconds.
...
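Going by the description above, the conversion from the original question could perhaps be expressed along these lines. This is an untested sketch: the stanza names `ms1601_to_unix` and `my_sourcetype` are placeholders, it assumes single-line events that begin with the 1601-based millisecond count, and 11644473600 is the number of seconds between 1601-01-01 and 1970-01-01.

```
# transforms.conf
[ms1601_to_unix]
# Take the leading digits of the event, scale ms to seconds, subtract the epoch offset.
INGEST_EVAL = _time=tonumber(replace(_raw, "^(\d+).*", "\1"))/1000 - 11644473600

# props.conf
[my_sourcetype]
TRANSFORMS-fixtime = ms1601_to_unix
```

Since this runs inside the normal indexing pipeline, file tailing and log rotation would stay with the regular monitor input.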
Using scripted inputs to do the conversion means I need to re-implement the handling of log rotations and correct tailing after restarts, right?
I was hoping to get around that with some kind of more-powerful-than-sed pre-processing at index time.