All...
Looking to see if anyone has thoughts on handling different timestamp formats within the same sourcetype. I am working on an issue where we are bringing in CrowdStrike data that is simply being dumped into an S3 bucket. Some of the data lands in specific directories within the bucket, so I can set the sourcetype at the source level for those. However, other data arrives in the same bucket and even in the same file, and its timestamps may be in different formats. Examples of what we are seeing:
"modified_time":"2022-01-10T23:58:25.865570789Z"
"timestamp":"2022-01-21T20:37:37Z"
We have tried defining a datetime.xml and have used the following props settings:
[crowdstrike:edr]
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 30
SHOULD_LINEMERGE = false
#TIME_FORMAT = %s%3N
TIME_PREFIX = "timestamp":|"modified_time":|"_time":|"Time":
#TIME_PREFIX = timestamp
DATETIME_CONFIG = /etc/apps/fmac_crowdstrike_props/datetime.xml
TRANSFORMS-filter-edr-splunkd = crowdstrike_filter_splunk,crowdstrike_filter_splunkforwarder,crowdstrike_filter_endofprocess
TRUNCATE = 999999
disabled = false
kv_mode = json
Please let me know if you have any thoughts on this or ideas that will help. Thanks!
Right... so I read the article and felt like this might be a good solution. I have implemented this on our testing box, but now the events are getting stamped with the index time. It seems like the DATETIME_CONFIG=CURRENT is winning, and that the transforms are not doing what I am expecting. Here are the props and transform that I am using below, but maybe I am missing something:
Props:
[crowdstrike:edr]
DATETIME_CONFIG = CURRENT
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 30
SHOULD_LINEMERGE = false
TIME_PREFIX = \"timestamp\":|\"modified_time\":|\"_time\":|\"Time\":
TRUNCATE = 999999
disabled = false
kv_mode = json
TRANSFORMS-extract_date = multiple_timestamp_format
Transforms:
[multiple_timestamp_format]
INGEST_EVAL= _time=case(isnotnull(strptime(_raw, "%Y-%m-%dT%H:%M:%S.%QZ")), strptime(_raw, "%Y-%m-%dT%H:%M:%S.%QZ"), isnotnull(strptime(_raw,"%s%3N")), strptime(_raw, "%s%3N"))
Just let me know what you think... Thanks!
Just for clarity, I'd lose the isnotnull and use coalesce instead. It's more readable that way.
Also, if your parsing is not working (and thus the events are getting index time), you can add a constant "fallback" at the end so the expression always returns something. That tells you whether the EVAL is matching wrongly or is not being run at all.
Like
INGEST_EVAL= _time=coalesce(strptime(_raw, "%Y-%m-%dT%H:%M:%S.%QZ"), strptime(_raw, "%s%3N"), 1)
This way if none of the strptime produces a non-null result, your event should get indexed in 1970 🙂
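One more thing worth checking, offered as an untested sketch rather than a confirmed fix: as far as I know, strptime() has to match from the start of the string it is given, and with JSON events _raw starts with { rather than with the timestamp, so every strptime(_raw, ...) call above may simply be returning null. The variant below first pulls the quoted value out with replace() and then tries one format per sample posted earlier (nanosecond precision, then whole seconds). The regex, the two key names in it, and the %9N choice are my assumptions. The trailing Z is matched as a literal, so the resulting epoch depends on whatever timezone applies to the sourcetype.

```
# transforms.conf (sketch; regex and formats are assumptions, test on a dev box first)
[multiple_timestamp_format]
# replace() returns only the first quoted "timestamp" or "modified_time" value.
# If the regex does not match, replace() returns _raw unchanged and strptime yields null.
# Order: nanosecond format, whole-second format, then the constant 1 (1970) as a visible fallback.
INGEST_EVAL = _time=coalesce(strptime(replace(_raw, "(?s)^.*?\"(?:timestamp|modified_time)\":\"([^\"]+)\".*", "\1"), "%Y-%m-%dT%H:%M:%S.%9NZ"), strptime(replace(_raw, "(?s)^.*?\"(?:timestamp|modified_time)\":\"([^\"]+)\".*", "\1"), "%Y-%m-%dT%H:%M:%SZ"), 1)
```

If events still land in 1970 with this in place, the transform is running but neither format matches. If they still get the current time, the transform is most likely not being applied at all.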
Right... according to the .conf presentation, you are supposed to set the DATETIME_CONFIG = CURRENT, which is what I tried. I also commented out the DATETIME_CONFIG to see if that would help, but no luck there. I can try setting the DATETIME_CONFIG="" to see what that gets, but not sure that gets me what I am looking for. Will let you know...
Yup. If I remember correctly, timestamp parsing happens relatively early in the processing pipeline, so you can't modify the message prior to timestamp extraction. Therefore you can only modify _time "post-mortem" 😉 with ingest-time eval.
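To make that ordering concrete, here is a minimal sketch of the props side, assuming the stanza and transform names already used in this thread. With DATETIME_CONFIG = CURRENT the built-in extractor just stamps the event with the current time, so as far as I understand TIME_PREFIX and MAX_TIMESTAMP_LOOKAHEAD have nothing left to do and are left out here. The transform then overwrites _time afterwards.

```
# props.conf (sketch; goes on the instance that parses the data, i.e. heavy forwarder or indexer)
[crowdstrike:edr]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TRUNCATE = 999999
# Skip built-in timestamp extraction; _time starts out as the current time
DATETIME_CONFIG = CURRENT
# Runs after timestamp extraction; its INGEST_EVAL can overwrite _time
TRANSFORMS-extract_date = multiple_timestamp_format
```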
You want this pdf - https://conf.splunk.com/files/2020/slides/PLA1154C.pdf
Especially, pages 26+