I have events that start with a format similar to:
14-Jul-17,7:23:00 PM,7:23:36 PM,7:23:36 PM,-36,206
where the first field is the date and the fourth field is the time - the middle two time fields are always different - makes me wish you could use regex in time_format. Hoping to avoid datetime.xml, but maybe I'm missing something simple here? Appreciate any ideas!
indexed_extractions=csv makes pretty easy work of it with:
[ csv_timestampfu ] CHARSET=UTF-8 INDEXED_EXTRACTIONS=csv KV_MODE=none SHOULD_LINEMERGE=false category=Custom description=Comma-separated value format, using TIMESTAMP_FIELDS to craft _time disabled=false pulldown_type=true FIELD_NAMES=1,2,3,4,5,6 # set your schema TIME_FORMAT=%d-%b-%y %I:%M:%S TIMESTAMP_FIELDS=1,4 # parse from field 1 & 4 as per time_format
Could probably eval at search time as well if necessary.
Thats what I thought too, its how I was originally approaching, but couldn't get it to work so I was looking for another route. Here's my current props:
[telemetry1] CHARSET=UTF-8 INDEXED_EXTRACTIONS = csv KV_MODE = none SHOULD_LINEMERGE = false pulldown_type = true category = Custom FIELD_NAMES = calendar_day, scheduled_time, actual_arrival_time, actual_depart_time, adherence_seconds, time_point_id TIMESTAMP_FIELDS = calendar_day, actual_depart_time TIME_FORMAT = %d-%b-%y %I:%M:%S %p
But ts recognition is failing and Splunk is just using index time. SA-eventgen is the datasource if that makes a difference, events are coming in over STDOUT. I'll keep poking, but any other thoughts are appreciated - thanks!
This is all (original file, sa-eventgen, conf files etc) on a single instance. The extractions don't happen on the streaming events from the eventgen like they do on the file input and that seems to be the root of the issue - even though I'm using the same sourcetype and event format (minus the header row that was in the original csv file). I'm stumped!
I'm wondering if indexed field extractions occur after timestamping on streaming data for some reason, which was why I was headed down the path of treating the events as raw data and using time_format or datetime.xml. Any other thoughts let me know and thanks so much for the help!
Here's my full btool dump for the sourcetype if it helps:
[telemetry1] ANNOTATE_PUNCT = True AUTO_KV_JSON = true BREAK_ONLY_BEFORE = BREAK_ONLY_BEFORE_DATE = True CHARSET = UTF-8 DATETIME_CONFIG = EVAL-lat = substr(latitude, 1, 2).".".substr(latitude, 3, 7) EVAL-lon = substr(longitude, 1, 3).".".substr(longitude, 4, 7) FIELD_NAMES = calendar_day, scheduled_time, actual_arrival_time, actual_depart_time, adherence_seconds, time_point_id, time_point_name, block_stop_order, vehicle_num, geo_node, latitude, longitude, schd_distance, route_abbr, is_layover, route_name, isrevenue, revenue_id HEADER_MODE = INDEXED_EXTRACTIONS = csv KV_MODE = none LEARN_MODEL = true LEARN_SOURCETYPE = true LINE_BREAKER_LOOKBEHIND = 100 MATCH_LIMIT = 100000 MAX_DAYS_AGO = 2000 MAX_DAYS_HENCE = 2 MAX_DIFF_SECS_AGO = 3600 MAX_DIFF_SECS_HENCE = 604800 MAX_EVENTS = 256 MAX_TIMESTAMP_LOOKAHEAD = 128 MUST_BREAK_AFTER = MUST_NOT_BREAK_AFTER = MUST_NOT_BREAK_BEFORE = NO_BINARY_CHECK = true SEGMENTATION = indexing SEGMENTATION-all = full SEGMENTATION-inner = inner SEGMENTATION-outer = outer SEGMENTATION-raw = none SEGMENTATION-standard = standard SHOULD_LINEMERGE = false TIMESTAMP_FIELDS = calendar_day, actual_depart_time TIME_FORMAT = %d-%b-%y %I:%M:%S %p TRANSFORMS = TRUNCATE = 10000 category = Structured description = MARTA AVL Import detect_trailing_nulls = false disabled = false maxDist = 100 priority = pulldown_type = true sourcetype =