Getting Data In

Setting _time when a field is inconsistent in JSON ingestion

cherrypick
Path Finder

I have a JSON file which contains a "date" field. The date field in my data can be in either the format %Y-%m-%d %H:%M:%S (e.g. 2025-05-23 9:35:35 PM) or %Y-%m-%d (e.g. 2025-05-23). The only way to ingest this JSON is via manual ingestion.

When trying to set the _time field on ingest, setting the timestamp format to %Y-%m-%d %H:%M:%S fails and defaults to the wrong _time value for date fields in the %Y-%m-%d format. However, setting the timestamp format to %Y-%m-%d won't capture the HMS part. Is there a way to coalesce these so that it checks whether HMS is present and, if so, applies the %Y-%m-%d %H:%M:%S format? Or is there a workaround so that at least the _time value is accurate on ingestion?


livehybrid
SplunkTrust

Hi @cherrypick 

This sounds like a job for INGEST_EVAL. There are great examples from Rich Morgan @ Splunk at https://github.com/silkyrich/ingest_eval_examples/blob/master/default/transforms.conf.

However, for your specific example, the following config should hopefully work. It checks the date format first and then sets _time accordingly:

== props.conf ==
[yourSourceType]
TRANSFORMS-setCustomTime = setJSONTime

== transforms.conf ==
[setJSONTime]
INGEST_EVAL = _time=if(match(date, "\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2} [AP]M"), strptime(date, "%Y-%m-%d %I:%M:%S %p"), strptime(date, "%Y-%m-%d"))
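
If you want to sanity-check that expression at search time before deploying it, a quick sketch (the sample value here is made up):

| makeresults
| eval date="2025-05-23 9:35:35 PM"
| eval t=if(match(date, "\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2} [AP]M"), strptime(date, "%Y-%m-%d %I:%M:%S %p"), strptime(date, "%Y-%m-%d"))
| eval check=strftime(t, "%Y-%m-%d %H:%M:%S")
| table date t check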

 

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing.


isoutamo
SplunkTrust
You should use := instead of =! Otherwise you will get a multivalue (mv) field.

livehybrid
SplunkTrust

Interestingly, I did consider this as I posted it, but decided against := based on Rich's examples of setting _time at https://github.com/silkyrich/ingest_eval_examples/blob/master/default/transforms.conf, which only use = (oddly, only on _time fields). I wonder why that is? I might check, but I thought it wasn't possible to set a multivalue field for _time (and _raw) using INGEST_EVAL?

 


isoutamo
SplunkTrust
I haven't checked whether the _time field has some internal limitation for mv values or not, but for other fields the := is needed.

cherrypick
Path Finder

Again, this does not work, because it only fixes _time at search time; the shared time picker still filters on the indexed _time.

Let's say I have an event whose date is 2025-04-01 but it was ingested so that _time is 2025-04-02 (so date and _time are mismatched). If I use a timechart command to chart alerts over 2025-04-01, this event will not appear on the timechart because it is first filtered on _time. Even if I run timechart by date, this event will not appear.

My core issue is how to ensure the _time and date fields agree in the index (NOT at search time) when ingesting data with mismatched formats.
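
To illustrate with dummy values: an event indexed with _time = 2025-04-02 is excluded by any time-range filter for 2025-04-01, regardless of what its date field says:

| makeresults
| eval date="2025-04-01", _time=strptime("2025-04-02", "%Y-%m-%d")
| where _time >= strptime("2025-04-01", "%Y-%m-%d") AND _time < strptime("2025-04-02", "%Y-%m-%d")

This returns zero results, which is effectively what the time picker does against the index.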


cherrypick
Path Finder

I need this to be done at ingest and not during search. The reason is that Splunk filters on _time first, so incorrect _time values will filter out results that shouldn't be excluded.

[screenshot]

 


isoutamo
SplunkTrust
You could use INGEST_EVAL with @kiran_panchavat's example.
Put it in transforms.conf like this:

INGEST_EVAL = _time := if(match(date, "\\d{4}-\\d{2}-\\d{2} \\d{1,2}:\\d{2}:\\d{2} [APMapm]{2}"), strptime(date, "%Y-%m-%d %I:%M:%S %p"), strptime(date, "%Y-%m-%d"))
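
For completeness, a sketch of how the full configuration could look (the stanza and sourcetype names here are assumptions carried over from the earlier example, so adjust them to your environment):

# props.conf
[yourSourceType]
TRANSFORMS-setCustomTime = setJSONTime

# transforms.conf
[setJSONTime]
INGEST_EVAL = _time := if(match(date, "\\d{4}-\\d{2}-\\d{2} \\d{1,2}:\\d{2}:\\d{2} [APMapm]{2}"), strptime(date, "%Y-%m-%d %I:%M:%S %p"), strptime(date, "%Y-%m-%d"))

Note that INGEST_EVAL runs at parse time, so this has to live on the first heavy forwarder or indexer that parses the data, not on a search head.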

PrewinThomas
Motivator

@cherrypick 

Splunk cannot natively parse multiple timestamp formats for the same field at index time; it only allows a single TIME_FORMAT per sourcetype.

If you can preprocess or route events differently, you can assign different sourcetypes based on the date format:

# props.conf
[test_json]
TRANSFORMS-set_sourcetype = set_sourcetype_datetime, set_sourcetype_dateonly

[test_json_datetime]
TIME_PREFIX = "date":\s*"
TIME_FORMAT = %Y-%m-%d %I:%M:%S %p

[test_json_dateonly]
TIME_PREFIX = "date":\s*"
TIME_FORMAT = %Y-%m-%d

# transforms.conf
[set_sourcetype_datetime]
REGEX = "date":\s*"\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2} [AP]M"
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::test_json_datetime

[set_sourcetype_dateonly]
# The closing quote ensures this matches only date-only values; without it, the
# regex would also match the date prefix of datetime values and override the
# first transform (transforms run in the order listed in props.conf)
REGEX = "date":\s*"\d{4}-\d{2}-\d{2}"
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::test_json_dateonly
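
Once data is flowing, a quick way to confirm events were routed as expected (the index name is an assumption):

index=main sourcetype=test_json*
| stats count by sourcetype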


Regards,
Prewin 
Splunk Enthusiast | Always happy to help! If this answer helped you, please consider marking it as the solution or giving it kudos. Thanks!

 


kiran_panchavat
SplunkTrust

@cherrypick 

Then you can try this:

props.conf 

[json_splunk]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = "date":\s*"
TIME_FORMAT = %Y-%m-%d %I:%M:%S %p 
MAX_TIMESTAMP_LOOKAHEAD = 60
TRANSFORMS-normalize = fix_date_field, fix_time_hour

transforms.conf

[fix_date_field]
# Append a midnight time to date-only values so every event matches TIME_FORMAT
# (the closing quote in the third group means datetime values are left alone)
REGEX = ("date":\s*")(\d{4}-\d{2}-\d{2}|\d{2}-\d{2}-\d{2})(")
FORMAT = $1$2 12:00:00 AM$3
DEST_KEY = _raw

[fix_time_hour]
# Zero-pad single-digit hours (e.g. "9:35:35 PM" -> "09:35:35 PM"); the groups
# are arranged so that no group reference is immediately followed by a digit
REGEX = ("date":\s*"\d{4}-\d{2}-\d{2}) (\d):(\d{2}:\d{2} (?:AM|PM))
FORMAT = $1 0$2:$3
DEST_KEY = _raw

Output:

[screenshot]

Sample events I tried:

{"date": "2025-05-23 9:35:35 PM", "event": "Login"}
{"date": "2025-05-23", "event": "Logout"}
{"date": "2025-05-24 10:15:00 AM", "event": "Login"}
{"date": "2025-05-24", "event": "Logout"}
{"date": "2025-05-25 11:45:00 AM", "event": "Update"}
{"date": "2025-05-25", "event": "Login"}
{"date": "2025-05-26 12:00:00 PM", "event": "Logout"}
{"date": "2025-05-26", "event": "Update"}
{"date": "2025-05-27 1:30:00 PM", "event": "Login"}
{"date": "2025-05-27", "event": "Logout"}

 

 

Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!

kiran_panchavat
SplunkTrust

@cherrypick 

SPL with Dummy Data Using makeresults

| makeresults count=10
| streamstats count as id
| eval raw_json=case(
id=1, "{\"date\": \"2025-05-23 9:35:35 PM\", \"event\": \"Login\"}",
id=2, "{\"date\": \"2025-05-23\", \"event\": \"Logout\"}",
id=3, "{\"date\": \"2025-05-24 10:15:00 AM\", \"event\": \"Login\"}",
id=4, "{\"date\": \"2025-05-24\", \"event\": \"Logout\"}",
id=5, "{\"date\": \"2025-05-25 11:45:00 AM\", \"event\": \"Update\"}",
id=6, "{\"date\": \"2025-05-25\", \"event\": \"Login\"}",
id=7, "{\"date\": \"2025-05-26 12:00:00 PM\", \"event\": \"Logout\"}",
id=8, "{\"date\": \"2025-05-26\", \"event\": \"Update\"}",
id=9, "{\"date\": \"2025-05-27 1:30:00 PM\", \"event\": \"Login\"}",
id=10, "{\"date\": \"2025-05-27\", \"event\": \"Logout\"}"
)
| spath input=raw_json
| eval parsed_time = if(match(date, "\\d{4}-\\d{2}-\\d{2} \\d{1,2}:\\d{2}:\\d{2} [APMapm]{2}"),
strptime(date, "%Y-%m-%d %I:%M:%S %p"),
strptime(date, "%Y-%m-%d"))
| eval _time = parsed_time
| table _time, date, event

[results screenshot]
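
A variant of the same check for data with 24-hour timestamps and no AM/PM marker: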

| makeresults count=4
| streamstats count AS row
| eval _raw=case(
row=1, "{\"date\":\"2025-05-23 21:35:35\"}",
row=2, "{\"date\":\"2025-05-22\"}",
row=3, "{\"date\":\"2025-05-21 15:20:00\"}",
row=4, "{\"date\":\"2025-05-20\"}"
)
| spath
| eval _time=if(match(date, "\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"),
strptime(date, "%Y-%m-%d %H:%M:%S"),
strptime(date, "%Y-%m-%d"))
| table date _time

[results screenshot]

 

Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!