I'm running into a strange issue where Splunk uses the current time for an HTTP Event Collector input rather than pulling out the timestamp field I've defined in props.conf. I started by cloning the _json sourcetype and making a few adjustments, since event parsing and field extraction were working as expected. I've tried both TIMESTAMP_FIELDS and TIME_PREFIX in props.conf without any luck. I'm using a Python script to query the GitHub API, and I'm then passing the JSON to splunk_handler.
Payload
[SplunkHandler DEBUG] Sending payload: {"event": "[{\"created_at\": \"2019-01-18T15:24:13Z\", \"pr_user\": \"userid123\", \"merged_at\": \"2019-01-18T15:24:51Z\", \"pr_url\": \"https://github.com/someorganization/somerepo/pull/12345\", \"pr_number\": 12345, \"repo_name\": \"somerepo\"}, {\"created_at\": \"2019-01-18T14:56:27Z\", \"pr_user\": \"userid123\", \"merged_at\": \"2019-01-18T15:09:42Z\", \"pr_url\": \"https://github.com/someorganization/somerepo/pull/12346\", \"pr_number\": 12346, \"repo_name\": \"somerepo\"}]", "host": "myhost", "index": "prmetrics", "source": "test", "sourcetype": "json-github"}
Raw Event Text (as shown in Splunk)
{"created_at": "2019-01-17T21:20:55Z", "pr_user": "userid123", "merged_at": "2019-01-18T14:10:37Z", "pr_url": "https://github.com/someorganization/somerepo/pull/12345", "pr_number": 12345, "repo_name": "somerepo"}
props.conf
[json-github]
INDEXED_EXTRACTIONS = json
KV_MODE = none
NO_BINARY_CHECK = true
disabled = false
SHOULD_LINEMERGE = false
TIME_PREFIX = \{\"created_at\"\:\s\"
MAX_TIMESTAMP_LOOKAHEAD = 50
#TIMESTAMP_FIELDS = created_at
Hi,
As far as I know, with the HEC event endpoint you need to supply the timestamp yourself when you format the event, alongside sourcetype, source and host. If you want Splunk to extract the timestamp from your raw data, I believe the /collector/event HEC endpoint will not work and you need to use the /collector/raw HEC endpoint instead.
I tested the sample data you provided in my lab: the timestamp was not extracted from the raw data with the /collector/event endpoint, but it was extracted with the /collector/raw endpoint.
I used curl to ingest the data into Splunk through the HEC raw endpoint:
curl -vk "https://localhost:8088/services/collector/raw?channel=A1B2C34D-12A3-1234-A123-12ABC1234567&sourcetype=json-github&source=test&host=myhost&index=main" -H "Authorization: Splunk 1ab23cd64-a12b-123a-1ab2-123ab4c56d78" -d '[{"created_at": "2019-01-18T15:24:13Z", "pr_user": "userid123", "merged_at": "2019-01-18T15:24:51Z", "pr_url": "https://github.com/someorganization/somerepo/pull/12345", "pr_number": 12345, "repo_name": "somerepo"}, {"created_at": "2019-01-18T14:56:27Z", "pr_user": "userid123", "merged_at": "2019-01-18T15:09:42Z", "pr_url": "https://github.com/someorganization/somerepo/pull/12346", "pr_number": 12346, "repo_name": "somerepo"}]'
and props.conf
[json-github]
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = created_at
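If switching to /collector/raw is not an option, the other route mentioned above is to supply the timestamp yourself in the event envelope. The following is only a minimal sketch of that idea in Python: it converts created_at to epoch seconds and puts it in the "time" key of one HEC envelope per pull request. The helper names to_epoch and build_hec_events are hypothetical, the default index, sourcetype, source and host are copied from the question's payload, and how the body is actually sent (splunk_handler or a direct POST) is left out on purpose.

```python
import json
from datetime import datetime, timezone


def to_epoch(iso_utc: str) -> float:
    """Convert a UTC timestamp such as 2019-01-18T15:24:13Z to epoch seconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing Z means UTC, so attach the timezone explicitly
    # instead of relying on the local timezone of the machine.
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def build_hec_events(pull_requests, index="prmetrics", sourcetype="json-github",
                     source="test", host="myhost"):
    """Build a batched /collector/event body: one envelope per pull request.

    Hypothetical helper: each envelope carries its own "time" taken from
    created_at, so Splunk does not need to parse the timestamp from the event.
    """
    envelopes = []
    for pr in pull_requests:
        envelopes.append(json.dumps({
            "time": to_epoch(pr["created_at"]),
            "host": host,
            "index": index,
            "source": source,
            "sourcetype": sourcetype,
            "event": pr,
        }))
    # HEC accepts several event envelopes stacked in one request body.
    return "\n".join(envelopes)


if __name__ == "__main__":
    sample = [
        {"created_at": "2019-01-18T15:24:13Z", "pr_number": 12345},
        {"created_at": "2019-01-18T14:56:27Z", "pr_number": 12346},
    ]
    print(build_hec_events(sample))
```

Sending each pull request as its own envelope also avoids posting the whole list as a single string in "event", which is what the debug payload in the question shows.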
There is a solution for Splunk 7.2 and later: INGEST_EVAL
props.conf
[json-github]
...
DATETIME_CONFIG = CURRENT
TRANSFORMS-get-date = construct_date
transforms.conf
[construct_date]
INGEST_EVAL=_time=strptime(substr(_raw,17,20),"%Y-%m-%dT%H:%M:%SZ")
This works for both the HEC event and HEC raw endpoints!
For further information look at: https://conf.splunk.com/files/2020/slides/PLA1154C.pdf
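Because the substr(_raw,17,20) offsets are easy to get wrong, it can help to check them against a sample event before deploying. Below is a rough Python sketch of what that expression selects; splunk_substr and extract_time are hypothetical helpers that only mimic the 1-based substr and the strptime format from the transform, and the sample is a shortened version of the raw event from the question. Treating the result as UTC is an assumption made here; at index time Splunk applies its own timezone handling.

```python
from datetime import datetime, timezone

# Shortened copy of the raw event text shown in the question.
RAW = '{"created_at": "2019-01-17T21:20:55Z", "pr_user": "userid123"}'


def splunk_substr(text: str, start: int, length: int) -> str:
    """Mimic eval substr(): the start position is 1-based."""
    return text[start - 1:start - 1 + length]


def extract_time(raw: str) -> float:
    """Take 20 characters from position 17 and parse them as the event time."""
    stamp = splunk_substr(raw, 17, 20)
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    # Assumption: the literal Z is interpreted as UTC.
    return parsed.replace(tzinfo=timezone.utc).timestamp()


if __name__ == "__main__":
    print(splunk_substr(RAW, 17, 20))
    print(extract_time(RAW))
```

Note that fixed offsets only line up while created_at is the first key in the event; if the key order changes, the slice no longer lands on the timestamp.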
I ended up using the following thanks to this tip. Works flawlessly:
INGEST_EVAL=_time=strptime(spath(_raw,"timestamp"), "%Y-%m-%dT%H:%M:%S%3N%z")
This worked for me too
Which HEC endpoint are you using (raw or event)? Based on your post, I assume you're using the event endpoint. I do not believe Splunk will let you overwrite the time field when you use that endpoint. If you want to set or override the time field dynamically, you will need to use the raw endpoint. I have not tested this, but given the timestamp override issue you are experiencing, I assume you cannot overwrite any of the metadata fields when using the event endpoint.
I’m not sure why this is the desired behavior. It would be nice to get clarification from a Splunk HEC dev.
@Kieffer87: did you try using TIME_FORMAT in your configs?
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%Z
Yes, and I get the same result: the timestamp defaults to the time the data is received.
What's your workflow? Do you have this config on your Splunk HEC server?
I'm running the Python script from my machine. I have a single instance of Splunk handling HEC, indexing and search.