Solved: Testing sourcetype with sample data formats _time correctly, but when actually using it at index time, it does not work

loganramirez · ‎06-24-2022

I am using HEC with a custom source type that sets _time from a field in the JSON data. When I test it with sample data in "Add Data", it works great and _time gets updated. However, when I actually send data to the HEC, _time stays at index time (not the _time based on the data).

To give a concrete example, the JSON has this line:
"timestampStr": "2022-06-03 19:38:19.736995059",

And built this sourcetype:

[_j_son_logan_test]
DATETIME_CONFIG =
LINE_BREAKER = \}()\{
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
disabled = false
BREAK_ONLY_BEFORE_DATE =
SHOULD_LINEMERGE = false
TIME_PREFIX = \"timestampStr\": \"
TIME_FORMAT =
KV_MODE = json
INDEXED_EXTRACTIONS = json

When I use the Settings --> Add Data option and select that Source Type, _time shows as 2022-06-03 19:38:19.736995059

However, when I send that JSON blob via curl to the HEC (which is set to a particular index and to use that sourcetype), the _time value shows the time it was indexed, i.e. right now (2022-06-24).

Looking at the data itself (index="my_index"), the sourcetype column shows _j_son_logan_test

Not sure what to check next, but open to thoughts and thank you!
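To make the timestamp rule concrete, here is a minimal Python sketch of what a TIME_PREFIX-style match followed by timestamp parsing amounts to. It is only an illustration, not how Splunk implements extraction. The helper name extract_event_time is hypothetical, the timestamp is assumed to be UTC, and because Python's datetime only keeps microseconds, the nanosecond digits are truncated.

```python
import re
from datetime import datetime, timezone

# The prefix is the same text the TIME_PREFIX setting matches, written as a
# Python regex, followed by groups for the date-time and the fractional seconds.
TIMESTAMP_RE = re.compile(
    r'"timestampStr": "(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+)"'
)

def extract_event_time(raw_event: str) -> datetime:
    """Hypothetical helper: read timestampStr from a raw JSON event (UTC assumed)."""
    match = TIMESTAMP_RE.search(raw_event)
    if match is None:
        raise ValueError("timestampStr not found in event")
    whole, fraction = match.groups()
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    # Keep at most six fractional digits (microseconds); pad shorter fractions.
    micros = int(fraction.ljust(6, "0")[:6])
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)

sample = '{"timestampStr": "2022-06-03 19:38:19.736995059"}'
event_time = extract_event_time(sample)
```

If the prefix does not match the incoming text exactly (spacing, quoting, field name), the helper raises instead of returning a time, which is one simple thing to compare between the sample data and what actually arrives.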

richgalloway · ‎06-24-2022

See if this answer helps: https://community.splunk.com/t5/Getting-Data-In/Defining-Timestamp-for-HEC-Input/m-p/413425

Also, it's not advised to specify both KV_MODE=json and INDEXED_EXTRACTIONS=json as it's been said to result in double the field extractions.

---
If this reply helps you, Karma would be appreciated.

loganramirez · ‎06-27-2022

So the dual flags issue wasn't the issue, but I did find (from the article you linked!) that I needed to send to the raw endpoint, and that works!

For those who are new to this (noobies like me!), this means changing the URL to

curl -k https://ipaddress:8088/services/collector/raw

Instead of the

curl -k https://ipaddress:8088/services/collector/

I was sending to.
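For anyone scripting this instead of using curl, the difference between the two endpoints is only the URL path. Below is a minimal Python sketch that builds (but does not send) the request. The function name build_hec_request and the all-zero token are placeholders I made up, "ipaddress" is the same placeholder host as in the curl commands above, and the Authorization header uses the "Splunk <token>" scheme that HEC expects.

```python
import urllib.request

def build_hec_request(host: str, token: str, body: bytes, raw: bool) -> urllib.request.Request:
    """Hypothetical helper: build a POST for the raw or the default HEC endpoint."""
    path = "/services/collector/raw" if raw else "/services/collector"
    return urllib.request.Request(
        url=f"https://{host}:8088{path}",
        data=body,
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )

# Placeholder values; nothing is sent over the network here.
placeholder_token = "00000000-0000-0000-0000-000000000000"
raw_request = build_hec_request("ipaddress", placeholder_token, b'{"a": 1}', raw=True)
default_request = build_hec_request("ipaddress", placeholder_token, b'{"a": 1}', raw=False)
```

Passing either object to urllib.request.urlopen would perform the POST. Note that curl -k skips certificate verification, which this sketch does not replicate.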

PickleRick · ‎06-27-2022

But.

If you're explicitly sending to a HEC endpoint, you should already know what your timestamp is. It's easier if the indexer/HF doesn't have to parse the timestamp out of the raw event: you can simply supply it with your event and be done with it. It also speeds up ingestion, since no time is wasted on timestamp extraction.

Think about it.

loganramirez · ‎06-27-2022

While I am confident you are right, I do not know what "You can simply supply it with your event" means, so I'm stuck extracting it (unless you have more specifics on what 'simply supply it' means).

THANK YOU!

PickleRick · ‎06-27-2022

If you do a REST API request to the HEC /collector (or /collector/event) endpoint, you're providing an event along with optional metadata fields (index, sourcetype, source, time) as well as custom indexed fields.

https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_met...

You can set your payload to include a time value (as an epoch timestamp with milliseconds). This way you have an absolute timestamp, with no timezone parsing issues and so on. I do that on a regular basis.

loganramirez · ‎06-27-2022

Thanks! Looking at that document and when you say 'payload' that is the actual json message coming in, ya? So does that mean if we alter our JSON:

"event":{ "resourceId": "enum:172.17.2.238", "timestamp":"1654285099736"}

to this

"event":{ "resourceId": "enum:172.17.2.238", "time":"1654285099736"}

It will 'read' it naturally/natively?

Also, note that our devs send epoch in ms (not the <sec>.<ms> format specified in the doc you sent), so we may have to request that change as well.
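If the sender can't be changed right away, the conversion from epoch milliseconds to the <sec>.<ms> form is a single division. A minimal Python sketch, with a made-up function name:

```python
def epoch_ms_to_hec_time(epoch_ms) -> float:
    """Convert epoch milliseconds (int or numeric string) to seconds with a fraction."""
    return int(epoch_ms) / 1000

# The millisecond value from the example above.
hec_time = epoch_ms_to_hec_time("1654285099736")
```

Here hec_time comes out as 1654285099.736, which is a number rather than a quoted string.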

Thank you! This is great!

PickleRick · ‎06-27-2022

Close. You send either a text event plus time, or a JSON structure plus time.

So you can either send it as (full HEC payload):

{
   "event": { 
       "resourceId": "enum:172.17.2.238",
       "another_field": "another_value",
       "and_so_on": "and_so_on"
       },
   "time":  1654285099.736
}

Or

{
     "event": "{\"resourceID\": \"enum:172.17.2.238\"  [...] }",
     "time" :  1654285099.736
}

If your software generates JSON anyway, it's of course more convenient to use the former form, with the event JSON data simply embedded within the "event" field.

The second form is useful mostly when you're forwarding pre-formatted data from another system or something like that.
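Both payload forms can be produced from the same data. The following Python sketch is only an illustration of the two shapes described above; the function name build_hec_payload and its as_text flag are made up for the example.

```python
import json

def build_hec_payload(event: dict, epoch_seconds: float, as_text: bool = False) -> str:
    """Hypothetical helper: wrap an event and its timestamp in a HEC payload.

    as_text=False embeds the event as a JSON object (the first form);
    as_text=True embeds it as a pre-serialized string (the second form).
    """
    body = json.dumps(event) if as_text else event
    # "time" sits next to "event" at the top level, not inside it.
    return json.dumps({"event": body, "time": epoch_seconds})

fields = {"resourceId": "enum:172.17.2.238"}
structured = build_hec_payload(fields, 1654285099.736)
text_form = build_hec_payload(fields, 1654285099.736, as_text=True)
```

In both cases the timestamp travels as top-level metadata, so nothing has to be extracted from the event body itself.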

loganramirez · ‎06-27-2022

wow this is great! had NO IDEA about this and this is a really big help. thank you for the specific example, sir!

MuS · ‎06-24-2022

Hi loganramirez,

Start with the obvious: check splunkd.log for errors such as truncation on the HEC input and, if not done already, restart the HEC instance; new parsing configs most likely require a restart to be applied. Check with btool that your props.conf is really applied and is not being overridden by other settings. Check for typos in the sourcetype and for case mismatches 😉

Also try another tool like nc to send the test event, just to rule out that the problem is curl related.

Hope this helps ...

cheers, MuS
