Hello all,
We are sending some JSON files using HEC (raw endpoint), where a file contains some metadata at the beginning (see below). We want this metadata to be present in ALL events of said file. Basically, we want to prevent having common data repeated in each event in the JSON.
We already tried creating a regex that extracts some fields, but it adds those fields to one event only, not to all of them.
The JSON looks like this:
{
  "metadata": {
    "job_id": "11234",
    "project": "Platform",
    "variant": "default",
    "date": "26.06.2023"
  },
  "data": [
    {
      "ID": "1",
      "type": "unittest",
      "status": "SUCCESS",
      "identified": 123
    },
    {
      "ID": "2",
      "type": "unittest",
      "status": "FAILED",
      "identified": 500
    },
    {
      "ID": "3",
      "type": "unittest",
      "status": "SUCCESS",
      "identified": 560
    }
  ]
}
We want to "inject" the metadata attributes into each event, so we expect to get a table like this (the first four columns come from the metadata, the last four from each data event):

| job_id | project | variant | date | ID | type | status | identified |
|--------|---------|---------|------------|----|----------|---------|------------|
| 11234 | Platform | default | 26.06.2023 | 1 | unittest | SUCCESS | 123 |
| 11234 | Platform | default | 26.06.2023 | 2 | unittest | FAILED | 500 |
| 11234 | Platform | default | 26.06.2023 | 3 | unittest | SUCCESS | 560 |
Currently we use this configuration in props.conf:
[sepcial_sourcetype]
BREAK_ONLY_BEFORE_DATE =
DATETIME_CONFIG =
LINE_BREAKER = ((?<!"),|[\r\n]+)
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Custom
description = Jenkins Job Configurations
pulldown_type = 1
disabled = false
SEDCMD-removeunwanted1 = s/\{\s*?"metadata(.*\s*)+?}//g
SEDCMD-remove_prefix = s/"data":\s*\[//g
SEDCMD-remove_suffix = s/\]\s*}//g
What should our props.conf and transforms.conf look like to accomplish this?
Even if this splits the events and extracts the fields correctly, it obviously causes the metadata part to be discarded (due to SEDCMD-removeunwanted1). But even without that rule, the metadata only ends up in its own separate event and is not replicated to all events.
Here we saw that sending custom metadata is also not supported, although that would have been perfect for our use case: https://community.splunk.com/t5/Getting-Data-In/Does-the-HTTP-Event-Collector-API-support-events-wit...
We already have a workaround where we edit the JSON so that each event contains the metadata, but this is not ideal, as it requires preprocessing the file before sending it to Splunk, and every event would carry repeated data. So we are looking for a solution that Splunk could handle directly.
Thanks for any hints!
Hi
I'm afraid you cannot do this directly with Splunk alone. Splunk handles events one by one, and it cannot refer back to previous events during the ingestion phase.
I think you could add some preprocessing, e.g. with jq, before sending the data to Splunk?
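For illustration, a minimal preprocessing sketch (shown in Python here; a jq filter could do the same job). It assumes "data" is an array, as the SEDCMD-remove_prefix rule in the question suggests the real file has:

```python
import json

def flatten(doc):
    """Copy the shared metadata into every data event.

    Returns a list of flat event dicts, one per data entry.
    """
    meta = doc["metadata"]
    # dict unpacking: metadata keys first, then the event's own keys
    return [{**meta, **event} for event in doc["data"]]

# The structure from the question:
doc = {
    "metadata": {"job_id": "11234", "project": "Platform",
                 "variant": "default", "date": "26.06.2023"},
    "data": [
        {"ID": "1", "type": "unittest", "status": "SUCCESS", "identified": 123},
        {"ID": "2", "type": "unittest", "status": "FAILED", "identified": 500},
    ],
}

for event in flatten(doc):
    print(json.dumps(event))
```

Each printed line is a flat JSON event that could be posted to the HEC raw endpoint, or wrapped as {"event": ...} for the /services/collector/event endpoint.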
Another option could be to use e.g. Cribl for that preprocessing?
r. Ismo
@isoutamo Thanks for your quick answer! Even if it's not possible to reference previous events during ingestion, what about creating custom metadata fields, similar to sourcetype, which could be set once and replicated to all events?
You should remember that adding metadata to an event means it becomes an indexed field, which increases the size of the tsidx files. If there are a lot of different values for those metadata fields, I would think twice about whether this is something you really need!
I haven't tried this myself, but basically you could use transforms to add a _meta field to the event. Usually this is done on the forwarder in inputs.conf by just adding "_meta = xyz". But there should be an option to do this with transforms.conf, and maybe it also works with HEC raw events? You could look at e.g. https://conf.splunk.com/files/2020/slides/PLA1154C.pdf, where there is some discussion about it. But this probably still requires replicating those metadata fields to all data events. Or, if you have only a couple of them, maybe you could hard-code them (probably not a workable idea)?
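For reference, the two usual patterns for setting indexed fields look roughly like this (a sketch only: the stanza names and paths are illustrative, and I haven't verified this combination with HEC raw input):

```
# inputs.conf (on a forwarder): static indexed fields for everything
# read by this input -- only works for values known in advance
[monitor:///path/to/results.json]
_meta = project::Platform variant::default

# transforms.conf: pull a value out of the raw event at index time
# and write it as an indexed field
[set_job_id]
REGEX = "job_id":\s*"(\d+)"
FORMAT = job_id::$1
WRITE_META = true

# props.conf: attach the transform to the sourcetype
[sepcial_sourcetype]
TRANSFORMS-set_meta = set_job_id
```

Note the limitation, though: the transform runs per event after line breaking, so job_id would only be attached to the one event that actually contains the metadata text, which is exactly the problem you described.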
In conclusion: I don't think this metadata approach will work.
I still see the best option as duplicating the metadata into all data events before ingestion.
Thanks a lot for the information. We also looked at the metadata option, but it only adds the metadata to one event.
My understanding is that Splunk first splits the file into events using the LINE_BREAKER and only then applies the field extractions (just a guess, as this is not really explained anywhere), so we did not manage to extract something that is applied to all events. But we do think this is something that should be configurable somewhere.