Splunk Search

Extract multiple fields without knowing their order in TRANSFORMS

ilya_resh
Engager

Hi,
I need to extract multiple fields (from events that are coming via HEC) and assign an index based on the concatenated values.
(I know that you can assign index per HEC token, but let's assume that all the events are coming with the same token)
Example payload:

{
    "sourcetype": "hec:generic",
    "event": {
        "platform": "platform01",
        "service": "service02",
        "env": "npd",
        "type": "alert",
        "test": "true",
        "message": "TEST HEC 040"
    }
}

I've figured out how to do it if I know the order:
props.conf

[hec:generic]
TRANSFORMS-index_selector = index_selector

transforms.conf

[index_selector]
REGEX =  platform"\s?:\s?"(?P<platform>\w+)",\n\s*"service"\s?:\s?"(?P<service>\w+)",\n\s*"env"\s?:\s?"(?P<env>\w+)
DEST_KEY = _MetaData:Index
FORMAT   = $1_$2_$3

But what if I don't know the order of the platform, service and env fields?
Any suggestions?
Can I somehow have 3 separate REGEX lines?
I tried the following:

    REGEX =  platform"\s?:\s?"(?P<platform>\w+)
    REGEX =  service"\s?:\s?"(?P<service>\w+)
    REGEX =  env"\s?:\s?"(?P<env>\w+)
    DEST_KEY = _MetaData:Index
    FORMAT   = $1_$2_$3

The result is that Splunk only picks up the last REGEX, so it tries to assign "npd_$2_$3" as the index.
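
A stanza honours only one REGEX setting, which matches the behaviour described above. One possible workaround, offered as an untested sketch, is a single REGEX built from three lookaheads anchored at the start of the event. Each lookahead scans the whole event for its own key, so the key order stops mattering while the capture groups keep a fixed numbering. This assumes the three keys fall within the transform's LOOKAHEAD window and that the values contain only \w characters:

```ini
# transforms.conf (sketch, untested)
[index_selector]
# (?s) lets "." cross newlines; every lookahead starts from the beginning
# of the event, so platform/service/env may appear in any order.
REGEX = (?s)^(?=.*?"platform"\s*:\s*"(\w+)")(?=.*?"service"\s*:\s*"(\w+)")(?=.*?"env"\s*:\s*"(\w+)")
DEST_KEY = _MetaData:Index
FORMAT = $1_$2_$3
```

If any of the three keys is absent, the pattern does not match and the index is left unchanged.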

ilya_resh
Engager

I didn't want to have all the JSON fields indexed.
Here is what I came up with (I still need to assess the performance impact).
When sending to HEC, add a "fields" section.
The payload now looks like this:

{
    "sourcetype": "hec:generic",
    "fields": {
        "platform": "platform01",
        "service": "service02",
        "env": "npd"
    },
    "event": {
        "type": "alert",
        "test": "true",
        "message": "TEST HEC 046"
    }
}

props.conf

[hec:generic]
TRANSFORMS-index_selector = index_selector
TRANSFORMS-remove_hec_meta = meta_remover

transforms.conf

[index_selector]
INGEST_EVAL = index=platform . "_" . service . "_" . env

[meta_remover]
INGEST_EVAL = env:=null(),  service:=null(),  platform:=null()

The meta_remover transform is optional; use it if you don't want to index the fields that were used to select the index.
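
If one of the three fields is missing from the HEC payload, the concatenation in index_selector would presumably evaluate to null. A hedged variant that guards against this is sketched below. It is untested and assumes that the current index value can be read inside INGEST_EVAL and that keeping the event's existing index is the desired fallback:

```ini
# transforms.conf (sketch, untested)
[index_selector]
# Only build the index name when all three fields are present;
# otherwise keep whatever index the event already has.
INGEST_EVAL = index=if(isnotnull(platform) AND isnotnull(service) AND isnotnull(env), platform . "_" . service . "_" . env, index)
```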

to4kawa
Ultra Champion

Your JSON is valid.

props.conf

[hec:generic]
TRUNCATE = 0
DATETIME_CONFIG = CURRENT
SEDCMD-remove = s/^.*?fields\":\s({.*?}).*/\1/
INDEXED_EXTRACTIONS = json
KV_MODE = none
TRANSFORMS-index_selector = index_selector

The Splunk default is KV_MODE = auto.
The JSON is extracted appropriately.

ilya_resh
Engager

@to4kawa, it is valid JSON, but KV_MODE is used for search-time field extractions only.
In my case I need to use the JSON fields during indexing to decide which index will store the events (but not save them as indexed fields).

to4kawa
Ultra Champion

Try

INDEXED_EXTRACTIONS = json
KV_MODE = none
in props.conf
