Getting Data In

Routing to an dynamic index based on JSON field

trenin
Explorer

I have JSON data that I am ingesting. I would like to route the event to an index based on one of the JSON fields. I've seen examples that use REGEX, but I want to avoid hard coding the indexes since I will need to update multiple config files if I start getting new types of data.

My JSON data includes the following section:

...
"collection": {
  "date": "...",
  "source": <Canada | US | Mexico>
},
...

I would like to have 3 seperate indexes, one for Canada, US, and Mexico. I would like to have the index determine dynamically based on the input.

I've seen examples that suggest this is easy to do with REGEX, and I think I could do this as follows that way:

indexes.conf:

[index-Canada]
...
[index-US]
...
[index-Mexico]
...

props.conf:

[default]
TRUNCATE = 0
INDEX_EXTRACTIONS = json
TIMESTAMP_FIELDS = collection.date
TRANSFORMS-SetIndex = setIndex-Canada, setIndex-US, setIndex-Mexico

transforms.conf:

[setIndex-Canada]
REGEX = "source": "Canada"
DEST_KEY = _MetaData::Index
FORMAT = index-Canada

[setIndex-US]
REGEX = "source": "US"
DEST_KEY = _MetaData::Index
FORMAT = index-US

[setIndex-Mexico]
REGEX = "source": "Mexico"
DEST_KEY = _MetaData::Index
FORMAT = index-Mexico

I think this will work. However, I would like to make it so that I don't have to hard code the transforms.conf for each index. One way is to do the following:

props.conf:

[default]
TRUNCATE = 0
INDEX_EXTRACTIONS = json
TIMESTAMP_FIELDS = collection.date
TRANSFORMS-SetIndex = setIndex

transforms.conf:

[setIndex]
REGEX = "source": "(.*)"
DEST_KEY = _MetaData::Index
FORMAT = index-$1

I have a couple questions about this:

  1. If the data has an index I haven't configured, can I somehow setup a fallback so that events that don't match a configured index are not lost?
  2. Can I use the SOURCE_KEY somehow to use the value of the JSON field instead of REGEX? I would rather use the JSON parsing ability of Splunk than my REGEX skills to make sure I am getting the right field. If somehow my REGEX shows up in the contents of the event later, I could get data routed to the wrong index.
0 Karma

amitm05
Builder

For #1
I think you'd need to handle that with some logical set of rules. May be something like defining 2 stanzas in transforms for setting your indexes. One would assign the index only if the sources are US, Mexico OR Canada :

[setIndex_KnownLocations]
REGEX = "source": "Canada|US|Mexico"
DEST_KEY = _MetaData::Index
FORMAT = index-$1

And the second would assign your backup index for all events from other sources :
[setIndex_UnKnownLocations]
REGEX = "source": "(.*)"
DEST_KEY = _MetaData::Index
FORMAT = index-BackupIndex

trenin
Explorer

Thanks - I will try that. Any thoughts for how to use the Splunk JSON parsing in favour of REGEX?

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...