Getting Data In

Routing to an dynamic index based on JSON field

trenin
Explorer

I have JSON data that I am ingesting. I would like to route the event to an index based on one of the JSON fields. I've seen examples that use REGEX, but I want to avoid hard coding the indexes since I will need to update multiple config files if I start getting new types of data.

My JSON data includes the following section:

...
"collection": {
  "date": "...",
  "source": <Canada | US | Mexico>
},
...

I would like to have 3 seperate indexes, one for Canada, US, and Mexico. I would like to have the index determine dynamically based on the input.

I've seen examples that suggest this is easy to do with REGEX, and I think I could do this as follows that way:

indexes.conf:

[index-Canada]
...
[index-US]
...
[index-Mexico]
...

props.conf:

[default]
TRUNCATE = 0
INDEX_EXTRACTIONS = json
TIMESTAMP_FIELDS = collection.date
TRANSFORMS-SetIndex = setIndex-Canada, setIndex-US, setIndex-Mexico

transforms.conf:

[setIndex-Canada]
REGEX = "source": "Canada"
DEST_KEY = _MetaData::Index
FORMAT = index-Canada

[setIndex-US]
REGEX = "source": "US"
DEST_KEY = _MetaData::Index
FORMAT = index-US

[setIndex-Mexico]
REGEX = "source": "Mexico"
DEST_KEY = _MetaData::Index
FORMAT = index-Mexico

I think this will work. However, I would like to make it so that I don't have to hard code the transforms.conf for each index. One way is to do the following:

props.conf:

[default]
TRUNCATE = 0
INDEX_EXTRACTIONS = json
TIMESTAMP_FIELDS = collection.date
TRANSFORMS-SetIndex = setIndex

transforms.conf:

[setIndex]
REGEX = "source": "(.*)"
DEST_KEY = _MetaData::Index
FORMAT = index-$1

I have a couple questions about this:

  1. If the data has an index I haven't configured, can I somehow setup a fallback so that events that don't match a configured index are not lost?
  2. Can I use the SOURCE_KEY somehow to use the value of the JSON field instead of REGEX? I would rather use the JSON parsing ability of Splunk than my REGEX skills to make sure I am getting the right field. If somehow my REGEX shows up in the contents of the event later, I could get data routed to the wrong index.
0 Karma

amitm05
Builder

For #1
I think you'd need to handle that with some logical set of rules. May be something like defining 2 stanzas in transforms for setting your indexes. One would assign the index only if the sources are US, Mexico OR Canada :

[setIndex_KnownLocations]
REGEX = "source": "Canada|US|Mexico"
DEST_KEY = _MetaData::Index
FORMAT = index-$1

And the second would assign your backup index for all events from other sources :
[setIndex_UnKnownLocations]
REGEX = "source": "(.*)"
DEST_KEY = _MetaData::Index
FORMAT = index-BackupIndex

trenin
Explorer

Thanks - I will try that. Any thoughts for how to use the Splunk JSON parsing in favour of REGEX?

0 Karma
Get Updates on the Splunk Community!

Join Us for Splunk University and Get Your Bootcamp Game On!

If you know, you know! Splunk University is the vibe this summer so register today for bootcamps galore ...

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

.conf24 is taking place at The Venetian in Las Vegas from June 11 - 14. Continue reading to learn about the ...

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...