
Routing to a dynamic index based on a JSON field

trenin
Explorer

I have JSON data that I am ingesting. I would like to route the event to an index based on one of the JSON fields. I've seen examples that use REGEX, but I want to avoid hard coding the indexes since I will need to update multiple config files if I start getting new types of data.

My JSON data includes the following section:

...
"collection": {
  "date": "...",
  "source": <Canada | US | Mexico>
},
...

I would like to have 3 separate indexes, one each for Canada, US, and Mexico, with the index determined dynamically from the input.

I've seen examples that suggest this is easy to do with REGEX, and I think I could do it as follows:

indexes.conf:

[index-Canada]
...
[index-US]
...
[index-Mexico]
...

props.conf:

[default]
TRUNCATE = 0
INDEX_EXTRACTIONS = json
TIMESTAMP_FIELDS = collection.date
TRANSFORMS-SetIndex = setIndex-Canada, setIndex-US, setIndex-Mexico

transforms.conf:

[setIndex-Canada]
REGEX = "source": "Canada"
DEST_KEY = _MetaData:Index
FORMAT = index-Canada

[setIndex-US]
REGEX = "source": "US"
DEST_KEY = _MetaData:Index
FORMAT = index-US

[setIndex-Mexico]
REGEX = "source": "Mexico"
DEST_KEY = _MetaData:Index
FORMAT = index-Mexico

I think this will work. However, I would like to make it so that I don't have to hard code the transforms.conf for each index. One way is to do the following:

props.conf:

[default]
TRUNCATE = 0
INDEX_EXTRACTIONS = json
TIMESTAMP_FIELDS = collection.date
TRANSFORMS-SetIndex = setIndex

transforms.conf:

[setIndex]
REGEX = "source": "(.*)"
DEST_KEY = _MetaData:Index
FORMAT = index-$1
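
A caveat on the capture group: `.*` is greedy, so when the JSON arrives on a single line, `(.*)` can run past the closing quote of the source value to the last double quote on that line. A hedged variant, assuming the field always appears as a quoted string inside the collection object, limits the capture to non-quote characters and anchors it to its parent key. The stanza name is reused from above, and the pattern is untested against real data:

```ini
# transforms.conf (sketch, not a verified configuration)
[setIndex]
# [^}]* keeps the match inside the "collection" object
# (assumes no nested braces appear before "source");
# [^"]+ stops at the closing quote of the value.
REGEX = "collection"\s*:\s*\{[^}]*"source"\s*:\s*"([^"]+)"
DEST_KEY = _MetaData:Index
FORMAT = index-$1
```

Separately, Splunk's documentation restricts index names to lowercase letters, numbers, underscores, and hyphens, so it is worth checking whether a captured value such as Canada yields a usable index name.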

I have a couple questions about this:

  1. If the data names an index I haven't configured, can I set up a fallback so that events that don't match a configured index are not lost?
  2. Can I use SOURCE_KEY to match on the value of the JSON field instead of running REGEX against the raw event? I would rather rely on Splunk's JSON parsing than on my REGEX skills to make sure I am getting the right field. If text matching my REGEX ever shows up elsewhere in an event, data could be routed to the wrong index.
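
For the first question, one setting worth checking is lastChanceIndex in indexes.conf, which in recent Splunk versions names an index that receives events addressed to a nonexistent index. A minimal sketch, assuming a version that supports the setting; the index name index-backup and its paths are hypothetical placeholders:

```ini
# indexes.conf (sketch; index name and paths are placeholders)
[default]
# Events whose target index does not exist are sent here instead of being dropped.
lastChanceIndex = index-backup

[index-backup]
homePath   = $SPLUNK_DB/index-backup/db
coldPath   = $SPLUNK_DB/index-backup/colddb
thawedPath = $SPLUNK_DB/index-backup/thaweddb
```

This would pair with the single dynamic [setIndex] stanza, since no per-country transform is needed for the fallback.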

amitm05
Builder

For #1:
I think you'd need to handle that with a logical set of rules, maybe by defining two stanzas in transforms.conf for setting your indexes. The first would assign the index only if the source is US, Mexico, or Canada:

[setIndex_KnownLocations]
REGEX = "source": "(Canada|US|Mexico)"
DEST_KEY = _MetaData:Index
FORMAT = index-$1

And the second would assign your backup index to events from any other source:

[setIndex_UnKnownLocations]
REGEX = "source": "(.*)"
DEST_KEY = _MetaData:Index
FORMAT = index-BackupIndex
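
One detail the two stanzas above leave open is the order in which they run. A minimal props.conf sketch, assuming (as in Splunk's documented nullQueue filtering examples) that transforms in a TRANSFORMS-<class> list run left to right and that a later match overwrites the index key set by an earlier one. Because the catch-all regex also matches Canada, US, and Mexico, it would go first so the known-location stanza can override it:

```ini
# props.conf (sketch; the [default] stanza and class name follow the question)
[default]
# Catch-all first, specific rule last: the last matching transform decides the index.
TRANSFORMS-SetIndex = setIndex_UnKnownLocations, setIndex_KnownLocations
```

With the reverse order, every event with a source field would likely end up in index-BackupIndex.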

trenin
Explorer

Thanks - I will try that. Any thoughts on how to use Splunk's JSON parsing instead of REGEX?
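
On matching the parsed field rather than the raw text: the transforms.conf spec describes a SOURCE_KEY value of the form field:<name>, which applies REGEX to an already indexed field instead of _raw. A hedged sketch, assuming INDEX_EXTRACTIONS = json has produced an indexed field named collection.source before this transform runs (the field name and that ordering are assumptions, not confirmed in this thread):

```ini
# transforms.conf (sketch, untested)
[setIndex]
# Match against the indexed field's value rather than the raw event text.
SOURCE_KEY = field:collection.source
REGEX = ^(.+)$
DEST_KEY = _MetaData:Index
FORMAT = index-$1
```

If this works as described, stray text elsewhere in the event could no longer trigger the routing, since only the field's own value is examined.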
