Hello
I have one big index holding logs from lots of files, and I want to reroute those logs to different indexes.
The rerouting will be done by a regex that looks for the domain name in the logs.
For each domain I will create a separate stanza in transforms.conf,
for example:
[setIdx-index1]
REGEX = ^(?!.*{ "workflow_id": .*, "workflow_type": .*, "workflow_name": .*, "jira_ticket": .*, "actor": .*, "deployment_status": .*, "start_time": .*, "end_time": .*, ("app_name"|"additional_data"): .* }).*$
FORMAT = new_index
DEST_KEY = _MetaData:Index
LOOKAHEAD = 40000
My question is about props.conf:
how should I configure it if I have more than one index?
[index1]
TRANSFORMS-setIdx = setIdx-index1
TRANSFORMS-setIdx2 = newIndex
TRANSFORMS-setIdx3 = newIndex1
TRANSFORMS-setIdx4 = newIndex2
Should this work?
Hi @sarit_s6 ,
as @PickleRick and @marnall also said, the only reasons to have different indexes are different retention periods and different access grants. Even if you have a big index, size isn't an issue for indexes.
Remember that Splunk isn't a database and that indexes aren't tables!
Even if you follow your bad idea (bad because you need to create and manage many indexes for no apparent reason), it's possible to dynamically assign the index name by extracting it from the logs.
In addition, your regex is very heavy for your system (it has many .* groups, one of them at the beginning of the regex), so you're putting a completely unnecessary load on your system.
You can check the performance of your regex on regex101.com.
Ciao.
Giuseppe
The regex is just an example; it's not the real one, since the regex is not the issue here.
The purpose of this step is that we need to separate the logs per domain,
so my question is whether the props.conf example is the right way, or is there a different way to do it?
Out of curiosity - why do you want to split those events into separate indexes? Different retention periods? Access differences?
Each index is for a different domain.
We want to split the logs per domain.
Hi @sarit_s6 ,
if you want an index for each domain, you can choose the index name from the domain contained in the log, but, as I said, it isn't a good idea, also because you have to create the indexes before rerouting, and this cannot be automated!
In addition, this way you'll end up with thousands of indexes. I repeat: it isn't a good idea.
Ciao.
Giuseppe
I will try to explain it from the start.
I have one index that contains lots of data for many domains.
We need to split this index so that the logs for each domain are indexed into the relevant index (which already exists).
The problem with keeping this large index is that we keep the data for a long retention period, and not all of the domains need the data for the same length of time.
Hi @sarit_s6 ,
if you have different retention values for your events, you must use different indexes.
Are the index names in the events or not?
Could you share some samples of your logs?
Ciao.
Giuseppe
In the event I have the name of the domain; that is the only key I can use.
All of the logs are in one big index and I need to split it.
Hi @sarit_s6 ,
please share a sample of your logs so I can show you how to set the index name.
Ciao.
Giuseppe
{"Time":"2024-07-29T08:18:22.6471555Z","Level":"Info","Message":"Targeted Delivery","Domain":"NA","ClientDateTime":"2024-07-29T08:18:21.703Z","SecondsFromStartUp":2,"UserAgent":"Mozilla/5.0 (Linux; Android 9; Redmi Note 8 Pro Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/127.0.6533.64 Mobile Safari/537.36 ,"Metadata":{"Environment":"Production"}}
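Given a sample like the one above, a minimal sketch of deriving the index name from the "Domain" value might look like the following. The stanza name set_index_from_domain and the sourcetype mysourcetype are placeholders, and the sketch assumes that an index named exactly like each captured domain value already exists. Splunk index names must be lowercase, so the regex only accepts lowercase values; a value such as "NA" would not match and would need an explicit per-domain mapping instead. Treat this as an illustration, not a drop-in config.

```ini
# transforms.conf (sketch; the stanza name is hypothetical)
[set_index_from_domain]
# Capture the value of the "Domain" key from the raw JSON event.
# Only lowercase letters, digits, underscores and hyphens are accepted,
# because the captured text is used as the index name as-is.
REGEX = "Domain":"([a-z0-9][a-z0-9_-]*)"
# Use the captured value as the destination index
FORMAT = $1
DEST_KEY = _MetaData:Index

# props.conf (sketch; replace mysourcetype with the real sourcetype)
[mysourcetype]
TRANSFORMS-set_index = set_index_from_domain
```

Events whose "Domain" value does not match the regex are left in the index they were originally sent to.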
OK. Different retention periods are a valid reason for distributing data between different indexes.
The caveat with splitting data this way is that while configuration like
[mysourcetype]
TRANSFORMS-redirect=redirect_to_index1,redirect_to_index2,redirect_to_index3...
is valid, you have to remember that all transforms will be called for each event. So Splunk will try to match each of the regexes contained within every transform against each event. The more indexes you want to split to, the more work the indexer (or HF, depending on where you put this config) will have to do.
Additional question - where are you getting the data from? Maybe it would be better to split the event stream before it hits Splunk.
Yes, but keep in mind that this will not affect events that are already in the one big index. Only new incoming events will be routed to other indexes, if they match the corresponding transform regex.
Every transform listed in props.conf will be tried against the events that match the stanza. If the regex in a transform matches the event, the index value of that event is overwritten. If the regexes of multiple transforms match the same event, its index value is overwritten multiple times and ends up with the value set by the last transform whose regex matched.
Therefore, you should make the regexes strict so that logs that should go to newIndex do not accidentally go into newIndex1.
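As a rough sketch of what "strict" could mean here, assuming two hypothetical domain values (domain_a and domain_b), hypothetical index names, and the [mysourcetype] placeholder from the earlier example, each transform can match the exact "Domain" key and value instead of a loose pattern:

```ini
# transforms.conf (sketch; domain values and index names are hypothetical)
[redirect_to_index1]
# Match only events whose Domain value is exactly domain_a
REGEX = "Domain":"domain_a"
FORMAT = index1
DEST_KEY = _MetaData:Index

[redirect_to_index2]
# Match only events whose Domain value is exactly domain_b
REGEX = "Domain":"domain_b"
FORMAT = index2
DEST_KEY = _MetaData:Index

# props.conf (sketch; replace mysourcetype with the real sourcetype)
[mysourcetype]
# Transforms run left to right; if more than one matches,
# the last matching transform decides the index.
TRANSFORMS-redirect = redirect_to_index1, redirect_to_index2
```

Because the closing quote is part of each pattern, an event with "Domain":"domain_a" cannot also match the domain_b transform, so at most one transform rewrites the index. Events that match neither stay in the original index.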