
re-route logs from one index to another

sarit_s6
Engager

Hello,

I have one big index with lots of files, and I want to reroute the logs from it to different indexes.
The rerouting will use a regex that looks for the domain name in the logs.
For each domain I will create a separate stanza in transforms.conf,
for example:

[setIdx-index1]
REGEX = ^(?!.*{ "workflow_id": .*, "workflow_type": .*, "workflow_name": .*, "jira_ticket": .*, "actor": .*, "deployment_status": .*, "start_time": .*, "end_time": .*, ("app_name"|"additional_data"): .* }).*$
FORMAT = new_index
DEST_KEY = _MetaData:Index
LOOKAHEAD = 40000

My question is about props.conf:

how should I configure it if I have more than one index?

[index1]
TRANSFORMS-setIdx = setIdx-index1
TRANSFORMS-setIdx2 = newIndex
TRANSFORMS-setIdx3 = newIndex1
TRANSFORMS-setIdx4 = newIndex2

Should this work?


gcusello
SplunkTrust

Hi @sarit_s6 ,

as @PickleRick and @marnall also said, the only reasons to use different indexes are different retention periods and different access permissions. Even if you have a big index, its size isn't an issue for Splunk.

Remember that Splunk isn't a database and that indexes aren't tables!

Even if you follow your bad idea (bad because you need to create and manage many indexes without any apparent reason), it's possible to dynamically assign the index name by extracting it from the logs.
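A minimal sketch of that dynamic approach, assuming the events are JSON with a "Domain":"<value>" key (as in the sample posted later in this thread) and that an index with exactly that name already exists. The transform name setIdx-from-domain and the sourcetype my_json_sourcetype are hypothetical placeholders.

```ini
# transforms.conf
# Hypothetical transform: use the value of the Domain key as the index name.
# This only works if every Domain value is already the exact name of an existing
# index. Index names may only contain lowercase letters, digits, underscores and
# hyphens, so a value such as "NA" is not a valid index name as written.
[setIdx-from-domain]
REGEX = "Domain"\s*:\s*"([^"]+)"
FORMAT = $1
DEST_KEY = _MetaData:Index

# props.conf
# Hypothetical sourcetype; props.conf stanzas match sourcetypes, not indexes.
[my_json_sourcetype]
TRANSFORMS-setIdx = setIdx-from-domain
```

Events whose Domain value does not correspond to a configured index are dropped unless lastChanceIndex is set in indexes.conf, and Splunk shows a warning about the unconfigured index. That is one reason explicit per-domain transforms with fixed index names are usually easier to control.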

In addition, your regex is very heavy for your system (it has many .* groups, one of them at the beginning of the regex), so you're putting a completely useless load on your system.

You can check the performance of your regex on regex101.com.

Ciao.

Giuseppe


sarit_s6
Engager

The regex is just an example, it's not the real one, since the regex is not the issue here.

The purpose of this step is that we need to separate the logs per domain.

So my question is: is the props.conf example the right way, or is there a different way to do it?


PickleRick
SplunkTrust

Out of curiosity - why do you want to split those events into separate indexes? Different retention periods? Access differences?


sarit_s6
Engager

Each index is for a different domain.
We want to split the logs per domain.


gcusello
SplunkTrust

Hi @sarit_s6 ,

if you want an index for each domain, you can take the index name from the domain contained in the log, but, as I said, it isn't a good idea, also because you have to create the indexes before re-routing, and this cannot be done automatically!

In addition, this way you'll end up with thousands of indexes. I'm repeating it: it isn't a good idea!

Ciao.

Giuseppe


sarit_s6
Engager

I will try to explain it from the start.
I have one index that contains lots of data for many domains.
We need to split this index so that the logs for each domain are indexed into the relevant index (which already exists).
The problem with keeping this large index is that we store the data with a long retention period, and not all of the domains need their data kept for the same time.


gcusello
SplunkTrust

Hi @sarit_s6 ,

if you have different retention values for your events, you must use different indexes.
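For example, on Splunk Enterprise the retention is set per index in indexes.conf with frozenTimePeriodInSecs. A minimal sketch, assuming two hypothetical indexes (idx_domain_short and idx_domain_long) and illustrative retention values:

```ini
# indexes.conf (on the indexers)
# Hypothetical indexes with different retention periods.
[idx_domain_short]
homePath   = $SPLUNK_DB/idx_domain_short/db
coldPath   = $SPLUNK_DB/idx_domain_short/colddb
thawedPath = $SPLUNK_DB/idx_domain_short/thaweddb
# 90 days = 90 * 86400 seconds
frozenTimePeriodInSecs = 7776000

[idx_domain_long]
homePath   = $SPLUNK_DB/idx_domain_long/db
coldPath   = $SPLUNK_DB/idx_domain_long/colddb
thawedPath = $SPLUNK_DB/idx_domain_long/thaweddb
# 730 days = 730 * 86400 seconds
frozenTimePeriodInSecs = 63072000
```

Retention is applied per bucket: a bucket is frozen (deleted by default) once its newest event is older than frozenTimePeriodInSecs, so some data may stay a little longer than the configured period. Size limits such as maxTotalDataSizeMB can also freeze buckets earlier.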

Are the index names in the events or not?

Could you share some samples of your logs?

Ciao.

Giuseppe


sarit_s6
Engager

The event contains the domain name; that is the only key I can use.
All of the logs are in one big index and I need to split it.


gcusello
SplunkTrust

Hi @sarit_s6 ,

please share a sample of your logs so I can show you how to set the index name.

Ciao.

Giuseppe


sarit_s6
Engager
{"Time":"2024-07-29T08:18:22.6471555Z","Level":"Info","Message":"Targeted Delivery","Domain":"NA","ClientDateTime":"2024-07-29T08:18:21.703Z","SecondsFromStartUp":2,"UserAgent":"Mozilla/5.0 (Linux; Android 9; Redmi Note 8 Pro Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/127.0.6533.64 Mobile Safari/537.36  ,"Metadata":{"Environment":"Production"}}
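A minimal sketch of routing by the Domain value in this sample, assuming the target index for this domain is called idx_na and the events arrive with a sourcetype called my_json_sourcetype (both hypothetical). The regex only looks for the literal Domain key and its value, so it avoids the leading .* groups of the original example; since Domain appears near the start of the event, the default LOOKAHEAD of 4096 characters should be enough.

```ini
# transforms.conf
# Hypothetical transform for events whose Domain is "NA".
# The closing quote makes the match exact, so "NA" does not also match e.g. "NAM".
[setIdx-domain-na]
REGEX = "Domain"\s*:\s*"NA"
FORMAT = idx_na
DEST_KEY = _MetaData:Index

# props.conf
# Hypothetical sourcetype; add one transform per domain to this list.
[my_json_sourcetype]
TRANSFORMS-setIdx = setIdx-domain-na
```

The target index (idx_na here) has to exist on the indexers before the transform is deployed.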

PickleRick
SplunkTrust

OK. Different retention periods is a valid reason for distributing data between different indexes.

The caveat with splitting data this way is that while configuration like

[mysourcetype]
TRANSFORMS-redirect=redirect_to_index1,redirect_to_index2,redirect_to_index3...

is valid, you have to remember that all transforms will be called for each event. So Splunk will try to match each of the regexes contained within every transform against each event. The more indexes you want to split to, the more work the indexer (or HF, depending on where you put this config) will have to do.
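Applied to the configuration from the original question, a minimal sketch, keeping the transform names from the question and assuming the data has a sourcetype called my_json_sourcetype (hypothetical). Splunk also accepts several TRANSFORMS-<class> lines as in the question; per the props.conf documentation those classes are applied in ASCII order of the class name, which is easy to get wrong, so a single comma-separated list makes the order explicit.

```ini
# props.conf, on the indexer or heavy forwarder that first parses this data.
# The stanza header names a sourcetype (source::<path> and host::<host> also work);
# it cannot select events by index, so a stanza like [index1] would be read as a
# sourcetype called "index1".
[my_json_sourcetype]
# Every listed transform is evaluated for every event of this sourcetype, in this
# order, so each extra domain adds one more regex per event.
TRANSFORMS-setIdx = setIdx-index1, newIndex, newIndex1, newIndex2
```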

Additional question - where are you getting the data from? Maybe it would be better to split the event stream before it's hitting Splunk.


marnall
Motivator

Yes but keep in mind that this will not affect events that are currently in the one big index. New incoming events will be routed to other indexes if they match the corresponding transform regex.

Every transform referenced in props.conf will be tried against the logs that match the stanza. This means that if a regex in a transform matches the event, the index value of the event will be overwritten. If multiple regexes in the transforms match an event, the event's index value will be overwritten multiple times and will keep the value set by the last transform whose regex matched.

Therefore you should make the regexes strict so that logs that should go to newIndex do not accidentally go into newIndex1.
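A minimal sketch of the overwrite behaviour, using hypothetical transform names, index names and a hypothetical Level-based rule next to a per-domain rule (Level and Domain are keys from the sample event above):

```ini
# props.conf
[my_json_sourcetype]
# Both transforms are evaluated for every event. An event with Domain "NA" and
# Level "Debug" matches both and ends up in idx_debug, because that rule runs last.
TRANSFORMS-setIdx = setIdx-domain-na, setIdx-debug

# transforms.conf
[setIdx-domain-na]
REGEX = "Domain"\s*:\s*"NA"
FORMAT = idx_na
DEST_KEY = _MetaData:Index

[setIdx-debug]
REGEX = "Level"\s*:\s*"Debug"
FORMAT = idx_debug
DEST_KEY = _MetaData:Index
```

If the domain rule should win instead, list it last, or make the regexes mutually exclusive so that at most one of them can match a given event.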
