I'm in a distributed/clustered setup (search heads, indexers, ...) and would like to route events to different indexes based on fields defined in the search heads' props.conf: calculated fields (like mvindex(split(source, "/"), 8)) or automatic lookup fields (... OUTPUTNEW ...).
First, is this even possible, given that calculated and lookup fields are evaluated at search time, while the choice of index is made earlier, at index time? Is that correct?
Should I move the calculated/lookup fields to the indexers' props.conf? Is that possible? If it is, can I use these fields in the indexers' transforms.conf to set the index through DEST_KEY = _MetaData:Index?
Thanks in advance for your help on this question.
If you used the database name as the index name, and that name is present in the source field, then with the right regex the destination index could be set at index time.
That's your architecturally simplest solution.
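Before putting such a regex into transforms.conf, it can help to try it against a few real source values. Here is a minimal sketch, assuming a hypothetical path layout in which the database name is the folder right after "admin"; the pattern, the sample path and the "main" fallback are illustrative assumptions, not taken from the poster's environment. In transforms.conf the same capture group would typically be combined with SOURCE_KEY = MetaData:Source, FORMAT = $1 and DEST_KEY = _MetaData:Index.

```python
import re

# Hypothetical layout: .../admin/<database name>/... (adjust to the real paths).
DB_FROM_SOURCE = re.compile(r"/admin/([^/]+)/")


def index_for_source(source: str, default: str = "main") -> str:
    """Return the database name captured from the source path.

    Falls back to `default` when the path does not match, which mirrors
    what happens when an index-time transform simply does not fire.
    """
    match = DB_FROM_SOURCE.search(source)
    return match.group(1) if match else default


if __name__ == "__main__":
    # Illustrative path only; replace with values from your own forwarders.
    print(index_for_source("/u01/app/oracle/admin/hrdb/adump/audit_1.aud"))
```

Keep in mind that a regex can only copy text out of the path; it cannot rename or map it, so this only works if the captured value is already a valid index name.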
A less-easy solution...
Hmmm. It's not pretty, but it would obviously be possible to...
(1) load the data initially into a temporary index
(2) calculate the destination index
(3) use "collect" to send the data to the destination index
(4) delete the data from the temporary index
This solution could result in double-counting the data that was indexed.
In our particular situation, there is one critical caveat with that approach: as of today, we host around 1500 different DBs for our (50+) departments/units, so we would end up creating 1500 indexes just for the database technology.
And we would have the same issue with the other technologies we want to "Splunk", like weblogic, tomcat, apache, ...
That's why we are trying to keep the number of indexes as low as we can by creating them per department/unit, of which there is a more "reasonable" number.
As for the "less easy" solution based on "collect", it might indeed "explode" our license costs, and it would also double the processing time.
I realize what I'm looking to achieve seems not that trivial ...
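If you want to move data to different indexes based on the event itself, you should do it at index time using the _MetaData:Index key. You can match on the event at index time with props.conf and transforms.conf. The following abbreviated example shows how the index can be changed based on a REGEX: the first line goes in props.conf under the relevant stanza, and the second in the [feye] stanza of transforms.conf, alongside DEST_KEY = _MetaData:Index and a FORMAT that names the target index.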
TRANSFORMS-feye = feye
REGEX = fenotify
I already know that approach. I want to use it, but my logic is quite "complex". Let me explain.
The source field contains a file path, and one of the folders in that path is the name of an Oracle database. I need to extract that database name from the path so that I can do "a lookup" to get the exact company department the database belongs to.
Then, based on that department, I want to route the event to a department-specific index. All of this because a department should only be able to access events for the information systems that belong to it (thanks to Splunk indexes and role-based security).
So my question is: how can I first extract the DB name and then do a lookup to get the department, all at index time, so that I can route the event to the right index?
Thanks again for your help
As far as I understand things, you cannot do a lookup at index time:
Though that is an old Q&A I think it is still valid today.
What if you switch the problem around?
Rather than attempt to determine the index based on the events, why not generate the inputs.conf based on who owns the database and hardcode the index= entries into the inputs.conf file?
That would be a much more manageable scenario than attempting to determine which index based on the events.
That could indeed be a smart approach. We could generate the inputs.conf from our CMDB on a regular basis (each time a DB is created, removed, ...) and push the changed files to our future deployment server, which would then deploy them automatically to the forwarders.
I have just one practical question. Will replacing a generic monitor stanza like this one in inputs.conf
sourcetype = oracle:audit:text
with several thousand (3000-5000?) stanzas like the ones below (with DBNameX hardcoded instead of a wildcard) have a significant performance impact on the forwarders (at startup, while monitoring, ...)?
sourcetype = oracle:audit:text
sourcetype = oracle:audit:text
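The generation step discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the monitored path template, the dept_ index naming scheme and the database-to-department mapping (which would really come from a CMDB export) are all hypothetical; only the sourcetype is taken from this thread.

```python
from typing import Mapping

# Hypothetical path layout and index naming; adjust both to the real environment.
MONITOR_TEMPLATE = "/u01/app/oracle/admin/{db}/adump/*.aud"
INDEX_TEMPLATE = "dept_{dept}"


def render_inputs_conf(db_to_dept: Mapping[str, str]) -> str:
    """Render one monitor stanza per database, with its index hardcoded."""
    stanzas = []
    # Sort so that the output is stable and the file only changes
    # when the mapping itself changes.
    for db, dept in sorted(db_to_dept.items()):
        stanzas.append(
            f"[monitor://{MONITOR_TEMPLATE.format(db=db)}]\n"
            f"sourcetype = oracle:audit:text\n"
            f"index = {INDEX_TEMPLATE.format(dept=dept.lower())}\n"
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    # Made-up mapping standing in for a CMDB export.
    print(render_inputs_conf({"HRDB": "Finance", "CRMDB": "Sales"}))
```

Because the output is deterministic, the generated file can be compared with the previously deployed one and pushed only when it actually differs.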
In my previous testing I've had servers where there were approx 100 files monitored without wildcards and less than 3% CPU usage.
Others with wildcards on 5 directories could be as high as 20% CPU usage.
However, the newer Splunk versions (newer than 6.4.x) appear to have less of an issue with wildcards.
That said, we only have a few servers that have thousands of files (<3000 I believe) and we don't have any issues, however our environments could be different, so I suspect this is something you will have to try and see.
I would be interested in the results!
While continuing my research on solving this I've seen that it looks like you can do "external lookup" at indexing time --> http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Configureexternallookups
Another thing I might consider (as a last resort?) is a "Scripted Input" --> http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ScriptedInputsIntro
but I hope I will not have to reach that level of complexity for a "simple" (event routing) thing I would like to achieve ...
I don't see any indication that external lookups are available at index time.
Basically, the "scripted input" option means you write an external script or app that alters the data before it is indexed. In this case, one simple method would be to have that script copy the input data from the original file folder into a monitored input folder whose name identifies the department or destination index.
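A minimal sketch of that copy step, assuming the department is already known for each database; the function name, the staging folder layout and the arguments are hypothetical, and a real script would also have to deal with files that keep growing after they were copied.

```python
import shutil
from pathlib import Path


def stage_for_department(src_file: Path, db_name: str, department: str,
                         staging_root: Path) -> Path:
    """Copy one file into <staging_root>/<department>/<db_name>/.

    Each department folder can then get its own monitor stanza with a
    hardcoded index. The database name is kept in the path so that files
    with the same name from different databases do not overwrite each other.
    """
    dest_dir = staging_root / department / db_name
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src_file.name
    shutil.copy2(src_file, dest)  # copy2 also preserves the modification time
    return dest
```

With a layout like this, only one monitor stanza per department is needed, at the cost of keeping a second copy of the data on disk.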
Indeed, there is no clear indication that "external lookup" can be used at indexing time. It would be nice to get this point confirmed/denied as well, I agree.
For the "scripted input" approach, one idea would be to parse and split the source to get the database name and then make an "API" call (web service, DB call, ...) to a repository (in our case an ITIL CMDB system) to obtain the corresponding department name. What already worries me about this solution is the performance hit of one API call per event processed. I would probably need a local cache to avoid that hit, which may not be trivial in a clustered indexer setup.
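The local cache mentioned above does not have to be complicated. Here is a minimal in-process sketch with a time-to-live; the class name, the fetch callable (standing in for the real CMDB call) and the one-hour default are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class DepartmentCache:
    """Remember database -> department answers for a limited time."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical CMDB lookup, one call per miss
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def department_for(self, db_name: str) -> str:
        now = self._clock()
        cached = self._entries.get(db_name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Miss or expired entry: ask the repository once and remember the answer.
        department = self._fetch(db_name)
        self._entries[db_name] = (now, department)
        return department
```

Since each process keeps its own copy, nothing needs to be shared or synchronized between machines; the price is that a department change in the CMDB is only picked up once the entry expires.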