My end goal is to use a regex to extract the sourcetype and index from the monitor path at runtime, based on the directory structure.
For example, with apache or nginx,
the actual monitored paths will look like:
/apps/apache/http/access/http-access.log
OR
/apps/nginx/http/error/http-error.log
inputs.conf
#apache or nginx
[monitor:///apps/.../.../.../*.log.*]
sourcetype = ( REGEX = ^source::(?:/[^/]+){1}/([^/]+)/ ):( REGEX = ^source::(?:/[^/]+){2}/([^/]+)/ )
index = (REGEX = ^source::(?:/[^/]+){0}/([^/]+)/ )
blacklist = \.(zip|gz)$
Desired output:
Splunk sends all apache access logs from /apps/apache/http/access/http-access.log with index=apache and sourcetype=http:access,
and Splunk also sends all nginx error logs from /apps/nginx/http/error/http-error.log with index=nginx and sourcetype=http:error.
@woodcock's answer is technically the correct way to do what you're asking for, regardless of whether it's on an HF or an indexer. If the UFs are sending to an HF first, then the props.conf & transforms.conf settings have to be on the HF. If the UFs are sending directly to the indexers, you have to put the props.conf & transforms.conf settings on the indexers.
If you want to know whether you can do this directly in inputs.conf with a regex, the answer is NO.
I agree with @woodcock & @oscar84x: don't do it this way; just make stanzas in inputs.conf that correspond to your apache & nginx log paths.
I would use slightly different stanzas from @oscar84x's:
[monitor:///apps/apache/*/*/http-access.log]
sourcetype = http:access
index = apache
[monitor:///apps/nginx/*/*/http-error.log]
sourcetype = http:error
index = nginx
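To make the trade-off concrete, here is a rough Python sketch of how the two hard-coded stanzas above would route the example files. It is an approximation, not Splunk's actual matcher: the hypothetical helpers `stanza_matches` and `settings_for` only model `*` as "anything within one path segment" and ignore `...`, whitelist/blacklist and every other input setting.

```python
import re
from typing import Dict, Optional, Tuple

# index/sourcetype pairs copied from the two [monitor://...] stanzas above.
STANZAS: Dict[str, Tuple[str, str]] = {
    "/apps/apache/*/*/http-access.log": ("apache", "http:access"),
    "/apps/nginx/*/*/http-error.log": ("nginx", "http:error"),
}


def stanza_matches(stanza_path: str, file_path: str) -> bool:
    """Hypothetical approximation of a monitor path match.

    Treats "*" as "anything inside one path segment" and everything else
    literally; "..." and other Splunk input rules are not modelled.
    """
    regex = "[^/]*".join(re.escape(part) for part in stanza_path.split("*"))
    return re.fullmatch(regex, file_path) is not None


def settings_for(file_path: str) -> Optional[Tuple[str, str]]:
    """Return (index, sourcetype) of the stanza covering file_path, or None."""
    for stanza_path, settings in STANZAS.items():
        if stanza_matches(stanza_path, file_path):
            return settings
    return None


print(settings_for("/apps/apache/http/access/http-access.log"))  # ('apache', 'http:access')
print(settings_for("/apps/nginx/http/error/http-error.log"))     # ('nginx', 'http:error')
print(settings_for("/apps/nginx/http/error/http-access.log"))    # None
```

Each new application means another stanza (and another entry in this table). That is exactly the hard-coding the original poster wanted to avoid, traded for inputs that are easy to read and troubleshoot.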
Like this:
In inputs.conf:
#apache or nginx
[monitor:///apps/.../.../.../*.log.*]
sourcetype = apache_or_nginx_temp
index = apache_or_nginx_temp
blacklist = \.(zip|gz)$
In props.conf:
[apache_or_nginx_temp]
TRANSFORMS-overrides_from_path = index_from_path, sourcetype_from_path
In transforms.conf:
[index_from_path]
SOURCE_KEY = MetaData:Source
REGEX = (?:/[^/]+){1}/([^/]+)/
DEST_KEY = _MetaData:Index
FORMAT = $1
[sourcetype_from_path]
SOURCE_KEY = MetaData:Source
REGEX = (?:/[^/]+){2}/([^/]+)/([^/]+)/
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::$1:$2
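Before deploying, the two REGEX values can be sanity-checked offline. The Python sketch below uses Python's `re` module as a stand-in for Splunk's PCRE engine (these patterns only use syntax both support), and the hypothetical helper `overrides_from_path` mimics what the two FORMAT lines would write. Because the regexes are unanchored, it should not matter whether the source value carries Splunk's `source::` prefix, so the sketch tries both forms.

```python
import re
from typing import Tuple

# REGEX values copied from [index_from_path] and [sourcetype_from_path].
INDEX_RE = re.compile(r"(?:/[^/]+){1}/([^/]+)/")
SOURCETYPE_RE = re.compile(r"(?:/[^/]+){2}/([^/]+)/([^/]+)/")


def overrides_from_path(source: str) -> Tuple[str, str]:
    """Hypothetical helper: (index, sourcetype) the transforms would produce.

    re.search is used because the transforms' REGEX is not anchored.
    """
    index_match = INDEX_RE.search(source)
    st_match = SOURCETYPE_RE.search(source)
    if index_match is None or st_match is None:
        raise ValueError(f"path does not match the transforms: {source}")
    # FORMAT = $1 for the index; FORMAT = sourcetype::$1:$2 for the sourcetype
    # (the "sourcetype::" part is the key prefix, not part of the name).
    return index_match.group(1), f"{st_match.group(1)}:{st_match.group(2)}"


print(overrides_from_path("/apps/apache/http/access/http-access.log"))
# ('apache', 'http:access')
print(overrides_from_path("source::/apps/nginx/http/error/http-error.log"))
# ('nginx', 'http:error')
```

Note that the sourcetype comes from the third and fourth directory levels, so a file stored under `http/access` is typed `http:access` even if its name says `error`.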
You must deploy this to the first full Splunk instance(s) that handle the events (usually the HF tier, if you use one, otherwise your indexer tier) and restart Splunk on all of those instances. Then send in new events (events that were already indexed will stay broken) and test with _index_earliest=-5m so that you are absolutely certain you are examining only the newly indexed events.
I answered the question that you asked, but I 100% agree with @oscar84x: DO NOT USE MY ANSWER, USE HIS!!!!!
Yes, I found your solution useful and accepted it with regard to using an HF. However, this is a separate question: I want to know whether I can just set sourcetype and index to a regex, like below.
I don't see much documentation on how to do this, but I guess I don't understand why it's unattainable.
inputs.conf
#apache or nginx
[monitor:///apps/.../.../.../*.log.*]
sourcetype = ( REGEX = ^source::(?:/[^/]+){1}/([^/]+)/ ):( REGEX = ^source::(?:/[^/]+){2}/([^/]+)/ )
index = (REGEX = ^source::(?:/[^/]+){0}/([^/]+)/ )
blacklist = \.(zip|gz)$
Your solution here still references an HF, and that's not what I'm asking about.
NO; there is no capability in Splunk for this. See the docs:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf
I'm not sure if you're able to do that, but even if you are, I think an easier and cleaner solution would be to have two separate stanzas. That way you can troubleshoot and manage each input/index/sourcetype individually.
[monitor:///apps/.../.../.../http-access.log]
sourcetype = http:access
index = apache
[monitor:///apps/.../.../.../http-error.log]
sourcetype = http:error
index = nginx
"I'm not sure if you're able to do that"...
Yeah, I don't see much documentation on how to do this, but I guess I don't understand why it's unattainable.
Yes, in this case it makes sense to separate the stanzas. However, I didn't in my example because I wanted to see if there was a way to avoid hard-coding it as you did here.
Yes, it is possible, but not advisable; see my answer.