How to extract sourcetype and index with a regex from the monitor path directory structure?

Explorer

My end goal is to derive the sourcetype and index at runtime by applying a regex to the monitor path, i.e. to look them up from the directory structure.

For example, in the case of apache or nginx, the actual monitor path will look like:

/apps/apache/http/access/http-access.log

OR

/apps/nginx/http/error/http-error.log

inputs.conf

 #apache or nginx
  [monitor:///apps/.../.../.../*.log.*]
  sourcetype = ( REGEX = ^source::(?:/[^/]+){1}/([^/]+)/ ):( REGEX = ^source::(?:/[^/]+){2}/([^/]+)/ )
  index = ( REGEX = ^source::(?:/[^/]+){0}/([^/]+)/ )
  blacklist = \.(zip|gz)$

Desired output:

Splunk sends all apache access logs from /apps/apache/http/access/http-access.log with index=apache and sourcetype=http:access,
and Splunk also sends all nginx error logs from /apps/nginx/http/error/http-error.log with index=nginx and sourcetype=http:error.

Builder

@woodcock's answer is technically the correct way to do what you're asking for, regardless of whether it's on an HF or an indexer. If the UFs are sending to an HF first, then the props.conf & transforms.conf settings have to be on the HF. If the UFs are sending to the indexers directly, you have to put the props.conf & transforms.conf settings on the indexers.

If you want to know if you can do this directly from inputs.conf with regex, the answer is NO.

I agree with @woodcock & @oscar84x: don't do it this way. Just make stanzas in inputs.conf that correspond to your apache & nginx log paths.

I would use slightly different stanzas from @oscar84x's:

[monitor:///apps/apache/*/*/http-access.log]
sourcetype = http:access
index = apache

[monitor:///apps/nginx/*/*/http-error.log]
sourcetype = http:error
index = nginx
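
If the objection to explicit stanzas is having to write each one by hand, one option is to generate inputs.conf from a small table. What follows is only a minimal sketch in Python, not a Splunk feature: the INPUTS list and the helper names are hypothetical, and it assumes the index is named after the application directory, as in the two stanzas above.

```python
# Hypothetical table: (application directory, log file name, sourcetype).
INPUTS = [
    ("apache", "http-access.log", "http:access"),
    ("nginx", "http-error.log", "http:error"),
]


def render_stanza(app, filename, sourcetype):
    """Render one explicit monitor stanza; the index is named after the app."""
    return (
        f"[monitor:///apps/{app}/*/*/{filename}]\n"
        f"sourcetype = {sourcetype}\n"
        f"index = {app}\n"
    )


def render_inputs_conf(inputs):
    """Join all stanzas, separated by a blank line."""
    return "\n".join(render_stanza(*row) for row in inputs)


if __name__ == "__main__":
    print(render_inputs_conf(INPUTS))
```

The output is plain inputs.conf text, so every input stays a separate stanza that can be troubleshot on its own, and only the table needs editing when a new application is added.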

Esteemed Legend

Like this:

In inputs.conf:

#apache or nginx
[monitor:///apps/.../.../.../*.log.*]
sourcetype = apache_or_nginx_temp
index = apache_or_nginx_temp
blacklist = \.(zip|gz)$

In props.conf:

[apache_or_nginx_temp]
TRANSFORMS-overrides_from_path = index_from_path, sourcetype_from_path

In transforms.conf:

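[index_from_path]
# Match against the event's source path; capture the second path segment
SOURCE_KEY = MetaData:Source
REGEX = (?:/[^/]+){1}/([^/]+)/
DEST_KEY = _MetaData:Index
FORMAT = $1

[sourcetype_from_path]
# Capture the third and fourth path segments and join them with ":"
SOURCE_KEY = MetaData:Source
REGEX = (?:/[^/]+){2}/([^/]+)/([^/]+)/
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::$1:$2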
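
To see what the two REGEX values capture before deploying anything, the patterns can be tried offline. This is a rough sketch in Python rather than Splunk itself: the function name is hypothetical, Python's `re` module stands in for Splunk's PCRE engine (the syntax used here is common to both), and the `source::` prefix in the last call is an assumption about how the source key is presented at index time.

```python
import re

# The same patterns as in the transforms.conf stanzas above.
INDEX_RE = re.compile(r"(?:/[^/]+){1}/([^/]+)/")
SOURCETYPE_RE = re.compile(r"(?:/[^/]+){2}/([^/]+)/([^/]+)/")


def overrides_from_path(source):
    """Return (index, sourcetype) as the two transforms would derive them,
    or None when the path is too short for either pattern to match."""
    index_match = INDEX_RE.search(source)
    sourcetype_match = SOURCETYPE_RE.search(source)
    if index_match is None or sourcetype_match is None:
        return None
    first, second = sourcetype_match.groups()
    return index_match.group(1), f"{first}:{second}"


if __name__ == "__main__":
    for path in (
        "/apps/apache/http/access/http-access.log",
        "/apps/nginx/http/error/http-error.log",
        "source::/apps/nginx/http/error/http-error.log",
    ):
        print(path, "->", overrides_from_path(path))
```

For the two example paths this gives apache with http:access and nginx with http:error. Because the patterns are unanchored, a leading `source::` does not change the captures.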

You must deploy this to the first full Splunk instance(s) that handle the events (usually the HF tier, if you have one, otherwise your indexer tier) and restart all Splunk instances there. Then send in new events (old events will stay broken) and test using _index_earliest=-5m to be absolutely certain that you are only examining the newly indexed events.

Esteemed Legend

I answered the question that you asked but I 100% agree with @oscar84x: DO NOT USE MY ANSWER, USE HIS!!!!!

Explorer

Yes, I found your solution for the HF case useful and accepted it. However, this is a separate question: can I just set sourcetype and index to a regex directly, like below?
I don't see much documentation on how to do this, and I don't understand why it is unattainable.

inputs.conf

  #apache or nginx
   [monitor:///apps/.../.../.../*.log.*]
   sourcetype = ( REGEX = ^source::(?:/[^/]+){1}/([^/]+)/ ):( REGEX = ^source::(?:/[^/]+){2}/([^/]+)/ )
   index = ( REGEX = ^source::(?:/[^/]+){0}/([^/]+)/ )
   blacklist = \.(zip|gz)$

Your solution here is still referencing a HF and that's not what I'm inquiring about.

Esteemed Legend

NO; there is no capability in Splunk for this. See the dox:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf

Contributor

I'm not sure if you're able to do that, but even so I think an easier and cleaner solution would be to have two separate stanzas. That way you can troubleshoot and manage each different input/index/sourcetype individually.

[monitor:///apps/.../.../.../http-access.log]
sourcetype = http:access
index = apache

[monitor:///apps/.../.../.../http-error.log]
sourcetype = http:error
index = nginx

Explorer

"I'm not sure if you're able to do that"...

Yeah I don't see much documentation on how to do this but I guess I don't understand why this is unattainable?

Explorer

Yes, in this case it makes sense to separate the stanzas. However, I didn't do that in my example because I wanted to see if there was a way to avoid hard-coding it as you did here.

Esteemed Legend

Yes, it is possible, but not advisable; see my answer.
