Separating many similar files into different indexes

Builder

I have a tree of files that looks something like the following:

/var/log/able/access.log
/var/log/baker/access.log
/var/log/charlie/access.log
/var/log/delta/access.log
...  (many many more)

I had previously been monitoring that tree with the following in inputs.conf, which puts all the monitored files into the 'main' index:

[monitor:///var/log/]
crcSalt = <SOURCE>
whitelist = (access|error)\.log$

Now I'd like to split some of those directories off and send their contents to an index other than 'main'. That isn't working for me, though, apparently because the monitor inputs overlap.

I could list each directory individually, but there are many of them, and that's a big pain to set up (and even more of one to keep up to date).

At the moment I'm playing with the following inputs.conf:

[monitor:///var/log/charlie/]
crcSalt = <SOURCE>
whitelist = (access|error)\.log$
index = charlie

[monitor:///var/log/]
crcSalt = <SOURCE>
blacklist = charlie
whitelist = (access|error)\.log$
index = main

but Splunk seems to ignore the "charlie" stanza. I also tried the blacklist/whitelist combination shown above in the hope that it would help, but that seems to have no effect either.

Is my only option for splitting certain inputs off into separate indexes to itemize each and every one in inputs.conf? There's got to be a better way...

I'm using Splunk 4.1.4.

Thanks!

Splunk Employee

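You can use an index-time transform (a TRANSFORMS- setting in props.conf that points to a stanza in transforms.conf) to route events to an index, much as you would route them to different queues, matching on the source (SOURCE_KEY = MetaData:Source). However, are you sure you really need separate indexes?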
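As a rough sketch of that approach, assuming the single [monitor:///var/log/] stanza with index = main stays in inputs.conf and that the charlie and delta indexes already exist, the routing could look something like the following. The transform names route_charlie and route_delta are hypothetical. These settings are applied when the data is parsed, so they would go on the instance that does the parsing (the indexer, or a heavy forwarder if one is in the path).

```ini
# props.conf: apply the routing transforms to every source under /var/log
[source::/var/log/...]
TRANSFORMS-route_index = route_charlie, route_delta

# transforms.conf: rewrite the destination index based on the file path
# (route_charlie and route_delta are hypothetical stanza names)
[route_charlie]
SOURCE_KEY = MetaData:Source
REGEX = /var/log/charlie/
DEST_KEY = _MetaData:Index
FORMAT = charlie

[route_delta]
SOURCE_KEY = MetaData:Source
REGEX = /var/log/delta/
DEST_KEY = _MetaData:Index
FORMAT = delta
```

Events whose source matches neither regex keep the index set in inputs.conf ('main'). Each newly split-off directory still needs its own transform, so the main gain is that the monitor input itself never has to change.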

Splunk Employee

Your configuration should work fine. Have you cleaned your index and restarted to reread the files?

Builder

Thanks, Stephen.

After further scrutiny and more testing, I found that this configuration does work:

# write the 'charlie' subdirectory contents to its own index
[monitor:///var/log/charlie/]
crcSalt = <SOURCE>
whitelist = (access|error)\.log$
recursive = false
index = charlie

# write the 'delta' subdirectory contents to its own index
[monitor:///var/log/delta/]
crcSalt = <SOURCE>
whitelist = (access|error)\.log$
recursive = false
index = delta

# everything else under this tree defaults to the 'main' index
[monitor:///var/log/]
crcSalt = <SOURCE>
whitelist = (access|error)\.log$
index = main

It wasn't clear to me whether the order of these stanzas mattered, but this does seem to work.

I'll be making this change to an existing configuration where everything already goes to the 'main' index. I'm hoping I won't need to clean out any of the previously indexed data.

Thanks.

Builder

Right. I'm good with that. I realize the data will exist in 2 places until it eventually gets expired from 'main'. Thanks!

Splunk Employee

The order of the entries will make no difference here. Also, the data already indexed in 'main' will remain there, unless you clean the index. Changing the configuration will NOT make the already-indexed data appear in 'charlie' or 'delta'.

Splunk Employee

I'd suggest running "splunk cmd btool inputs list --debug" and looking at the monitor:///var/log stanzas in its output. I'm curious what btool reports for them; check that they look the way you expect.

Builder

I'm doing this on a test instance to see if I can make it work, and I had already cleaned and restarted. When I do, the "charlie" data input shows as disabled in Splunk Manager. 'splunk list monitor' shows '/var/log/' as a monitored directory and '/var/log/charlie/' as a monitored file (yet Manager still shows it as disabled).

If I enable the '/var/log/charlie' data input from Manager and then restart, it stays enabled, but nothing gets indexed ('splunk list monitor' output stays unchanged as well). Nothing's being indexed now.

Builder

I'll investigate the TRANSFORM method. I was unaware of that.

Yes. In my example, 99 of those directories need to be monitored but are lower priority: we don't much mind if we have to expire their data from the index quickly. One directory, however, is far more important. Its contents, combined with related files on other servers, belong to a much more important application and are likely to be searched heavily.
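If different retention is the main reason for the split, that is configured per index in indexes.conf. The sketch below is hypothetical: the paths follow the usual $SPLUNK_DB layout and the retention periods are placeholder values, not recommendations. frozenTimePeriodInSecs is the age after which data is frozen, and by default frozen data is deleted unless an archive directory or script is configured. Size limits such as maxTotalDataSizeMB can also make data roll off earlier than the time limit.

```ini
# indexes.conf: the new, high-priority index
[charlie]
homePath   = $SPLUNK_DB/charlie/db
coldPath   = $SPLUNK_DB/charlie/colddb
thawedPath = $SPLUNK_DB/charlie/thaweddb
# keep events for about one year (365 * 86400 seconds); placeholder value
frozenTimePeriodInSecs = 31536000

# the default index that holds the low-priority directories
[main]
# expire events after about 30 days (30 * 86400 seconds); placeholder value
frozenTimePeriodInSecs = 2592000
```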
