We have a fairly large Splunk environment with several thousand hosts reporting in. Within our business we have access requirements: some people can see only their own hosts, some can see all hosts in a region, and some can see everything.
To date, we have used host tagging to provision access controls. For example, when a new host starts reporting into Splunk, we give it a host tag in a format such as biz-subbiz-subsubbiz. This lets me give one person a role that can see all hosts tagged biz* and someone else a more restrictive role limited to biz-subbiz-subsubbiz.
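To make the tag hierarchy concrete, here is a minimal Python sketch of how a wildcard role pattern such as biz* selects hosts by tag. It only models the matching logic with the standard-library fnmatch module; in Splunk the pattern is applied through the role's search filter. The host names and tags are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical host-to-tag assignments in the biz-subbiz-subsubbiz format.
HOST_TAGS = {
    "web01": "biz-subbiz-subsubbiz",
    "web02": "biz-subbiz-other",
    "db01": "otherbiz-team-app",
}


def visible_hosts(role_pattern: str, host_tags: dict[str, str]) -> list[str]:
    """Return the hosts whose tag matches a role's wildcard pattern."""
    return sorted(h for h, tag in host_tags.items() if fnmatchcase(tag, role_pattern))


if __name__ == "__main__":
    print(visible_hosts("biz*", HOST_TAGS))                  # broad role: web01, web02
    print(visible_hosts("biz-subbiz-subsubbiz", HOST_TAGS))  # narrow role: web01 only
```

Note that otherbiz-team-app is not matched by biz*, because the pattern is anchored at the start of the tag.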
For the most part, this strategy has worked OK. However, maintaining the tags in tags.conf is a pain. Additionally, we expect many more hosts soon (8000+), each of which has to be tagged appropriately, and I am concerned that our tagging strategy may not scale.
I came up with a new approach using a lookup file that maps every host to a role tag. We implemented it as an automatic lookup that adds the host's role tag to each event, and then created a role whose search filter matches that looked-up value. This also works, but searching is much slower than with host tagging. The search seems to read through all the events and only then filter on the host role, whereas the host tagging approach only searched data from the hosts with the correct tag.
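The difference between the two behaviours can be modelled in a few lines of Python. This is a toy sketch, not how Splunk is implemented: INDEX stands in for indexed events keyed by host, HOST_ROLE for the lookup file, and both are hypothetical. Filtering after the lookup has to read every event, whereas resolving the role to a host list first only reads the matching hosts' events.

```python
# Hypothetical lookup file contents: host -> host_role.
HOST_ROLE = {"web01": "abc", "web02": "abc", "db01": "xyz", "db02": "xyz"}

# Toy stand-in for indexed data, keyed by host (host is an indexed field).
INDEX = {
    "web01": ["w1-a", "w1-b"],
    "web02": ["w2-a"],
    "db01": ["d1-a", "d1-b", "d1-c"],
    "db02": ["d2-a"],
}


def lookup_then_filter(role: str) -> tuple[list[str], int]:
    """Read every event, attach host_role via the lookup, then keep matches."""
    scanned, results = 0, []
    for host, events in INDEX.items():
        for raw in events:
            scanned += 1
            if HOST_ROLE.get(host) == role:
                results.append(raw)
    return results, scanned


def hosts_first(role: str) -> tuple[list[str], int]:
    """Resolve the role to its hosts first, then read only those hosts' events."""
    hosts = [h for h, r in HOST_ROLE.items() if r == role]
    scanned, results = 0, []
    for host in hosts:
        for raw in INDEX.get(host, []):
            scanned += 1
            results.append(raw)
    return results, scanned


if __name__ == "__main__":
    print(lookup_then_filter("abc"))  # same matches, 7 events scanned
    print(hosts_first("abc"))         # same matches, 3 events scanned
```

Both functions return the same matching events; only the scanned count differs, which mirrors the scanned-versus-matching gap observed in this thread.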
So, do we continue with the host tagging approach (i.e. will it scale?), or is there some other method? Putting the data into different indexes is not really a viable option, as even within an index we would need granular controls, so I am trying to work with what we have. I should also note that although we have 8000 hosts, we only have about 100 tags and corresponding roles.
I'm not sure you've implemented your lookups correctly, as they should run exactly as fast as host tagging, assuming your lookup is basically just a map between host and the "group" name, and your filters simply changed from a lookup on the tag value to a reverse lookup on the group name.
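A minimal Python sketch of the reverse lookup described above, assuming a simple host-to-group table (the table and function names are hypothetical): a filter on the group name is rewritten into an OR of the hosts that map to that group, the same shape of search a host tag expands to. The exact search string Splunk generates may differ.

```python
from collections import defaultdict
from typing import Optional


def invert(host_to_group: dict[str, str]) -> dict[str, list[str]]:
    """Turn a host -> group table into group -> sorted list of hosts."""
    groups = defaultdict(list)
    for host, group in host_to_group.items():
        groups[group].append(host)
    return {g: sorted(hs) for g, hs in groups.items()}


def reverse_lookup_clause(group: str, host_to_group: dict[str, str]) -> Optional[str]:
    """Rewrite a group=<name> filter as an OR of host terms; None if no host matches."""
    hosts = invert(host_to_group).get(group)
    if not hosts:
        return None
    return "(" + " OR ".join(f'host="{h}"' for h in hosts) + ")"


if __name__ == "__main__":
    table = {"web02": "abc", "web01": "abc", "db01": "xyz"}  # hypothetical lookup rows
    print(reverse_lookup_clause("abc", table))  # (host="web01" OR host="web02")
```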
This is what the auto lookup looks like:
lookup_hosts host AS host OUTPUTNEW host_desc AS host_desc host_role AS host_role
It is applied to all hosts (host=*).
We then set each role's search filter to host_role=name_of_role.
Okay. Depending on the values of host_role you chose, and whether they are common in your raw data, you may want to add a stanza to fields.conf: `[host_role] INDEXED_VALUE=false`
Thanks - will try that and let you know.
One thing I did notice: when searching with the host tagging approach, the number of scanned events was the same as the number of matching events. With the host lookup approach, however, the number of scanned events was much larger than the number of matching events. It is as if the host tagging approach only searches the hosts with the matching tag, whereas the lookup approach searches through everything, then checks each event for a matching host role and only displays the matching data.
Ugh, my mistake, that won't help; don't do it. Can you tell me what the values of your host_role field are, and whether they appear in your data? A reverse lookup should do exactly the same as tags, i.e., convert the clause into an OR clause of the matching host values. I would also experiment with removing host_desc from the lookup, though that shouldn't make a difference. I suspect that Splunk, in addition to looking for matching host fields, is also looking for the value of host_role in the raw event, just in case.
So we removed the extra field, and the lookup now contains only host and host_role. This did not make any difference. To answer your question, the host_role values do not exist in the raw data.
For example, host_role might be 'abc'. We have a role on the system called abc, and under "Restrict search terms (optional)" we have put host_role=abc. The automatic lookup brings back the host_role for every host, and I guess in this case the role filter is matched against that looked-up host_role value.
A little more research later: we tried the following approaches, timed how long each took to return 100,000 events, and got interesting results.
The original host tagging approach took 1 minute to run.
The new approach, using an automatic lookup to bring in host_role, took 5 minutes (i.e. 5x slower), as it seems to search every event.
A subsearch that returns the matching hosts also took 1 minute to run:
sourcetype=X [| inputlookup lookup_hosts | search host_role=abc | fields host ]
Hope this helps
A support case was logged and a review of the configuration was required.
The resolution was to raise the max_reverse_matches attribute in $SPLUNK_HOME/etc/system/local/limits.conf from its default of 50 to 7500:
[lookup]
max_reverse_matches = 7500
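Once max_reverse_matches was increased, the lookup method performed the same as the previous host tagging approach, with the same number of scanned events and similar timing. The likely explanation is that, with the default of 50, a host_role value that maps to more than 50 hosts is not expanded by the reverse lookup into a list of host terms, so Splunk falls back to reading every event and filtering on host_role after the lookup.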
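The effect of the setting can be illustrated with a toy Python model. It assumes, as the behaviour in this thread suggests, that when a role maps to more hosts than max_reverse_matches, the reverse lookup is not expanded into host terms and every event is read instead. The function and plan names are hypothetical, whether the boundary is inclusive is an assumption, and this is not Splunk's actual search planner.

```python
def plan_search(role: str, host_to_role: dict[str, str],
                max_reverse_matches: int = 50) -> tuple[str, list[str]]:
    """Toy model: decide whether a host_role filter can be expanded into host terms.

    Returns ("filter_by_host", hosts) when the reverse lookup fits under the cap,
    otherwise ("scan_all_then_filter", []).
    """
    hosts = sorted(h for h, r in host_to_role.items() if r == role)
    if len(hosts) > max_reverse_matches:
        # Too many matches to expand: every event is read and host_role is
        # checked after the lookup runs (the slow behaviour seen in the thread).
        return ("scan_all_then_filter", [])
    # Small enough: filter on host up front, so only these hosts' events are read.
    return ("filter_by_host", hosts)


if __name__ == "__main__":
    # Hypothetical role "abc" covering 80 hosts.
    table = {f"host{i:03d}": "abc" for i in range(80)}
    print(plan_search("abc", table)[0])                            # default cap of 50
    print(plan_search("abc", table, max_reverse_matches=7500)[0])  # raised cap
```

With 8000+ hosts spread across about 100 roles, many roles can exceed 50 hosts, which is consistent with the slowdown disappearing once the cap was raised.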
Note: setting INDEXED_VALUE=false in $SPLUNK_HOME/etc/system/local/fields.conf would not have been appropriate in this case, as gkanapathy highlighted. That entry turns any search on `host_role` into a * search, which would have contributed to poor performance.