Hi,
I need to figure out which fields our Splunk users are searching on, either in their reports or dashboards. Is it doable? If so, how? Please help.
Someone will (hopefully) have a better answer than mine, as I'm not sure this is particularly easy. If you go with the idea that users specify the fields they are interested in before the first pipe, you could, in theory, create a search that parses the search logs (the _audit index) for field=value patterns. Of course, that wouldn't account for people who search for something and then use a table command to look at fields of interest. I created the search below, but YMMV on how useful it is. Beware that I'm not a regex hero. In theory it covers most cases where you have field=value, but it doesn't work for field = value. I've also left in a lot of table commands so you can strip out everything after each one to see what that stage of the query is doing. The other thing I tried to account for is a search that contains multiple instances of a particular field, which is why I threw in the dedup command where I did.
index=_audit "action=search" "info=granted" search_id=*
| rex field=search "^'(?<foo>.+?\|)"
| rex field=foo max_match=0 "(?:\s|\()(?<field>\w.+?)(?:!=|=)"
| table user search foo field
| rex mode=sed field=field "s/(OR |NOT |\()//g"
| mvexpand field
| table user search field
| dedup user search field
| stats count as fieldInstanceByUser by user field
| eventstats sum(fieldInstanceByUser) as totalFieldInstances by field
Perhaps ironically, in my own search I'm not using field=value for action and info, as I'm used to quoting those as string values for performance. You may or may not want to include that logic in your search.
Hint: doing that while looking for Windows events will help your search performance a good deal, depending on the volume of those logs in your environment. In other words, search for "EventCode=4624" rather than EventCode=4624.
If you are training your users to include the index name(s) in their searches, you could look for the presence of that in the search string (as in index=foo). That won't capture instances where users simply put in the terms they are searching for without specifying an index. If you wanted to look for searches run from specific dashboards, you could maybe do something fancy with the splunkd_ui_access logs in the _internal index - as in, get the search IDs in a subsearch that you then pass up to the parent search.
I'm not really seeing how "what fields are people searching on" applies when they are interfacing with a dashboard or report, as those tend to be fairly baked in terms of how people are able to interact with them. That isn't always the case, I suppose.
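As a rough sketch of the index=foo idea above, something like the following could pull index names out of the logged search strings. The field name searched_index is made up for illustration, and the regex is only my guess at the common ways people write index=name (with or without quotes, with or without spaces around the equals sign). I dedup on search_id rather than on the search text so that repeated runs of the same search each still count once.

```spl
index=_audit "action=search" "info=granted" search_id=*
| rex field=search max_match=0 "index\s*=\s*\"?(?<searched_index>[\w*-]+)"
| mvexpand searched_index
| dedup search_id searched_index
| stats count as searchesByUser by user searched_index
```

Searches that never name an index simply won't produce a searched_index value and will drop out at the stats step, and this search itself will show up in the results as a hit on _audit.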
Runals, I can't thank you enough for the explanation. It makes perfect sense to me now.
One more question. Is it possible to limit this search so it only looks at searches run against a particular index?
In particular, just the ones from saved reports and dashboards?
Runals, thanks for taking the time to reply in detail and put together the search I need. Much appreciated!
I'm still learning Splunk and don't understand a few things in your search, so could you please explain each of these briefly?
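One possible way to narrow it down, offered as a sketch rather than a tested answer: filter the audit events on the search text before extracting fields. Here "web" is a placeholder index name, and savedsearch_name=* assumes that your audit events carry a savedsearch_name value only when the search came from a saved report, which is worth checking against your own _audit data first. It won't catch inline searches inside dashboards.

```spl
index=_audit "action=search" "info=granted" search_id=* savedsearch_name=*
| regex search="index\s*=\s*\"?web\b"
| rex field=search "^'(?<foo>.+?\|)"
| rex field=foo max_match=0 "(?:\s|\()(?<field>\w.+?)(?:!=|=)"
| rex mode=sed field=field "s/(OR |NOT |\()//g"
| mvexpand field
| dedup user search field
| stats count as fieldInstanceByUser by user field
```

Drop the savedsearch_name=* term if you want ad hoc searches against that index included as well.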
| rex field=search "^'(?<foo>.+?\|)"
| rex field=foo max_match=0 "(?:\s|\()(?<field>\w.+?)(?:!=|=)"
| rex mode=sed field=field "s/(OR |NOT |\()//g"
| mvexpand field
Thanks
The rex command allows you to create new fields on the fly, within the context of your search. In the first rex I'm telling Splunk I want to create a field called foo from within the existing field 'search'. The regex starts at the beginning of the field and goes to the first pipe. This isolates that portion of the user's search from the rest of their query. Instead of calling that field 'foo' I could just as easily have named it something else - including 'search' - but I wanted to highlight that I'm creating a new field, and to place it side by side with the original using the table command (to eyeball the results and make sure things are on track). The next rex command looks for every instance of some text that is immediately followed by an equals sign (or !=). The "every instance" piece is accomplished by the max_match=0 parameter. Without max_match the command will just pull out the first match. If you set the number to 0 it will match as many times as it can. I could have gone with something like 10 or 20, as it is unlikely people will use that many fields, but it is just as easy to use 0. Using max_match creates a multivalue field, assuming the user IS using more than one field. You can see this if you strip out everything after that first table command. My regex in the second rex command also doesn't account for boolean operators. For example, if the user ran the initial search 'index=foo sourcetype=bar NOT src_ip=10.10.10.0/24 (action=failure OR src_host=purplemonkey)', what shows up in the 'field' field is:
index
sourcetype
NOT src_ip
(action
OR src_host
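If you want to try this without waiting on real audit data, here is a rough sketch that uses makeresults to fake a single event. The value in the eval is just the example search above, wrapped in the leading single quote that the first rex expects, with a made-up "| stats count" tacked on so there is a first pipe to stop at. None of it comes from a real log.

```spl
| makeresults
| eval search="'search index=foo sourcetype=bar NOT src_ip=10.10.10.0/24 (action=failure OR src_host=purplemonkey) | stats count'"
| rex field=search "^'(?<foo>.+?\|)"
| rex field=foo max_match=0 "(?:\s|\()(?<field>\w.+?)(?:!=|=)"
| table search foo field
```

Changing the eval string and rerunning is a quick way to check exactly what the regex captures for a given style of search, including whether stray operators or parentheses end up in the values.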
The sed mode for rex is basically a search-and-replace function. I'm searching (s) for a variety of things (between the first two forward slashes) and replacing them with nothing (between the second and third forward slashes - if I put something there it would be placed where the other strings were), and this is set to replace every instance rather than just the first (g). If you have more questions on sed, I'd Google it. Now that the extraneous bits are cleaned up, I want to expand this multivalue field into one event for each field in the search. This is what the mvexpand command is for. You can see the results of everything to this point by stripping out the part of the search after the next table command. What you might see is multiple instances of the same field from a single user's search - for example, multiple src_ip. This has the potential to artificially inflate how often a field appears to be used if you did a stats count at this point, so I threw in the dedup command and followed it with the stats. If you haven't run into the eventstats command, think of it as a cross between eval and stats, as the results are added to each line.
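The cleanup and counting stages can also be tried on made-up data. In this sketch the user, the search label and the list of values are all invented, with src_ip deliberately included twice. I've put mvexpand ahead of the sed step purely so each value can be inspected on its own row; that ordering is my choice for the demo and differs from the search above.

```spl
| makeresults
| eval user="alice", search="demo", field=split("index,NOT src_ip,(action,OR src_host,src_ip", ",")
| mvexpand field
| rex mode=sed field=field "s/(OR |NOT |\()//g"
| dedup user search field
| stats count as fieldInstanceByUser by user field
| eventstats sum(fieldInstanceByUser) as totalFieldInstances by field
```

If you remove the dedup line, src_ip should be counted twice for the same search, which is the inflation the dedup is there to prevent.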
Hope that helps.
Agree with Runals. There is (often) no standard for how users write fields into their searches. Also, fields can be used as filters at any point in a search, not just before the first pipe, so there is a good chance that you won't be able to cover all scenarios and/or that you will get false matches as well.