Hi - fairly new to Splunk and have a specific report my customer wants to monitor/report on. They want to understand how many people are using Splunk over time. This will also allow us to size the Splunk web deployment in the future. I have the S.o.S app installed, but none of the dashboards from S.o.S or the default Splunk activity dashboards are quite right for giving me user concurrency.
I am trying to construct something that shows the number of concurrent users that have been, or currently are, logged into Splunk Web, i.e.
You've hit on a touchy problem with Splunk: figuring out how busy the infrastructure is at any point in time. There are two things to look at.
1) How many users are currently using Splunk.
This is interesting, but only goes so far. Am I "currently using Splunk" if I have a static dashboard on my screen that finished loading 10 minutes ago, and I'm either staring at it or have my head turned talking to someone else? martin_mueller's search in his comment is spot on and will help you answer this question. One hour may be too long a time frame; I have found 1m or 5m more useful for determining how busy Splunk is.
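For reference, a concurrency search in that same spirit (my own sketch, not necessarily martin_mueller's exact search; it assumes the default splunk_web_access sourcetype in _internal and that your users authenticate to Splunk Web, so the user field is populated) could look like:

index=_internal sourcetype=splunk_web_access user=* user!="-" | timechart span=5m dc(user) AS concurrent_users

Shrink or grow the span to taste; 1m or 5m buckets answer "how busy right now" far better than hourly ones.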
2) How many searches are currently being run.
This is a little harder, because searches come and go, sometimes quite quickly. There are a couple of ways to see this information. First, concurrent searches by user: who's exercising Splunk the most?
index=_internal source=*metrics.log group="search_concurrency" NOT "system total" | timechart span=1m sum(active_hist_searches) as concurrent_searches by user
Interesting patterns emerge per person/group and time of day.
Second, is this ad-hoc or scheduled? Too many concurrent scheduled searches can really bring Splunk to its knees. A lot of scheduled searches may be okay if they are of very short duration (like populating summary indexes or report acceleration).
`set_sos_index` sourcetype=ps | multikv | `get_splunk_process_type` | search type="searches" | rex field=ARGS "_--user=(?<search_user>.*?)_--" | rex field=ARGS "--id=(?<sid>.*?)_--" | rex field=sid "remote_(?<search_head>[^_]*?)_" | eval is_remote=if(like(sid,"%remote%"),"remote","local") | eval is_scheduled=if(like(sid,"%scheduler_%"),"scheduled","ad-hoc") | eval is_realtime=if(like(sid,"%rt_%"),"real-time","historical") | eval is_subsearch=if(like(sid,"%subsearch_%"),"subsearch","generic") | eval search_type=is_remote.", ".is_scheduled.", ".is_realtime | timechart span=1m dc(sid) AS "Search count" by is_scheduled
Props go out to hexx (SoS guru) for these, and hopefully they (or something like them) will show up in SoS in the near future.
As a different approach, you could run this:
| pivot internal_audit_logs searches sum(total_run_time) AS run_time SPLITROW _time PERIOD hour SORT 0 _time | eval avg_cpus = run_time / 3600 | timechart span=1d max(avg_cpus) as max_cpus_per_hour
That calculates the total seconds spent searching in each hour, converts that into the average number of searches running concurrently during that hour (e.g. 7200 search-seconds accumulated within one hour averages out to two searches running at once), and uses the worst hour of each day for charting.
BTW David, is your large SoS search supposed to work as written? My instance doesn't seem to like the `get_splunk_process_type` bits. PS: I've not worked with searches of this complexity yet, so excuse my ignorance.
You've hit the nail on the head. Unfortunately, though, I don't actually care about "busy"; I have enough monitoring elsewhere to figure out how busy things are and why. I just need concurrency at this point. Sigh.
It's an interesting conundrum, because one of the sizing factors Splunk recommends is 1 user per core (ideally 2), hence concurrency would seem to be a useful measure for sizing...
Thanks for the pointers so far, everyone.
PS - As MuS rightly surmised, I am using LDAP; forgot to mention that bit!
It depends on how Splunk handles user authentication.
If you're using LDAP-based users and SSO for authentication, user logins are not handled by Splunk itself, and therefore you will not find any of the SSO / LDAP user logins in audit.log.
But you can use the REST endpoint /services/authentication/httpauth-tokens on your search head like this:
| rest /services/authentication/httpauth-tokens splunk_server=local
and you will get a list of users who were, or still are, connected over SSO / LDAP.
Setting this up as a saved search with summary indexing will give you the ability to gather historical events as well.
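For example, a saved search scheduled every few minutes could snapshot the token count into a summary index (the index name "summary" and the userName field are assumptions here; check the field names your instance actually returns from the endpoint, and point collect at whichever summary index you use):

| rest /services/authentication/httpauth-tokens splunk_server=local | stats dc(userName) AS concurrent_users | collect index=summary

You can then timechart concurrent_users from that summary index to see concurrency over whatever history you accumulate.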
If you're using Splunk's internal user authentication, you will find the information you need in Splunk's audit.log. You can search for it like this:
index=_audit action="login attempt" | ...
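As a sketch of how you might finish that off, the following charts distinct users logging in per minute (the info=succeeded filter and the user field are what I'd expect in _audit login events, but verify against your own audit.log):

index=_audit action="login attempt" info=succeeded | timechart span=1m dc(user) AS users_logged_in

Note this counts successful logins per time bucket rather than sessions still open, so it only approximates concurrency as well as your users' login frequency allows.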
hope this helps...