Getting Data In

How to find out who is using the most resources in a clustered environment, per user and per search type?

sgessati
Explorer

Dear all,

We're starting to suffer from our users: our clustered architecture is collapsing under a heavy search load that prevents our indexers from indexing. O_o

We're on Splunk Enterprise 7.0.5, so we don't have access to the Workload Management feature introduced in Splunk 7.2.x. Even if we did, our Linux is not yet a 7.x release with systemd, so we wouldn't be able to enforce it right after upgrading Splunk.

With more than 2000 users defined across almost 300 search head apps, where they can define their own dashboards, reports, alerts, schedules and accelerations, it's increasingly vital to pinpoint the bad searchers.

Although the Monitoring Console provides some great dashboards, it's a pain to check resource usage indexer by indexer and search head by search head to get the big picture.

In the answer, I'll share a dashboard made by Splunk Professional Services that I wish shipped by default with the Monitoring Console.
It sums the cumulative search run time of each user by search type (ad hoc or scheduled) across all search heads and gives you some context, such as the user's full name and roles. It even gives you the details of all searches run by a selected user in a bottom panel!
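
For anyone who wants to see the logic outside of Splunk, here is a minimal Python sketch of the two-step aggregation described above: first collapse the audit events to one row per search_id (keeping the largest reported run time), then sum per user and search type. The function name `top_users` and the dict keys are assumptions made for illustration; they are not part of the dashboard or of any Splunk API.

```python
from collections import defaultdict
from statistics import median


def top_users(events):
    """Rank (user, search_type) pairs by cumulative run time.

    `events` is an iterable of dicts with the hypothetical keys
    search_id, user, search_type and total_run_time (may be None).
    """
    # Step 1: one row per search_id, keeping the largest run time seen,
    # since a single search can produce several audit events.
    per_search = {}
    for event in events:
        row = per_search.setdefault(
            event["search_id"],
            {"user": event["user"],
             "search_type": event["search_type"],
             "run_time": 0.0},
        )
        row["run_time"] = max(row["run_time"], event.get("total_run_time") or 0.0)

    # Step 2: group the per-search rows by user and search type.
    groups = defaultdict(list)
    for row in per_search.values():
        groups[(row["user"], row["search_type"])].append(row["run_time"])

    summary = [
        {"user": user,
         "search_type": search_type,
         "count": len(run_times),
         "median_runtime": median(run_times),
         "cum_runtime": sum(run_times)}
        for (user, search_type), run_times in groups.items()
    ]
    # Heaviest consumers first, like `sort - cum_runtime` in the dashboard.
    summary.sort(key=lambda r: r["cum_runtime"], reverse=True)
    return summary
```

Collapsing per search_id first matters: without it, a search that logs several audit events would be counted more than once.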

If it helps me out, I'm sure it will help you out as well, so feel free to share. Or maybe you have something even better to share! \o/


sgessati
Explorer

OK, here's the dashboard. Copy and paste it into a brand new dashboard in your Monitoring Console, since it uses some rest commands.

There are two kinds of things to adapt to your environment. They are marked <#SOMETHING#> in the code: remove the <# and #> and replace SOMETHING with your own value. 😛
The first kind is the name of one of our search head cluster members, used to retrieve each user's context (real name and roles).
The second is the list of all our Splunk search heads, given as "ITGSPLKPRDSH*" thanks to our naming convention (they are called itgsplkprdsh01, itgsplkprdsh02, itgsplkprdsh03, etc.).
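
If you would rather not edit the placeholders by hand, here is a small Python sketch that fills them in before you paste the XML. It is only an illustration: the function name `fill_placeholders` and its parameters are hypothetical, and it assumes the only `<#...#>` markers in the file are the two kinds used in this dashboard.

```python
import re

# Marker text used in the dashboard for the list of search heads.
SEARCH_HEADS_MARKER = "ITGSPLKPRDSH*"


def fill_placeholders(xml_text, user_lookup_host, search_head_pattern):
    """Replace every <#...#> marker in the dashboard source.

    The search head list marker becomes `search_head_pattern`;
    any other marker is treated as the host used for the rest calls
    and becomes `user_lookup_host`.
    """
    def substitute(match):
        if match.group(1) == SEARCH_HEADS_MARKER:
            return search_head_pattern
        return user_lookup_host

    # Non-greedy match so two markers on one line are handled separately.
    return re.sub(r"<#(.+?)#>", substitute, xml_text)
```

For example, `fill_placeholders(source, "mysh01", "mysh*")` would point the rest calls at a search head named mysh01 and the audit searches at every host matching mysh* (both names are made up here).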

Enjoy and may the Splunk be with you, always! ^_^

<form>
  <label>SPLK top users per runtime (click on any line to get details of all searches for that user in the bottom panel)</label>
  <fieldset submitButton="true" autoRun="true">
    <input type="time" token="temps" searchWhenChanged="false">
      <label>Period</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="multiselect" token="usersfilterglobal">
      <label>Users (by ID)</label>
      <choice value="*">All</choice>
      <default>*</default>
      <prefix>(</prefix>
      <suffix>)</suffix>
      <initialValue>*</initialValue>
      <valuePrefix>user=</valuePrefix>
      <delimiter> OR </delimiter>
      <fieldForLabel>title</fieldForLabel>
      <fieldForValue>title</fieldForValue>
      <search>
        <query>|rest splunk_server=<#ONE OF YOUR SEARCH HEAD TO GET USER NAME AND ROLES#> /services/authentication/users | fields title | dedup title</query>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </search>
    </input>
    <input type="multiselect" token="utilisateursfilterglobal">
      <label>Users (by name)</label>
      <choice value="*">All</choice>
      <default>*</default>
      <prefix>(</prefix>
      <suffix>)</suffix>
      <initialValue>*</initialValue>
      <valuePrefix>user=</valuePrefix>
      <delimiter> OR </delimiter>
      <fieldForLabel>realname</fieldForLabel>
      <fieldForValue>user</fieldForValue>
      <search>
        <query>|rest splunk_server=<#ONE OF YOUR SEARCH HEAD TO GET USER NAME AND ROLES#> /services/authentication/users | fields title , realname | rename title as user</query>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </search>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>Top cumulative runtime by users and search type</title>
      <table>
        <search>
          <query>`dmc_audit_get_searches(<#ITGSPLKPRDSH*#>)`| search $usersfilterglobal$ AND $utilisateursfilterglobal$|   stats min(_time) as _time, values(user) as user, max(total_run_time) as total_run_time, first(search) as search, first(search_type) as search_type, first(apiStartTime) as apiStartTime, first(apiEndTime) as apiEndTime by search_id
      | where isnotnull(search)  |  stats median(total_run_time) as median_runtime Perc90(total_run_time) as Perc90_runtime sum(total_run_time) as cum_runtime count(search) as count max(_time) as last_use first(search) as search  by user,search_type
            | eval last_use = strftime(last_use, "%F %T")
            | fields user, count, median_runtime, Perc90_runtime, cum_runtime, last_use,search,search_type | sort - cum_runtime 
            | rename user as user, count as "Search Count", median_runtime as "Median Runtime", Perc90_runtime as "90th Percentile Runtime", cum_runtime as "Cumulative Runtime", last_use as "Last Search"
            | fieldformat "Median Runtime" = `dmc_convert_runtime('Median Runtime')`
            | fieldformat "90th Percentile Runtime" = `dmc_convert_runtime('90th Percentile Runtime')`
            | fieldformat "Cumulative Runtime" = `dmc_convert_runtime('Cumulative Runtime')` | join type=left user [|rest splunk_server=<#ONE OF YOUR SEARCH HEAD TO GET USER NAME AND ROLES#> /services/authentication/users | fields title , realname, roles, defaultApp | rename title as user] | fields - search</query>
          <earliest>$temps.earliest$</earliest>
          <latest>$temps.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <drilldown>
          <set token="usersel">$row.user$</set>
          <set token="realnamesel">$row.realname$</set>
        </drilldown>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>User $realnamesel$  ($usersel$) search activity</title>
        <search>
          <query>(search_id!="rsa_*" action=search host=<#ITGSPLKPRDSH*#> index=_audit sourcetype=audittrail) user=$usersel$ 
| eval search_type=case(match(search_id,"^SummaryDirector_"),"summarization",match(search_id,"^((rt_)?scheduler__|alertsmanager_)"),"scheduled",match(search_id,"\\d{10}\\.\\d+(_[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?$"),"ad hoc",true(),"other") | rex "(?ms)search='(?&lt;searchstring&gt;.*)', autojoin=" | eval search=if((isnull(savedsearch_name) OR (savedsearch_name == "")),search,savedsearch_name)  |  stats min(_time) as _time, values(user) as user, max(total_run_time) as total_run_time, first(search) as search, values(searchstring) as searchstrings ,first(search_type) as search_type, first(apiStartTime) as apiStartTime, first(apiEndTime) as apiEndTime by search_id
      | where isnotnull(search)  |  stats median(total_run_time) as median_runtime, Perc90(total_run_time) as Perc90_runtime, sum(total_run_time) as cum_runtime, count as count, max(_time) as last_use first(searchstrings) as searchstrings  by user,search_type,search | eval last_use = strftime(last_use, "%F %T")
            | fields user, count, median_runtime, Perc90_runtime, cum_runtime, last_use,search,search_type searchstrings | sort - cum_runtime 
            | rename user as user, count as "Search Count", median_runtime as "Median Runtime", Perc90_runtime as "90th Percentile Runtime", cum_runtime as "Cumulative Runtime", last_use as "Last Search"
            | fieldformat "Median Runtime" = `dmc_convert_runtime('Median Runtime')`
            | fieldformat "90th Percentile Runtime" = `dmc_convert_runtime('90th Percentile Runtime')`
            | fieldformat "Cumulative Runtime" = `dmc_convert_runtime('Cumulative Runtime')` | fields - user</query>
          <earliest>$temps.earliest$</earliest>
          <latest>$temps.latest$</latest>
        </search>
        <option name="count">100</option>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>
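
To make the search_type logic of the bottom panel easier to follow, here is a rough Python port of its case() expression. Treat it as a sketch: the function name `classify_search_id` is hypothetical, the sample IDs in the usage note are made up, and it assumes the search_id has no surrounding quotes. The three regular expressions are the ones from the dashboard, checked in the same order.

```python
import re

# (pattern, label) pairs in the same order as the case() in the dashboard;
# the first pattern that matches wins.
_RULES = [
    (re.compile(r"^SummaryDirector_"), "summarization"),
    (re.compile(r"^((rt_)?scheduler__|alertsmanager_)"), "scheduled"),
    (re.compile(
        r"\d{10}\.\d+"
        r"(_[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?$"
    ), "ad hoc"),
]


def classify_search_id(search_id):
    """Return the search type for a search_id, or "other" if nothing matches."""
    for pattern, label in _RULES:
        # search() looks for the pattern anywhere in the string;
        # the ^ and $ anchors in the patterns do the positioning.
        if pattern.search(search_id):
            return label
    return "other"
```

With made-up IDs, `classify_search_id("scheduler__admin__search__x")` falls in the scheduled bucket, while an ID that ends in an epoch timestamp such as `1545000000.123` is treated as ad hoc.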