How to find out who is using the most resources in a clustered environment, per user and per search type?

sgessati
Explorer

Dear all,

We're starting to suffer from our own users: our clustered architecture is collapsing under a heavy search load that prevents our indexers from indexing. O_o

We run Splunk Enterprise 7.0.5, so we don't have access to the Workload Management feature introduced in Splunk 7.2.x. Even after upgrading Splunk we couldn't enforce it yet, because our Linux is not yet a 7.x release with systemd.

With more than 2000 users defined across almost 300 search head apps, where they can create their own dashboards, reports, alerts, schedules and accelerations, it is increasingly vital to pinpoint the users running bad searches.

Although the Monitoring Console provides some great dashboards, it's a pain to check resource usage indexer by indexer and search head by search head to get the big picture.

In the answer, I'll share a dashboard made by Splunk Professional Services that I wish shipped by default with the Monitoring Console.
It sums the cumulative search run time of each user by search type (ad hoc or scheduled) across all search heads and gives you some context, such as the user's full name and roles. It even shows the details of all searches run by the selected user in a bottom panel!
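
To give a rough idea of what it computes, here is a minimal sketch of the core aggregation, run directly against the _audit index. It is not the dashboard itself. It assumes that completed-search audit events (info=completed) carry total_run_time, it reuses the itgsplkprdsh* host naming from our environment (swap in your own), it trims single quotes from search_id as a precaution, and it simplifies the search-type split to two prefix checks.

```spl
index=_audit sourcetype=audittrail action=search info=completed host=itgsplkprdsh*
| eval search_id=trim(search_id,"'")
| search search_id!="rsa_*"
| eval search_type=case(match(search_id,"^SummaryDirector_"),"summarization", match(search_id,"^((rt_)?scheduler__|alertsmanager_)"),"scheduled", true(),"ad hoc or other")
| stats sum(total_run_time) as cum_runtime, count as search_count by user, search_type
| sort - cum_runtime
```

The dashboard adds the median and 90th percentile run times, the user's real name and roles, and a drilldown on top of this.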

If it helps me out, I'm sure it will help you out as well, so feel free to share it. Or maybe you have something even better to share! \o/

sgessati
Explorer

OK, here's the dashboard. You can copy and paste it into a brand-new dashboard on your Monitoring Console; it has to live there because it uses rest commands (and the Monitoring Console's dmc_* macros).

There are two kinds of values to adapt to your environment, marked <#SOMETHING#> in the code: remove the <# #> markers and replace SOMETHING with your own value. 😛
The first kind is one of our Search Head Cluster members, queried with rest to retrieve each user's context (real name and roles). Note that the join at the end of the top panel's query names our search head ITGSPLKPRDSH01 directly, without a marker; change that one to the same host too.
The second kind is the set of all our Splunk search heads, matched with "ITGSPLKPRDSH*" thanks to our naming convention (they are called itgsplkprdsh01, itgsplkprdsh02, itgsplkprdsh03, etc.).
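
Before pasting the dashboard, it may be worth checking both values from the search bar of your Monitoring Console. This is only a small sketch, with our host names standing in for yours. The first search should list every search head matched by the wildcard:

```spl
index=_audit sourcetype=audittrail action=search host=itgsplkprdsh* earliest=-1h
| stats count by host
```

The second should return a few users with their real names and roles from the search head you picked. As far as I know, it only works if that search head is a search peer of the instance you run it from:

```spl
| rest /services/authentication/users splunk_server=ITGSPLKPRDSH01
| fields title, realname, roles
| head 5
```

If either search comes back empty, the dashboard panels that depend on that value will be empty as well.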

Enjoy, and may the Splunk be with you, always! ^_^

<form>
  <label>SPLK top users per runtime (click any line to get details of all searches for that user in the bottom panel)</label>
  <fieldset submitButton="true" autoRun="true">
    <input type="time" token="temps" searchWhenChanged="false">
      <label>Time range</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="multiselect" token="usersfilterglobal">
      <label>Users (by ID)</label>
      <choice value="*">All</choice>
      <default>*</default>
      <prefix>(</prefix>
      <suffix>)</suffix>
      <initialValue>*</initialValue>
      <valuePrefix>user=</valuePrefix>
      <delimiter> OR </delimiter>
      <fieldForLabel>title</fieldForLabel>
      <fieldForValue>title</fieldForValue>
      <search>
        <query>|rest splunk_server=<#ONE OF YOUR SEARCH HEAD TO GET USER NAME AND ROLES#> /services/authentication/users | fields title | dedup title</query>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </search>
    </input>
    <input type="multiselect" token="utilisateursfilterglobal">
      <label>Users (by name)</label>
      <choice value="*">All</choice>
      <default>*</default>
      <prefix>(</prefix>
      <suffix>)</suffix>
      <initialValue>*</initialValue>
      <valuePrefix>user=</valuePrefix>
      <delimiter> OR </delimiter>
      <fieldForLabel>realname</fieldForLabel>
      <fieldForValue>user</fieldForValue>
      <search>
        <query>|rest splunk_server=<#ONE OF YOUR SEARCH HEAD TO GET USER NAME AND ROLES#> /services/authentication/users | fields title , realname | rename title as user</query>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </search>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>Top cumulative runtime by users and search type</title>
      <table>
        <search>
          <query>`dmc_audit_get_searches(<#ITGSPLKPRDSH*#>)`| search $usersfilterglobal$ AND $utilisateursfilterglobal$|   stats min(_time) as _time, values(user) as user, max(total_run_time) as total_run_time, first(search) as search, first(search_type) as search_type, first(apiStartTime) as apiStartTime, first(apiEndTime) as apiEndTime by search_id
      | where isnotnull(search)  |  stats median(total_run_time) as median_runtime Perc90(total_run_time) as Perc90_runtime sum(total_run_time) as cum_runtime count(search) as count max(_time) as last_use first(search) as search  by user,search_type
            | eval last_use = strftime(last_use, "%F %T")
            | fields user, count, median_runtime, Perc90_runtime, cum_runtime, last_use,search,search_type | sort - cum_runtime 
            | rename user as user, count as "Search Count", median_runtime as "Median Runtime", Perc90_runtime as "90th Percentile Runtime", cum_runtime as "Cumulative Runtime", last_use as "Last Search"
            | fieldformat "Median Runtime" = `dmc_convert_runtime('Median Runtime')`
            | fieldformat "90th Percentile Runtime" = `dmc_convert_runtime('90th Percentile Runtime')`
            | fieldformat "Cumulative Runtime" = `dmc_convert_runtime('Cumulative Runtime')` | join type=left user [|rest splunk_server=ITGSPLKPRDSH01 /services/authentication/users | fields title , realname, roles, defaultApp | rename title as user] | fields - search</query>
          <earliest>$temps.earliest$</earliest>
          <latest>$temps.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <drilldown>
          <set token="usersel">$row.user$</set>
          <set token="realnamesel">$row.realname$</set>
        </drilldown>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>User $realnamesel$  ($usersel$) search activity</title>
        <search>
          <query>(search_id!="rsa_*" action=search host=<#ITGSPLKPRDSH*#> index=_audit sourcetype=audittrail) user=$usersel$ 
| eval search_type=case(match(search_id,"^SummaryDirector_"),"summarization",match(search_id,"^((rt_)?scheduler__|alertsmanager_)"),"scheduled",match(search_id,"\\d{10}\\.\\d+(_[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?$"),"ad hoc",true(),"other") | rex "(?ms)search='(?<searchstring>.*)', autojoin=" | eval search=if((isnull(savedsearch_name) OR (savedsearch_name == "")),search,savedsearch_name)  |  stats min(_time) as _time, values(user) as user, max(total_run_time) as total_run_time, first(search) as search, values(searchstring) as searchstrings ,first(search_type) as search_type, first(apiStartTime) as apiStartTime, first(apiEndTime) as apiEndTime by search_id
      | where isnotnull(search)  |  stats median(total_run_time) as median_runtime, Perc90(total_run_time) as Perc90_runtime, sum(total_run_time) as cum_runtime, count as count, max(_time) as last_use first(searchstrings) as searchstrings  by user,search_type,search | eval last_use = strftime(last_use, "%F %T")
            | fields user, count, median_runtime, Perc90_runtime, cum_runtime, last_use,search,search_type searchstrings | sort - cum_runtime 
            | rename user as user, count as "Search Count", median_runtime as "Median Runtime", Perc90_runtime as "90th Percentile Runtime", cum_runtime as "Cumulative Runtime", last_use as "Last Search"
            | fieldformat "Median Runtime" = `dmc_convert_runtime('Median Runtime')`
            | fieldformat "90th Percentile Runtime" = `dmc_convert_runtime('90th Percentile Runtime')`
            | fieldformat "Cumulative Runtime" = `dmc_convert_runtime('Cumulative Runtime')` | fields - user</query>
          <earliest>$temps.earliest$</earliest>
          <latest>$temps.latest$</latest>
        </search>
        <option name="count">100</option>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>