Solved: Sideviews - subsearches into ResultsValueSetter fo...

splunk_zen · ‎02-28-2013

How can I avoid using a subsearch (which I'm hitting timeout issues), making use of Sideview's ResultsValueSetter ?

I'm still quite new to it, so even after reading the docs I haven't figured out how should I convert my subsearch, flowing the first search output into another one.

Let's say I have,

MAIN_SEARCH [search SUB_SEARCH | fields date_month, date_mday, date_hour ] DOWNSTREAM_SEARCH | stats sum(ApdexScore) as TotalApdexScore

Which has the goal to determine each day Busy Hour and feed it downstream to compute the CPU usage over several clusters,

index=resources_reporting source=*/output/ora_queries/client07/OS_metrics.csv
[search index=resources_reporting source="*output/ora_queries/client07/SdpTermCauses.csv"
| stats sum(SERVICE_EVOCATIONS) as ServiceEvocationsSum by date_hour date_mday date_month
| stats max(ServiceEvocationsSum) as MaxServiceEvocationsSum by date_month, date_mday
| where ServiceEvocationsSum = MaxServiceEvocationsSum
| fields date_month, date_mday, date_hour
]
| eval Platform = case(
(HOST_NAME like "%scp%" AND NOT (HOST_NAME like "%dscp%")), "SCPs",
...
HOST_NAME like "%mgr%", "MGRs"
)
| eval OptimumCpuApdex = if(USED_CPU <= 60,1,0) | eval AcceptableCpuApdex = if(USED_CPU > 60 AND USED_CPU <= 75, 1,0)
| stats count(USED_CPU) as TotalEvents sum(OptimumCpuApdex) as TotalOptimumCpuApdex sum(AcceptableCpuApdex) as TotalAcceptableCpuApdex by Platform
| eval ScpCpuApdex=if(Platform="SCPs",(TotalOptimumCpuApdex + TotalAcceptableCpuApdex/2) / TotalEvents,0)
...
| eval MgrCpuApdex=if(Platform="MGRs",(TotalOptimumCpuApdex + TotalAcceptableCpuApdex/2) / TotalEvents,0)
| eval ApdexScore=round(100*(0.17*ScpCpuApdex + ... + 0.04*MgrCpuApdex), 2)
| stats sum(ApdexScore) as TotalApdexScore

the subsearch fed the fields date_month, date_mday, date_hour as _time into the main search,

but how should the sideview's XML look like ?

EDIT: Building on sideview's example, I came to the following code,

<view onunloadCancelJobs="true" template="dashboard.html">
  <label>Using ResultsValueSetter to plug values from one search into another search</label>
  <module name="AccountBar" layoutPanel="appHeader" />
  <module name="SideviewUtils" layoutPanel="appHeader" />
  <module name="AppBar" layoutPanel="appHeader" />

  <module name="Message" layoutPanel="messaging">
    <param name="filter">*</param>
    <param name="maxSize">15000</param>
    <param name="clearOnJobDispatch">False</param>
  </module>

<module name="Search" layoutPanel="panel_row1_col1" autoRun="True">
  <param name="search"><![CDATA[
    index=resources_reporting source="*output/ora_queries/client07/SdpTermCauses.csv"
    | stats sum(SERVICE_EVOCATIONS) as ServiceEvocationsSum by date_hour date_mday date_month
    | stats max(ServiceEvocationsSum) as MaxServiceEvocationsSum by date_month, date_mday
    | where ServiceEvocationsSum = MaxServiceEvocationsSum
    | format | fields date_month, date_mday, date_hour as bigOrSearch
  ]]></param>

  <param name="earliest">@week</param>
  <param name="latest">now</param>

  <module name="JobProgressIndicator"></module>
  <module name="Button">
    <param name="label">Search</param>

    <module name="ResultsValueSetter">
      <param name="fields">bigOrSearch</param>

      <module name="Search">
        <param name="search"><![CDATA[
          index=resources_reporting source="*/output/ora_queries/mt/OS_metrics.csv" $bigOrSearch$
| eval Platform = case(
    (HOST_NAME like "%scp%" AND NOT (HOST_NAME like "%dscp%")), "SCPs",
    ...
    HOST_NAME like "%mgr%", "MGRs"
    )
    | eval OptimumCpuApdex = if(USED_CPU <= 60,1,0) | eval AcceptableCpuApdex = if(USED_CPU > 60 AND USED_CPU <= 75, 1,0)
    | stats count(USED_CPU) as TotalEvents sum(OptimumCpuApdex) as TotalOptimumCpuApdex sum(AcceptableCpuApdex) as TotalAcceptableCpuApdex by Platform
    | eval ScpCpuApdex=if(Platform="SCPs",(TotalOptimumCpuApdex + TotalAcceptableCpuApdex/2) / TotalEvents,0)
    ...
    | eval MgrCpuApdex=if(Platform="MGRs",(TotalOptimumCpuApdex + TotalAcceptableCpuApdex/2) / TotalEvents,0)
    | eval ApdexScore=round(100*(0.17*ScpCpuApdex + ... + 0.04*MgrCpuApdex), 2)
    | stats sum(ApdexScore) as TotalApdexScore
    | gauge TotalApdexScore 0 80 90 100
    ]]></param>

    <module name="Pager">
    <module name="HiddenChartFormatter">
      <param name="charting.chart">radialGauge</param>
      <param name="charting.chart.rangeValues">[0,50,70,85,94,100]</param>
      <param name="charting.gaugeColors">[0x7B7B7B,0xBF3030,0xFFE800,0x4A8300,0x00308B]</param>          
      <module name="FlashChart" />
    </module>
    </module>
  </module>
</module>

sideview · ‎02-28-2013

Well, transposing this question over to some index=_internal data that everyone will have, lets say your case is this:

Find the hour of each day, at which the splunkd.log had the highest instantaneous events-per-second. Then we do a subsearch for those hours of those N days, to see what sourcetypes were indexed during those times.

Subsearches work by implicitly using the format command. even if your subsearch doesn't end in format, the system always sneaks on in there for you. And you can run the format command yourself to see what it does. It takes lots of arguments but the default arguments will end up with what we want here.

will give us a single row, with a field called 'search', that looks like:

( ( date_hour="15" AND date_mday="14" ) OR ( date_hour="9" AND date_mday="15" ) OR ( date_hour="21" AND date_mday="16" ) OR ( date_hour="18" AND date_mday="17" ) OR ( date_hour="15" AND date_mday="18" ) OR ( date_hour="15" AND date_mday="19" ) OR ( date_hour="13" AND date_mday="20" ) OR ( date_hour="15" AND date_mday="21" ) OR ( date_hour="8" AND date_mday="22" ) OR ( date_hour="14" AND date_mday="23" ) OR ( date_hour="12" AND date_mday="24" ) OR ( date_hour="11" AND date_mday="25" ) OR ( date_hour="11" AND date_mday="26" ) OR ( date_hour="16" AND date_mday="27" ) OR ( date_hour="3" AND date_mday="28" ) )

So that being the case, it's this field value that we'll use ResultsValueSetter to pull down to the UI for us. However, since it causes terrible problems to actually create a $foo$ token called "$search$", we rename the field beforehand (Sideview Utils will actually warn you about this if you try to create such a key).

And here's a working example of the XML to do exactly that -- pull that big OR search string down with ResultsValueSetter and plug it into another search.

<module name="Search" layoutPanel="panel_row1_col1">
  <param name="search"><![CDATA[
    index=_internal source=*metrics.log group=per_sourcetype_thruput series=splunkd | stats max(eps) by date_month, date_mday, date_hour | sort - max(eps) | streamstats count by date_mday | where count=1 | fields - count | sort date_mday date_month | fields date_mday date_hour | format | fields search | rename search as bigOrSearch
  ]]></param>

  <module name="Button">
    <param name="label">Search</param>

    <module name="ResultsValueSetter">
      <param name="fields">bigOrSearch</param>

      <module name="Search">
        <param name="search"><![CDATA[
          $bigOrSearch$ | top sourcetype
        ]]></param>

        <module name="Pager">
          <module name="Table"/>
        </module>
      </module>
    </module>
  </module>
</module>

PROS: The benefit here is that the subsearch will run as a regular search in its own process, and as such it wont be subject to subsearch limits in time, and it'll behave like a regular search. You also have the subtle but really cool benefit of being able to display information from those search results in the dashboard, using HTML modules or charts etc... So your dashboard can tell a sort-of multilayered story even when it's driven by pretty specific user input. Although you might not always to display statistics or charts about the "inner" search, another simple example is that you can throw a JobProgressIndicator in the first search and thus give the user feedback about the progress made by the inner search. With subsearches this feedback is impossible to give.

CONS: The downside here is that it doesn't solve the "number of rows returned" limit problem. If anything it makes it worse. If your subsearch is returning 80000 rows and you're hitting truncation at 50,000 rows, well the equivalent search field value as created by the format command is going to be enormous and pulling that string down to plug it into another search wont work well at all. In this particular case, we only have one row per day so you'll be fine.

And there's some documentation about the ResultsValueSetter module in Sideview Utils itself, which is where I think you got the idea about using RVS to rework a subsearch situation where you were hitting limits.

Download Sideview Utils if you haven't already, and get the latest from the Sideview site -- http://sideviewapps.com/apps/sideview-utils , free for internal use.

View solution in original post

sideview · ‎02-28-2013

Well, transposing this question over to some index=_internal data that everyone will have, lets say your case is this:

Find the hour of each day, at which the splunkd.log had the highest instantaneous events-per-second. Then we do a subsearch for those hours of those N days, to see what sourcetypes were indexed during those times.

Subsearches work by implicitly using the format command. even if your subsearch doesn't end in format, the system always sneaks on in there for you. And you can run the format command yourself to see what it does. It takes lots of arguments but the default arguments will end up with what we want here.

will give us a single row, with a field called 'search', that looks like:

( ( date_hour="15" AND date_mday="14" ) OR ( date_hour="9" AND date_mday="15" ) OR ( date_hour="21" AND date_mday="16" ) OR ( date_hour="18" AND date_mday="17" ) OR ( date_hour="15" AND date_mday="18" ) OR ( date_hour="15" AND date_mday="19" ) OR ( date_hour="13" AND date_mday="20" ) OR ( date_hour="15" AND date_mday="21" ) OR ( date_hour="8" AND date_mday="22" ) OR ( date_hour="14" AND date_mday="23" ) OR ( date_hour="12" AND date_mday="24" ) OR ( date_hour="11" AND date_mday="25" ) OR ( date_hour="11" AND date_mday="26" ) OR ( date_hour="16" AND date_mday="27" ) OR ( date_hour="3" AND date_mday="28" ) )

So that being the case, it's this field value that we'll use ResultsValueSetter to pull down to the UI for us. However, since it causes terrible problems to actually create a $foo$ token called "$search$", we rename the field beforehand (Sideview Utils will actually warn you about this if you try to create such a key).

And here's a working example of the XML to do exactly that -- pull that big OR search string down with ResultsValueSetter and plug it into another search.

<module name="Search" layoutPanel="panel_row1_col1">
  <param name="search"><![CDATA[
    index=_internal source=*metrics.log group=per_sourcetype_thruput series=splunkd | stats max(eps) by date_month, date_mday, date_hour | sort - max(eps) | streamstats count by date_mday | where count=1 | fields - count | sort date_mday date_month | fields date_mday date_hour | format | fields search | rename search as bigOrSearch
  ]]></param>

  <module name="Button">
    <param name="label">Search</param>

    <module name="ResultsValueSetter">
      <param name="fields">bigOrSearch</param>

      <module name="Search">
        <param name="search"><![CDATA[
          $bigOrSearch$ | top sourcetype
        ]]></param>

        <module name="Pager">
          <module name="Table"/>
        </module>
      </module>
    </module>
  </module>
</module>

PROS: The benefit here is that the subsearch will run as a regular search in its own process, and as such it wont be subject to subsearch limits in time, and it'll behave like a regular search. You also have the subtle but really cool benefit of being able to display information from those search results in the dashboard, using HTML modules or charts etc... So your dashboard can tell a sort-of multilayered story even when it's driven by pretty specific user input. Although you might not always to display statistics or charts about the "inner" search, another simple example is that you can throw a JobProgressIndicator in the first search and thus give the user feedback about the progress made by the inner search. With subsearches this feedback is impossible to give.

CONS: The downside here is that it doesn't solve the "number of rows returned" limit problem. If anything it makes it worse. If your subsearch is returning 80000 rows and you're hitting truncation at 50,000 rows, well the equivalent search field value as created by the format command is going to be enormous and pulling that string down to plug it into another search wont work well at all. In this particular case, we only have one row per day so you'll be fine.

And there's some documentation about the ResultsValueSetter module in Sideview Utils itself, which is where I think you got the idea about using RVS to rework a subsearch situation where you were hitting limits.

Download Sideview Utils if you haven't already, and get the latest from the Sideview site -- http://sideviewapps.com/apps/sideview-utils , free for internal use.

sideview · ‎03-01-2013

did it work? When I post here I'm just a "me". 😃 yw.

splunk_zen · ‎03-01-2013

I don't know if you're a "you" or a "team", but thanks for all the detailed steps !

Sideviews - subsearches into ResultsValueSetter form

.conf24 | Registration Open!

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!