Newbie to post-processing looking for help

redc
Builder

I have been working in Splunk building reports/dashboards for about a year. Six months ago, I was tasked with creating an app and integrating with our hosting platform to create reports about website activity. I've built all of the reports we want in the app and they work; however, some of the reports are retrieving half a million events or more, which results in really long wait times for the reports (some over 10-20 minutes).

A few of the reports I'm able to run on a schedule and then display the results of the scheduled search rather than running them real-time. But most of my dashboards include Sideview modules for text, drop-down, and multi-select inputs to manipulate the report, and displaying results of a scheduled search just won't work.

My goals are:

  1. Rapidly display the initial load - I'd like for this initial load to be a scheduled search that displays a cached result set.
  2. Improve the processing of searches that restrict the results.

Currently, if you try to restrict the results by clicking on something in the drop-down, it re-runs the entire search rather than processing the results based on the original data retrieved by the initial dashboard load.

I've tried to wrap my head around the examples in Sideview Utils and while what's explained makes sense, I'm struggling with actually implementing it in my own reports. I'm a learn-by-doing-with-copy/paste-and-tweak type of person and the examples don't seem to translate well to my dashboards.

Here's a generalized and slightly simplified version of one of my dashboards for use as an example. This one's extremely simple by comparison to most of the dashboards: it has a time drop-down, two multi-selects, and a single chart, whereas most of mine have 3-5 multi-selects and 4-6 charts, all of which are affected by changes to any one of the multi-selects.

<!-- Start Time Range Dropdown -->
<module name="URLLoader" layoutPanel="viewHeader" autoRun="True">
  <module name="Pulldown" layoutPanel="panel_row1_col1" autoRun="True">
    <param name="name">customRange</param>
    <param name="label">Time Range: </param>
    <param name="staticOptions">
      <list>
        <param name="label">Last 7 days</param>
        <param name="selected">true</param>
        <param name="value">-7d@d,@d,1d</param>
      </list>
      <list>
        <param name="label">Last 30 days</param>
        <param name="value">-30d@d,@d,1d</param>
      </list>
    </param>
    <module name="ValueSetter">
      <param name="name">multiValueTimeRange</param>
      <param name="delim">,</param>
      <param name="value">$customRange$</param>
<!-- End Time Range Dropdown -->
<!-- Start Web Site Dropdown -->
      <module name="Search" layoutPanel="panel_row1_col1" autoRun="True">
        <param name="search"><![CDATA[index="websites" | fields WebSiteKey Description | sort Description | `decode_entity("Description")` ]]>
        </param>
        <module name="Pulldown">
          <param name="name">selectedWebsites</param>
          <param name="label">Web site</param>
          <param name="size">3</param>
          <param name="template">WebSiteKey="$value$"</param>
          <param name="separator">+OR+</param>
          <param name="outerTemplate">( $value$ )</param>
          <param name="staticFieldsToDisplay">
            <list>
              <param name="label">All Web Sites</param>
              <param name="value">*</param>
            </list>
          </param>
          <param name="searchFieldsToDisplay">
            <list>
              <param name="label">Description</param>
              <param name="value">WebSiteKey</param>
            </list>
          </param>
<!-- End Web Site Dropdown -->
<!-- Start Mailing List Dropdown-->
          <module name="Search" autoRun="True">
            <param name="search"><![CDATA[index="mailinglists" $selectedWebsites$ | fields List_name ListId | sort List_name ]]>
            </param>
            <module name="Pulldown">
              <param name="name">selectedMailingLists</param>
              <param name="label">Mailing List</param>
              <param name="size">4</param>
              <param name="template">ListId="$value$"</param>
              <param name="separator">+OR+</param>
              <param name="outerTemplate">( $value$ )</param>
              <param name="staticFieldsToDisplay">
                <list>
                  <param name="label">All Mailing Lists</param>
                  <param name="value">*</param>
                </list>
              </param>
              <param name="searchFieldsToDisplay">
                <list>
                  <param name="label">List_name</param>
                  <param name="value">ListId</param>
                </list>
              </param>
<!-- End Mailing List Dropdown -->
<!-- Start Results Panel -->
              <module name="Search">
                <param name="search">
                  <![CDATA[index="usage" PageViewed="*?ET=*" $selectedWebsites$ | 
fields PageViewed, ReaderUserKey, mlmid | 
stats dc(ReaderUserKey) as "Clicks" by mlmid | 
join mlmid [ search index="mailings" $selectedWebsites$ $selectedMailingLists$ | 
fields _time, MailingID, OpenedCount, DeliveredCount, MailingSubject, ListId, BouncesCount | 
eval mlmid=MailingID | 
rename _time as eDate ] | 
join type=outer ListId [ search index="mailinglists" earliest=0 latest=now $selectedWebsites$ $selectedMailingLists$ | 
fields ListId, List_name ] | 
eval Date=strftime(eDate, "%Y-%m-%d %I:%M %p") | 
eval OpenedCount=round((DeliveredCount*0.125), 0) | 
eval Delivered=(DeliveredCount - BouncesCount) | 
eval OpenedCount=(OpenedCount + Clicks) | 
rename MailingSubject as Subject, BouncesCount as Bounced | 
table eDate, Date, Subject, List_name, DeliveredCount, Bounced, Delivered, OpenedCount, Clicks | 
sort -eDate | fields - eDate]]>
                </param>
                <param name="earliest">$multiValueTimeRange[0]$</param>
                <param name="latest">$multiValueTimeRange[1]$</param>
                <module name="Paginator" layoutPanel="panel_row1_col1">
                  <param name="entityName">results</param>
                  <param name="count">50</param>
                  <module name="EnablePreview">
                    <param name="display">False</param>
                    <param name="enable">True</param>
                    <module name="SimpleResultsTable">
                      <param name="allowTransformedFieldSelect">True</param>
                      <param name="drilldown">none</param>
                      <param name="entityName">results</param>
                      <param name="count">50</param>
                      <module name="Gimp"/>
                      <module name="ConvertToDrilldownSearch">
                        <module name="ViewRedirector">
                          <param name="viewTarget">flashtimeline</param>
                        </module>
                      </module>
                    </module>
                    <module name="ViewRedirectorLink">
                      <param name="viewTarget">flashtimeline</param>
                    </module>
                  </module>
                </module>
              </module>
            </module>
          </module>
        </module>
      </module>
    </module>
  </module>
</module>

sideview
SplunkTrust

As a side comment - you should absolutely take out every one of these autoRun="True" attributes except the one at the very top on the URLLoader. All these extra ones will be causing a lot of page slowness and probably causing some maddening bugs that you may or may not have run into yet.

0 Karma

sideview
SplunkTrust

I think the Gate solution is a better direction, as posted in your other question here - http://answers.splunk.com//answers/110436/search-optimization-and-caching-for-forms. Trying to use postProcess for this is going to require some stuff that'll feel pretty artificial in this case. However I can spell it out a bit.

Also, you certainly can use Sideview modules with scheduled saved searches. You just use the Splunk HiddenSavedSearch module or the Sideview SavedSearch module in place of the Sideview Search module. Aside from that it should all work as expected - shoot me an email if you had run into trouble here and we can figure out where things went sideways.
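
A minimal sketch of that swap, assuming a scheduled saved search named `mailing_activity_7d` (a hypothetical name) and reusing the Paginator and SimpleResultsTable settings from the question. It uses Splunk's HiddenSavedSearch with its `savedSearch` and `useHistory` params; the form elements are left out to keep it short.

```xml
<module name="URLLoader" layoutPanel="viewHeader" autoRun="True">
  <!-- Loads the results of a scheduled saved search instead of dispatching a new search. -->
  <module name="HiddenSavedSearch" layoutPanel="panel_row1_col1">
    <!-- "mailing_activity_7d" is a hypothetical saved search name. -->
    <param name="savedSearch">mailing_activity_7d</param>
    <!-- True: only use results from a previous scheduled run. -->
    <param name="useHistory">True</param>
    <module name="Paginator">
      <param name="entityName">results</param>
      <param name="count">50</param>
      <module name="SimpleResultsTable">
        <param name="entityName">results</param>
        <param name="count">50</param>
        <param name="drilldown">none</param>
      </module>
    </module>
  </module>
</module>
```

The only autoRun="True" here is the one on the URLLoader, in line with the advice elsewhere in this thread.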

Summary Indexing is another area worth checking out, and is more applicable to your situation than postProcess frankly. http://docs.splunk.com/Documentation/Splunk/6.0/Knowledge/Usesummaryindexing This would require substantial work but would put you in a place where you had both more power and more flexibility. Form elements would render fast, even the first time, and you could suddenly do lots of things that had been impractical before.
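
As a rough sketch of what the reporting side could look like after that work, assume a scheduled search (hypothetically named `summary_usage_clicks`) that runs the expensive part of the question's search with `sistats` and writes to the default `summary` index. The saved search name, the choice of `WebSiteKey` as a split field, and the index name are all assumptions for illustration, not something taken from the original dashboards.

```xml
<!--
  Assumed populating search, scheduled with summary indexing enabled
  and saved as "summary_usage_clicks" (hypothetical name):

    index="usage" PageViewed="*?ET=*"
    | sistats dc(ReaderUserKey) by mlmid WebSiteKey
-->
<module name="Search">
  <!-- Reports against the small summary events instead of the raw usage events. -->
  <param name="search"><![CDATA[index="summary" search_name="summary_usage_clicks" $selectedWebsites$
| stats dc(ReaderUserKey) as "Clicks" by mlmid]]></param>
  <param name="earliest">$multiValueTimeRange[0]$</param>
  <param name="latest">$multiValueTimeRange[1]$</param>
</module>
```

The `$selectedWebsites$` and `$multiValueTimeRange$` tokens are the ones defined by the Pulldown and ValueSetter modules in the question, so this Search would sit downstream of them just as the original one does.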

That said, here is how you would do it with postProcess: you would merge all of your saved searches and searches into one giant master search, commonly called a "datacube". Then each Pulldown, chart, or table would use a postProcess search to carve out of this high-dimensional cube the data that it needs to render.

The considerable, and I think deal-killing, problem here is that not every set of searches you might mash together works as a single datacube search, and when they don't fit, the net performance can be quite a lot worse than dispatching N separate searches.

Typically if making the datacube feels artificial, like you're just arbitrarily gluing unrelated things together, then it's a bad idea.

An example off the top of my head that would lend itself well to a postprocess approach would be:

Pulldown that needs to render distinct users, with an "all users" option
Pulldown that needs to render distinct hosts, with an "all hosts" option
Pulldown that needs to render distinct applications

Chart that renders the hours of the day across the x axis and distinct users on Y, split by application
Chart that renders total bytes downloaded by user
Chart that renders the most commonly used applications

base search would be:

foo bar baz | stats sum(bytes) as bytes count by user host application date_hour

and the three postprocess searches to drive the Pulldowns:

| dedup user | sort user

| dedup host | sort host

| dedup application | sort application

and the three separate postprocess searches to drive the charts would be:

| chart dc(user) over date_hour by application

| chart sum(bytes) as bytes by user

| stats sum(count) as count by application
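
A minimal sketch of how that could be wired up with the Sideview Search, PostProcess, and Pulldown modules, showing one Pulldown and one postprocessed result. It renders into a SimpleResultsTable rather than a chart to stay short, and it assumes that a downstream PostProcess replaces the upstream one instead of appending to it; the Sideview Utils postProcess docs are the place to confirm that for your version.

```xml
<module name="URLLoader" layoutPanel="viewHeader" autoRun="True">
  <!-- Base "datacube" search: dispatched once, shared by everything downstream. -->
  <module name="Search" layoutPanel="panel_row1_col1">
    <param name="search">foo bar baz | stats sum(bytes) as bytes count by user host application date_hour</param>

    <!-- Postprocess 1: distinct users, used only to fill the Pulldown options. -->
    <module name="PostProcess">
      <param name="search">dedup user | sort user</param>
      <module name="Pulldown">
        <param name="name">user</param>
        <param name="label">User</param>
        <param name="valueField">user</param>
        <param name="staticOptions">
          <list>
            <param name="label">All users</param>
            <param name="value">*</param>
          </list>
        </param>

        <!-- Postprocess 2: filter the same cube by the selection, then aggregate. -->
        <module name="PostProcess">
          <param name="search">search user="$user$" | chart sum(bytes) as bytes by user</param>
          <module name="SimpleResultsTable">
            <param name="entityName">results</param>
            <param name="drilldown">none</param>
          </module>
        </module>
      </module>
    </module>
  </module>
</module>
```

Changing the Pulldown only re-runs the second postprocess against the finished base job, which is the behavior the original question was after.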

But I'm probably talking too much. The postProcess docs in Sideview Utils have been rewritten many times and they'll give you a stronger understanding.

0 Karma

sideview
SplunkTrust

You got it. Of course, this is a little weird. If we call the Gate module with id="Gate2" gate2 and the Gate module with id="Gate3" gate3, then it's probably easier and simpler to have the modules downstream from gate2 and gate3 just sit downstream from gate1, so that gate1 has three direct downstream modules, and each one of those goes on to have further downstream modules.

Maybe you're thinking Gate modules can only have one downstream branch but they can have anything downstream.

0 Karma

redc
Builder

So like this?

<module name="Gate">
  <param name="to">Gate1</param>
</module>
<module name="Gate">
  <param name="to">Gate2</param>
</module>
<module name="Gate">
  <param name="to">Gate3</param>
</module>

Which would effectively be children of the Pulldowns and the Search modules (which appear above the Gate modules with the "id" params)?

Most of my dashboards are configured very similarly to the one that I sent you the full XML for.

0 Karma

sideview
SplunkTrust

You can't, no. However, you can have multiple Gate modules that are all siblings of each other (i.e. next to each other), each with a different "to" param, and that amounts to the same thing. I'll add an item to my queue to allow comma-separated values in a future release.

0 Karma

redc
Builder

@sideview - can I have multiple Gate ids in the "to" param? e.g.:

<module name="Gate">
  <param name="to">Gate1,Gate2,Gate3</param>
</module>

This would be for a dashboard with multiple panels (populated by different searches) where the Pulldown modules apply to all the panels.

0 Karma

sideview
SplunkTrust

Here was the XML:

URLLoader autoRun=True
SavedSearch (no autoRun)
Gate + "to" param
Pulldown time picker (no autoRun)
Search/Pulldown for web site (no autoRun)
Search/Pulldown for list (no autoRun)
Button (with allowAutoSubmit=False)
Search
Gate + "to" param
Gate + "id" param
Pager
Table
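
One reading of that flattened list as nested XML, offered as a sketch rather than the actual view. The nesting is inferred from the other comments in this thread (the saved-search branch and the Pulldown branch are siblings, and the Gate with the "id" param sits at the top level). The gate name `results_gate`, the saved search name, and the search string are hypothetical; HiddenSavedSearch stands in for the SavedSearch module, and Paginator plus SimpleResultsTable stand in for Pager and Table. The two Search/Pulldown pairs are omitted.

```xml
<view>
  <module name="URLLoader" layoutPanel="viewHeader" autoRun="True">
    <!-- Branch 1: cached results of the scheduled search, pushed on page load. -->
    <module name="HiddenSavedSearch">
      <param name="savedSearch">mailing_activity_7d</param>
      <param name="useHistory">True</param>
      <module name="Gate">
        <param name="to">results_gate</param>
      </module>
    </module>

    <!-- Branch 2: form elements. The Button blocks the page-load push. -->
    <module name="Pulldown" layoutPanel="panel_row1_col1">
      <param name="name">customRange</param>
      <param name="label">Time Range: </param>
      <param name="staticOptions">
        <list>
          <param name="label">Last 7 days</param>
          <param name="value">-7d@d,@d,1d</param>
        </list>
      </param>
      <module name="Button">
        <param name="allowAutoSubmit">False</param>
        <!-- Ad-hoc search, dispatched only after the user submits the form. -->
        <module name="Search">
          <param name="search">index="usage" | stats count by mlmid</param>
          <module name="Gate">
            <param name="to">results_gate</param>
          </module>
        </module>
      </module>
    </module>
  </module>

  <!-- Top-level Gate that receives whichever branch pushed last. -->
  <module name="Gate" layoutPanel="panel_row1_col1">
    <param name="id">results_gate</param>
    <module name="Paginator">
      <param name="entityName">results</param>
      <param name="count">50</param>
      <module name="SimpleResultsTable">
        <param name="entityName">results</param>
        <param name="count">50</param>
      </module>
    </module>
  </module>
</view>
```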

After all this... I still recommend you look into summary indexing as well. 😃 And although this was a postprocess question again I don't recommend postprocess here.

0 Karma

sideview
SplunkTrust

After emailing the XML I was able to figure out the problem with the Gate module. It has a limitation where if you use the "to" param and "id" param functionality, the Gate module with the "id" param has to be at the top level of the hierarchy. In all my other uses of the module it had been so I had never noticed the bug.

Fortunately the view is easy to rework (by adding another Gate) so that the Gate with the "id" is at the top.

I'll post the summary of the final module hierarchy below:

0 Karma

redc
Builder

@martin_mueller: You say, "... there surely must be a Splunk Partner somewhere near you to do that."

Are you referring to having a professional services person come on-site to help?

0 Karma

sideview
SplunkTrust

Well the modules don't care about (or even know about) what layoutPanel they're in. In fact they don't even know what other modules are present in the hierarchy - they only communicate through the context data aka $foo$ tokens.

Appearing for a second and then going poof is what you'd see if you didn't have that <param name="allowAutoSubmit">False</param> on your Button. Can you email me the XML? nick@sideviewapps.com

redc
Builder

Okay, scratch my first conclusion about what was causing the problem; that is almost certainly not the problem.

For the sake of getting it to display just the saved search results first, I've yanked the pulldowns for time, website, and list as well as the button, so all I have is:

URLLoader autoRun=true
SavedSearch (no autoRun)
Gate + "to" param
Gate + "id" param
Paginator, SimpleResultsTable, etc. to display the results (see XML in original question above)

The saved results pop briefly, then I get "No results found" and inspector says it's running "search *".

0 Karma

sideview
SplunkTrust

Nope. It's hard to explain but this is what you need. Note the SavedSearch and the Pulldown are siblings.

URLLoader autoRun=True
SavedSearch (no autoRun)
Gate + "to" param
Pulldown time picker (no autoRun)
Search/Pulldown for web site (no autoRun)
Search/Pulldown for list (no autoRun)
Button (with allowAutoSubmit=False)
Search
Gate + "id" param

and again, you want NO autorun anywhere except the one autoRun on the URLLoader. Any second one will only do harm.

0 Karma

redc
Builder

Buttons are fine, I'd been considering adding one anyway.

It seems I still have something wrong; it doesn't load the saved search at all now. Could you review this list of modules (actual XML excluded for space) and let me know if my order is correct? (I can post the XML if needed.)

URLLoader autoRun=True
Pulldown time picker (no autoRun)
Search/Pulldown for web site autoRun=True
Search/Pulldown for list (no autoRun)
Button
Gate + "to" param (inside Button module)
Gate + "id" param
SavedSearch autoRun=True
Search to be updated by button click

0 Karma

sideview
SplunkTrust

There's one way it could work, if you're OK with having a Button module below your form elements.

1) Put a Button downstream from your form elements.

2) Give it <param name="allowSoftSubmit">True</param> if you want the charts to reload when the user changes anything (False if you want them to click the button).

3) Give it <param name="allowAutoSubmit">False</param>, which will prevent the autoRun="True" push from going through the Button.

4) Make sure the Gate module carrying the scheduled search is below that Button.
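
Steps 1 to 3 could look roughly like this, as a sketch only. The Button label and the gate name `results_gate` are hypothetical, and the Gate with the matching "id" param is assumed to exist elsewhere in the view.

```xml
<!-- Sits downstream from the last form element (Pulldown, TextField, etc.). -->
<module name="Button">
  <param name="label">Update</param>
  <!-- True: changing any upstream form element re-submits without a click. -->
  <param name="allowSoftSubmit">True</param>
  <!-- False: the page-load autoRun push stops here. -->
  <param name="allowAutoSubmit">False</param>
  <module name="Search">
    <param name="search"><![CDATA[index="usage" $selectedWebsites$ | stats dc(ReaderUserKey) as "Clicks" by mlmid]]></param>
    <module name="Gate">
      <param name="to">results_gate</param>
    </module>
  </module>
</module>
```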

This is a pretty complex scenario now but it may well work.

0 Karma

redc
Builder

Does that mean I can't use Gate to load the initial scheduled search (i.e., the autoRun being turned on farther up the form means it will always re-run the search)?

0 Karma

sideview
SplunkTrust

Oh of course... hm. yea normally you'd use Gate to block that push from going below, but in this thing we just want to block while the page is loading and there's no $foo$ key for that. OK I take the Gate thing back. Bad idea. 😃 Sorry.

0 Karma

redc
Builder

I've been trying to add the Gate module on to load the initial scheduled search, but even though I've got a scheduled saved search (and results cached), it persists in running the saved search real-time (and I do have "useHistory" set to "True").

I need autoRun="True" on the Pulldown modules for Website and Mailing List because Mailing List has to refresh if you change the Website Pulldown. Removing autoRun from both of these modules (except the SavedSearch module) results in diddly-squat happening when you load the page.

0 Karma

sideview
SplunkTrust

Well the whole point of the Gate module in the other post was so that the page initially loads using scheduled results, but any change by the user to the pulldowns causes an ad-hoc search to be dispatched. If you tried it and you found this wasn't the case, I suspect there was a small mistake like maybe you left an autoRun=True on the Pulldown branch (indeed if you did that there would be no performance difference at all, in fact a slight negative effect). If on the other hand your goal is to have all the searches complete faster even the ad-hoc ones, then yea Gate isn't much use here.

0 Karma

redc
Builder

Here's why Gate doesn't work: it doesn't reduce the amount of processing at all. A search that takes 12 minutes to present results without Gate is STILL going to take 12 minutes to present results WITH Gate. (Yes, that's a real-world example.)

That's why I'm looking at post-processing. The second example in sideview's post, the one that is described as lending itself to post-processing, looks to be similar to what I'm trying to do: pulldown for time, pulldown for website, pulldown for mailing list; render chart with additional processing on data.

0 Karma

martin_mueller
SplunkTrust

I agree with Nick here, you're likely looking for a means to pre-summarize results rather than post processing.

Concerning your need for someone to help you beyond the world of theory, there surely must be a Splunk Partner somewhere near you to do that. Going the pre-summarizing route involves much more than just rewriting one particular dashboard.

0 Karma