I have been working in Splunk building reports/dashboards for about a year. Six months ago, I was tasked with creating an app and integrating with our hosting platform to create reports about website activity. I've built all of the reports we want in the app and they work; however, some of the reports are retrieving half a million events or more, which results in really long wait times for the reports (some over 10-20 minutes).
A few of the reports I'm able to run on a schedule and then display the results of the scheduled search rather than running them real-time. But most of my dashboards include Sideview modules for text, drop-down, and multi-select inputs to manipulate the report, and displaying results of a scheduled search just won't work.
My goals are:
Currently, if you try to restrict the results by clicking on something in the drop-down, it re-runs the entire search rather than processing the results based on the original data retrieved by the initial dashboard load.
I've tried to wrap my head around the examples in Sideview Utils and while what's explained makes sense, I'm struggling with actually implementing it in my own reports. I'm a learn-by-doing-with-copy/paste-and-tweak type of person and the examples don't seem to translate well to my dashboards.
Here's a generalized and slightly simplified version of one of my dashboards for use as an example. This one's extremely simple by comparison to most of the dashboards (this one has a time drop-down, two multi-selects, and a single chart, most of mine have 3-5 multi-selects and 4-6 charts which are all affected by changes to any one of the multi-selects).
<!-- Start Time Range Dropdown -->
<module name="URLLoader" layoutPanel="viewHeader" autoRun="True">
<module name="Pulldown" layoutPanel="panel_row1_col1" autoRun="True">
<param name="name">customRange</param>
<param name="label">Time Range: </param>
<param name="staticOptions">
<list>
<param name="label">Last 7 days</param>
<param name="selected">true</param>
<param name="value">-7d@d,@d,1d</param>
</list>
<list>
<param name="label">Last 30 days</param>
<param name="value">-30d@d,@d,1d</param>
</list>
</param>
<module name="ValueSetter">
<param name="name">multiValueTimeRange</param>
<param name="delim">,</param>
<param name="value">$customRange$</param>
<!-- End Time Range Dropdown -->
<!-- Start Web Site Dropdown -->
<module name="Search" layoutPanel="panel_row1_col1" autoRun="True">
<param name="search"><![CDATA[index="websites" | fields WebSiteKey Description | sort Description | `decode_entity("Description")` ]]>
</param>
<module name="Pulldown">
<param name="name">selectedWebsites</param>
<param name="label">Web site</param>
<param name="size">3</param>
<param name="template">WebSiteKey="$value$"</param>
<param name="separator">+OR+</param>
<param name="outerTemplate">( $value$ )</param>
<param name="staticFieldsToDisplay">
<list>
<param name="label">All Web Sites</param>
<param name="value">*</param>
</list>
</param>
<param name="searchFieldsToDisplay">
<list>
<param name="label">Description</param>
<param name="value">WebSiteKey</param>
</list>
</param>
<!-- End Web Site Dropdown -->
<!-- Start Mailing List Dropdown-->
<module name="Search" autoRun="True">
<param name="search"><![CDATA[index="mailinglists" $selectedWebsites$ | fields List_name ListId | sort List_name ]]>
</param>
<module name="Pulldown">
<param name="name">selectedMailingLists</param>
<param name="label">Mailing List</param>
<param name="size">4</param>
<param name="template">ListId="$value$"</param>
<param name="separator">+OR+</param>
<param name="outerTemplate">( $value$ )</param>
<param name="staticFieldsToDisplay">
<list>
<param name="label">All Mailing Lists</param>
<param name="value">*</param>
</list>
</param>
<param name="searchFieldsToDisplay">
<list>
<param name="label">List_name</param>
<param name="value">ListId</param>
</list>
</param>
<!-- End Mailing List Dropdown -->
<!-- Start Results Panel -->
<module name="Search">
<param name="search">
<![CDATA[index="usage" PageViewed="*?ET=*" $selectedWebsites$ |
fields PageViewed, ReaderUserKey, mlmid |
stats dc(ReaderUserKey) as "Clicks" by mlmid |
join mlmid [ search index="mailings" $selectedWebsites$ $selectedMailingLists$ |
fields _time, MailingID, OpenedCount, DeliveredCount, MailingSubject, ListId, BouncesCount |
eval mlmid=MailingID |
rename _time as eDate ] |
join type=outer ListId [ search index="mailinglists" earliest=0 latest=now $selectedWebsites$ $selectedMailingLists$ |
fields ListId, List_name ] |
eval Date=strftime(eDate, "%Y-%m-%d %I:%M %p") |
eval OpenedCount=round((DeliveredCount*0.125), 0) |
eval Delivered=(DeliveredCount - BouncesCount) |
eval OpenedCount=(OpenedCount + Clicks) |
table eDate, Date, MailingSubject, List_name, DeliveredCount, BouncesCount, Delivered, OpenedCount, Clicks |
sort -eDate | fields - eDate]]>
</param>
<param name="earliest">$multiValueTimeRange[0]$</param>
<param name="latest">$multiValueTimeRange[1]$</param>
<module name="Paginator" layoutPanel="panel_row1_col1">
<param name="entityName">results</param>
<param name="count">50</param>
<module name="EnablePreview">
<param name="display">False</param>
<param name="enable">True</param>
<module name="SimpleResultsTable">
<param name="allowTransformedFieldSelect">True</param>
<param name="drilldown">none</param>
<param name="entityName">results</param>
<param name="count">50</param>
<module name="Gimp"/>
<module name="ConvertToDrilldownSearch">
<module name="ViewRedirector">
<param name="viewTarget">flashtimeline</param>
</module>
</module>
</module>
<module name="ViewRedirectorLink">
<param name="viewTarget">flashtimeline</param>
</module>
</module>
</module>
</module>
</module>
</module>
</module>
</module>
</module>
</module>
</module>
As a side comment - you should absolutely take out every one of these autoRun="True" attributes except the one at the very top on the URLLoader. All these extra ones will be causing a lot of page slowness and probably causing some maddening bugs that you may or may not have run into yet.
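A rough sketch of what that looks like, trimmed from the XML in the question. Most params are left out for brevity, so treat it as an outline of where autoRun goes rather than a complete view:

```xml
<!-- Only the top-level URLLoader carries autoRun="True". -->
<module name="URLLoader" layoutPanel="viewHeader" autoRun="True">
  <!-- No autoRun here: the push from URLLoader already reaches this module. -->
  <module name="Pulldown" layoutPanel="panel_row1_col1">
    <param name="name">customRange</param>
    <param name="label">Time Range: </param>
    <!-- No autoRun on the nested Search either. -->
    <module name="Search">
      <param name="search"><![CDATA[index="websites" | fields WebSiteKey Description | sort Description]]></param>
      <module name="Pulldown">
        <param name="name">selectedWebsites</param>
        <param name="label">Web site</param>
      </module>
    </module>
  </module>
</module>
```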
I think the Gate solution is a better direction, as posted in your other question here - http://answers.splunk.com//answers/110436/search-optimization-and-caching-for-forms. Trying to use postProcess for this is going to require some stuff that'll feel pretty artificial in this case. However I can spell it out a bit.
Also, you certainly can use Sideview modules with scheduled saved searches. You just use the Splunk HiddenSavedSearch module or the Sideview SavedSearch module in place of the Sideview Search module. Aside from that it should all work as expected - shoot me an email if you had run into trouble here and we can figure out where things went sideways.
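As a minimal sketch of that swap, assuming the Sideview SavedSearch module takes the saved search's name in a "name" param (worth checking against the module docs in your Sideview Utils version). The saved search name below is made up for illustration; the Paginator and SimpleResultsTable params are copied from the question:

```xml
<module name="URLLoader" layoutPanel="viewHeader" autoRun="True">
  <!-- "Mailing report (scheduled)" is a hypothetical saved search name. -->
  <module name="SavedSearch">
    <param name="name">Mailing report (scheduled)</param>
    <module name="Paginator" layoutPanel="panel_row1_col1">
      <param name="entityName">results</param>
      <param name="count">50</param>
      <module name="SimpleResultsTable">
        <param name="entityName">results</param>
        <param name="count">50</param>
      </module>
    </module>
  </module>
</module>
```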
Summary indexing is another area worth checking out, and frankly it is more applicable to your situation than postProcess. See http://docs.splunk.com/Documentation/Splunk/6.0/Knowledge/Usesummaryindexing for the docs. This would require substantial work but would put you in a place where you have both more power and more flexibility. Form elements would render fast, even the first time, and you could suddenly do lots of things that had been impractical before.
That said, here is how you would do something like this with postProcess: you would merge all of your saved searches and ad-hoc searches into one giant master search, commonly called a "datacube". Then each Pulldown, Chart, or Table would use a postProcess search to carve out of this high-dimensional cube the data it needs to render.
The considerable, and I think deal-killing, problem here is that not every set of searches you might mash together works as a single datacube search, and when they don't fit, the net performance can be quite a lot worse than dispatching N separate searches.
Typically if making the datacube feels artificial, like you're just arbitrarily gluing unrelated things together, then it's a bad idea.
An example off the top of my head that would lend itself well to a postprocess approach would be:
Pulldown that needs to render distinct users, with an "all users" option
Pulldown that needs to render distinct hosts, with an "all hosts" option
Pulldown that needs to render distinct applications
chart that renders the hours of the day across the x axis and distinct users on the y axis, split by application
chart that renders total bytes downloaded by user
chart that renders the most commonly used applications
base search would be:
foo bar baz | stats sum(bytes) as bytes count by user host application date_hour
and the three postprocess searches to drive the Pulldowns:
| dedup user | sort user
| dedup host | sort host
| dedup application | sort application
and the three separate postprocess searches to drive the charts would be:
| chart dc(user) over date_hour by application
| chart sum(bytes) as bytes by user
| stats sum(count) as count by application
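Wired into a view, one slice of that might look roughly like the sketch below: one base search, one postProcess feeding a Pulldown, and one postProcess feeding a table. Some assumptions to flag: it uses the Sideview PostProcess module with a "search" param and no leading pipe, the Pulldown params mirror the ones in the question rather than anything newer, the base search aliases sum(bytes) as bytes so the postProcess can sum it again, and a plain SimpleResultsTable stands in for the chart modules.

```xml
<module name="URLLoader" layoutPanel="viewHeader" autoRun="True">
  <!-- Base "datacube" search: dispatched once for the whole page. -->
  <module name="Search" layoutPanel="panel_row1_col1">
    <param name="search">foo bar baz | stats sum(bytes) as bytes count by user host application date_hour</param>

    <!-- Carve the distinct users out of the cube for the Pulldown. -->
    <module name="PostProcess">
      <param name="search">dedup user | sort user</param>
      <module name="Pulldown">
        <param name="name">user</param>
        <param name="label">User</param>
        <param name="staticFieldsToDisplay">
          <list>
            <param name="label">All users</param>
            <param name="value">*</param>
          </list>
        </param>
        <param name="searchFieldsToDisplay">
          <list>
            <param name="label">user</param>
            <param name="value">user</param>
          </list>
        </param>

        <!-- Filter the same cube by the selected user; no new search is dispatched. -->
        <module name="PostProcess">
          <param name="search">search user="$user$" | chart sum(bytes) as bytes by user</param>
          <module name="SimpleResultsTable">
            <param name="entityName">results</param>
            <param name="count">50</param>
          </module>
        </module>
      </module>
    </module>
  </module>
</module>
```

The host and application Pulldowns and the other two charts would follow the same pattern, each with its own PostProcess against the one base search.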
But I'm probably talking too much. The postProcess docs in Sideview Utils have been rewritten many times and they'll give you a stronger understanding.
You got it. Of course, this is a little weird. If we call the Gate module with id="Gate2" gate2 and the Gate module with id="Gate3" gate3, then it's probably simpler to put the modules that sit downstream from gate2 and gate3 directly downstream from gate1 instead, so that gate1 has three direct downstream modules, each of which goes on to have further downstream modules.
Maybe you're thinking Gate modules can only have one downstream branch but they can have anything downstream.
So like this?
<module name="Gate">
<param name="to">Gate1</param>
</module>
<module name="Gate">
<param name="to">Gate2</param>
</module>
<module name="Gate">
<param name="to">Gate3</param>
</module>
Which would effectively be children of the Pulldowns and the Search modules (which appear above the Gate modules with the "id" params)?
Most of my dashboards are configured very similarly to the one that I sent you the full XML for.
You can't, no. However, you can have multiple Gate modules that are all siblings of each other (i.e., next to each other), each with a different "to" param, and that amounts to the same thing. I'll add an item to my queue to allow comma-separated values in a future release.
@sideview - can I have multiple Gate ids in the "to" param? e.g.:
<module name="Gate">
<param name="to">Gate1,Gate2,Gate3</param>
</module>
This would be for a dashboard with multiple panels (populated by different searches) where the Pulldown modules apply to all the panels.
Here was the XML:
URLLoader autoRun=True
SavedSearch (no autoRun)
Gate + "to" param
Pulldown time picker (no autoRun)
Search/Pulldown for web site (no autoRun)
Search/Pulldown for list (no autoRun)
Button (with allowAutoSubmit=False)
Search
Gate + "to" param
Gate + "id" param
Pager
Table
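Spelled out as XML, that hierarchy might look something like the sketch below. The nesting is my best reading of the list above, not the actual view that was emailed. The Gate id "resultsGate", the saved search name, and the "name" param on SavedSearch are assumptions for illustration; the web site and mailing list Search/Pulldown pairs are left out (in the real view they sit between the time Pulldown and the Button), and Pager and Table are shown without params.

```xml
<module name="URLLoader" layoutPanel="viewHeader" autoRun="True">

  <!-- Branch 1: scheduled results, used for the initial page load. -->
  <module name="SavedSearch">
    <param name="name">Mailing report (scheduled)</param>
    <module name="Gate">
      <param name="to">resultsGate</param>
    </module>
  </module>

  <!-- Branch 2: form elements, a sibling of the SavedSearch. -->
  <module name="Pulldown" layoutPanel="panel_row1_col1">
    <param name="name">customRange</param>
    <param name="label">Time Range: </param>
    <!-- allowAutoSubmit=False stops the page-load push at the Button. -->
    <module name="Button">
      <param name="allowAutoSubmit">False</param>
      <module name="Search">
        <param name="search"><![CDATA[index="usage" PageViewed="*?ET=*" | stats dc(ReaderUserKey) as "Clicks" by mlmid]]></param>
        <module name="Gate">
          <param name="to">resultsGate</param>
        </module>
      </module>
    </module>
  </module>
</module>

<!-- Top-level Gate that owns the "id"; both branches above feed into it. -->
<module name="Gate" layoutPanel="panel_row2_col1">
  <param name="id">resultsGate</param>
  <module name="Pager">
    <module name="Table"/>
  </module>
</module>
```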
After all this... I still recommend you look into summary indexing as well. 😃 And although this was a postprocess question again I don't recommend postprocess here.
After emailing the XML I was able to figure out the problem with the Gate module. It has a limitation where if you use the "to" param and "id" param functionality, the Gate module with the "id" param has to be at the top level of the hierarchy. In all my other uses of the module it had been so I had never noticed the bug.
Fortunately the view is easy to rework (by adding another Gate) so that the Gate with the "id" is at the top.
I'll post the summary of the final module hierarchy below:
@martin_mueller: You say, "... there surely must be a Splunk Partner somewhere near you to do that."
Are you referring to having a professional services person come on-site to help?
Well the modules don't care about (or even know about) what layoutPanel they're in. In fact they don't even know what other modules are present in the hierarchy - they only communicate through the context data aka $foo$ tokens.
Appearing for a second and then going poof is what you'd see if you didn't have that <param name="allowAutoSubmit">False</param>
on your Button. Can you email me the XML? nick@sideviewapps.com
Okay, scratch my first conclusion about what was causing the problem; that is almost certainly not the problem.
For the sake of getting it to display just the saved search results first, I've yanked the pulldowns for time, website, and list as well as the button, so all I have is:
URLLoader autoRun=true
SavedSearch (no autoRun)
Gate + "to" param
Gate + "id" param
Paginator, SimpleResultsTable, etc. to display the results (see XML in original question above)
The saved results pop briefly, then I get "No results found" and inspector says it's running "search *".
Nope. It's hard to explain but this is what you need. Note the SavedSearch and the Pulldown are siblings.
URLLoader autoRun=True
SavedSearch (no autoRun)
Gate + "to" param
Pulldown time picker (no autoRun)
Search/Pulldown for web site (no autoRun)
Search/Pulldown for list (no autoRun)
Button (with allowAutoSubmit=False)
Search
Gate + "id" param
And again, you want NO autoRun anywhere except the one autoRun on the URLLoader. Any second one will only do harm.
Buttons are fine, I'd been considering adding one anyway.
It seems I still have something wrong; it doesn't load the saved search at all now. Could you review this list of modules (actual XML excluded for space) and let me know if my order is correct? (I can post the XML if needed.)
URLLoader autoRun=True
Pulldown time picker (no autoRun)
Search/Pulldown for web site autoRun=True
Search/Pulldown for list (no autoRun)
Button
Gate + "to" param (inside Button module)
Gate + "id" param
SavedSearch autoRun=True
Search to be updated by button click
There's one way it could work, if you're OK with having a Button module below your form elements.
1) Put a Button downstream from your form elements.
2) Give it <param name="allowSoftSubmit">True</param>
if you want the charts to reload when the user changes anything (False if you want them to click the button).
3) Give it <param name="allowAutoSubmit">False</param>
which will prevent the autoRun="True" push from going through the Button.
4) Make sure the Gate module carrying the scheduled search is below that Button.
This is a pretty complex scenario now but it may well work.
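For steps 1 to 3, the Button itself would look roughly like this. The two params are the ones named above; the Search under it is only a stand-in, borrowed from the question, for whatever ad-hoc search the form drives:

```xml
<!-- Placed downstream from the last form element (step 1). -->
<module name="Button">
  <!-- Step 2: True reloads on any form change; False waits for a click. -->
  <param name="allowSoftSubmit">True</param>
  <!-- Step 3: keeps the autoRun push at page load from passing through. -->
  <param name="allowAutoSubmit">False</param>
  <module name="Search">
    <param name="search"><![CDATA[index="usage" $selectedWebsites$ | stats dc(ReaderUserKey) as "Clicks" by mlmid]]></param>
  </module>
</module>
```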
Does that mean I can't use Gate to load the initial scheduled search (i.e., the autoRun being turned on farther up the form means it will always re-run the search)?
Oh of course... hm. yea normally you'd use Gate to block that push from going below, but in this thing we just want to block while the page is loading and there's no $foo$ key for that. OK I take the Gate thing back. Bad idea. 😃 Sorry.
I've been trying to add the Gate module on to load the initial scheduled search, but even though I've got a scheduled saved search (and results cached), it persists in running the saved search real-time (and I do have "useHistory" set to "True").
I need autoRun="True" on the Pulldown modules for Website and Mailing List because Mailing List has to refresh if you change the Website Pulldown. Removing autoRun from both of these modules (except the SavedSearch module) results in diddly-squat happening when you load the page.
Well the whole point of the Gate module in the other post was so that the page initially loads using scheduled results, but any change by the user to the pulldowns causes an ad-hoc search to be dispatched. If you tried it and you found this wasn't the case, I suspect there was a small mistake like maybe you left an autoRun=True on the Pulldown branch (indeed if you did that there would be no performance difference at all, in fact a slight negative effect). If on the other hand your goal is to have all the searches complete faster even the ad-hoc ones, then yea Gate isn't much use here.
Here's why Gate doesn't work: it doesn't reduce the amount of processing at all. A search that takes 12 minutes to present results without Gate is STILL going to take 12 minutes to present results WITH Gate. (Yes, that's a real-world example.)
That's why I'm looking at post-processing. The second example in sideview's post, the one that is described as lending itself to post-processing, looks to be similar to what I'm trying to do: pulldown for time, pulldown for website, pulldown for mailing list; render chart with additional processing on data.
I agree with Nick here, you're likely looking for a means to pre-summarize results rather than post processing.
Concerning your need for someone to help you beyond the world of theory, there surely must be a Splunk Partner somewhere near you to do that. Going the pre-summarizing route involves much more than just rewriting one particular dashboard.