I have this simple search:
search index="summary" revenue=daily | timechart avg(daily) by sitename
...which, as you can probably tell, searches a summary index of daily revenue data. In the job inspector, it's telling me "Specified field(s) missing from results: 'daily'" even though there are over 44,000 events containing that field, like this one (some fields omitted):
06/25/2012 15:00:00, search_name="Daily Revenue by Site", search_now=1340663400.000, info_min_time=1340661600.000, info_max_time=1340663400.000, info_search_time=1340663434.230, daily="56.03", sitename="foo.com", revenue="daily"
What's going on? Do I need to do something special for summary indexes? Aren't they designed to work transparently like any other kind of index?
EDIT:
The search that populates the summary index looks like this:
index="billing" sourcetype="billing_log"
| bucket _time span=1h
| sistats sum(amt) by sitename
| rename psrsvd_sm_amt as daily
I also thought maybe it had to do with the rename, so in the search consuming the summary index, I did some dummy searches on some of the auto-generated fields like psrsvd_ss_amt. Still no results.
THE GIST:
What I'm really hoping to do is identify the sites whose revenue for the last 24 hours differs by more than 10% (above or below) from the historical average FOR THAT SITE. For this I need the historical average for each site, and I need it kept up to date. So the simple search at the top is really an intermediate step, but every approach I can think of involves a summary index of daily revenue and/or its average. And if I can't even search the fields in the summary index, I'm stuck.
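The comparison described above could be sketched as a single reporting search. This is an untested sketch under several assumptions: the summary index is populated hourly by `index="billing" sourcetype="billing_log" | sistats sum(amt) by sitename` saved as "Daily Revenue by Site" (no `rename` after the `sistats`); "historical average" means the average daily revenue over the 30 days before the last 24 hours; and "10% <>" means a deviation in either direction. The 30-day window and the threshold are illustrative choices, not requirements.

```spl
index=summary search_name="Daily Revenue by Site" earliest=-31d@h latest=-24h@h
| stats sum(amt) by sitename
| rename "sum(amt)" as hist_total
| join type=left sitename
    [ search index=summary search_name="Daily Revenue by Site" earliest=-24h@h latest=@h
      | stats sum(amt) by sitename
      | rename "sum(amt)" as last24h ]
| fillnull value=0 last24h
| eval hist_avg = hist_total / 30
| eval pct_change = round(100 * (last24h - hist_avg) / hist_avg, 1)
| where abs(pct_change) > 10
```

The history window drives the main search so that a site with no revenue at all in the last 24 hours still appears (with `last24h` filled to 0, i.e. a 100% drop), while a site with no history is left out because it has no average to compare against. The hard-coded 30 has to match the span between `-31d@h` and `-24h@h`.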
 
Regarding your comment and the blog post:
http://www.davidveuve.com/tech/how-i-do-summary-indexing-in-splunk/
Well, there are two ways of approaching summary indexing. One is the way in the blog post, where you take on the responsibility for deciding how things are stored in and retrieved from the summary index. Once upon a time, that was the only way it worked in Splunk, and lots of people got it wrong: they sometimes got results that were statistically invalid but looked okay.
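A rough, untested sketch of that first approach applied to the billing data: plain `stats` instead of `sistats`, so the populating search writes an ordinary `daily` field (named with `as`, so no `rename` is needed) rather than `psrsvd_*` prestats fields. It assumes an hourly schedule, summary indexing enabled on the saved search, and the saved-search name "Daily Revenue by Site".

Populating search:

```spl
index="billing" sourcetype="billing_log"
| bucket _time span=1h
| stats sum(amt) as daily by _time sitename
```

Reporting search:

```spl
index=summary search_name="Daily Revenue by Site"
| timechart span=1d sum(daily) by sitename
```

The catch is the one described above: anything you compute from `daily` has to be statistically sound on your own. Summing hourly sums into daily totals is fine; averaging stored averages would not be.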
So Splunk currently teaches a different approach in training classes, which sadly is not documented in any of the manuals. This approach requires:
A populating search that follows these rules:
a. Runs on a regular interval and searches over the same interval (for example, runs once per hour, searching over the previous hour - may include a time lag)
b. As the last command in the pipeline, uses one of the si- commands (sistats, sichart, sitimechart, sitop, etc.)
c. Don't put any other command after the si- command!
d. Make sure the search is writing to a summary index
Example:
sourcetype=linux_secure | sistats count by src_ip user
A reporting search that follows these rules:
a. Starts with index=summary search_name="populating search name" | xyz
     where xyz is the command corresponding to the si- command (so: stats, chart, timechart, top, etc.) AND has the same arguments
b. Can be followed by anything else you want
Example (which can be run over any time period):
index=summary search_name="mypopSearch" | stats count by src_ip user
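Applied to the billing search from the question, the two halves might look like the sketch below. It assumes the populating search is scheduled hourly over the previous hour (for example earliest=-1h@h latest=@h) with summary indexing enabled, and keeps the saved-search name "Daily Revenue by Site". The `rename` is dropped from the populating search because nothing may follow the si- command, and `bucket` is dropped because each run already covers a single hour.

Populating search:

```spl
index="billing" sourcetype="billing_log"
| sistats sum(amt) by sitename
```

Reporting search (same `stats` arguments; the `rename` is allowed here because rule b permits anything after the reporting command):

```spl
index=summary search_name="Daily Revenue by Site"
| stats sum(amt) by sitename
| rename "sum(amt)" as daily
```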
As you have noted, this isn't the only way to do things. Sometimes it isn't the best way, but it does work and gives statistically valid results.
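To make "statistically valid" concrete, here is a worked example with made-up numbers of the kind of error hand-rolled summaries invite: averaging stored averages. Suppose one hour has a single sale of 100 and the next hour has nine sales of 10 each.

$$
\frac{100 + 10}{2} = 55
\qquad \text{versus} \qquad
\frac{100 + 9 \times 10}{1 + 9} = \frac{190}{10} = 19
$$

The first figure averages the two hourly averages; the second is the true average sale. Keeping sums and counts separately avoids this, and that appears to be what the `psrsvd_*` fields written by the si- commands are for (the question's `psrsvd_sm_amt` looks like a stored sum), which the matching reporting command then recombines.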
The Configure Summary Indexing documentation gives a lot of detail about summary indexing, but lacks a straightforward how-to for either approach. 😞
Thanks for your answer!
I suppose where I went astray in my approach was the rename after the sistats. Why shouldn't I have anything after that command? Is that gotcha documented anywhere?
Aha:
http://www.davidveuve.com/tech/how-i-do-summary-indexing-in-splunk/
si-* commands "will only work well if you’re going to use -exactly- the stats command you’re using to generate your index. If you change things around, you’re going to find yourself trying to understand why on earth you can’t read the contents of your index." Which is exactly what happened.
I guess my first mistake was assuming that something like this would be clearly documented. Somewhere. Like outside a random blog post.
And, FWIW, if I do:
index="summary" revenue=daily | stats count(daily) by sitename
it'll show the count as 0 for all sites.
