All,
I am trying to tune search performance on a set of data. I have narrowed the problem down to search-time field extractions, but I don't see any resource limits on the search heads that indicate they are working all that hard on the data set. CPU is low (~2%), and yet I am waiting minutes for fields to extract. Any recommendations on how I might get more performance from this search?
Here are my notes -
My search
index=akamai over 1 hour
In Fast Mode I get slow but workable results:
--9,525,499 events in 39.804 seconds
However, this set of data normally needs the fields. So in Smart Mode:
--9,571,210 events in 243.244 seconds
This means we're talking about 4 minutes per hour of data, which is not acceptable performance for our user. As it's "smart mode", this implies it's a search tier issue. Correct? I went ahead and set up batch mode search parallelization, and here are my new results.
Fast mode 
--9,440,668 events in 17.909 seconds
Smart Mode
--9,453,800 events in 134.911 seconds
While this improvement is great, we're still looking at Smart Mode taking over 2 minutes per hour of data. We continue to need raw data searches to be more performant. Search-time extractions are just taking too long. Is there any way to tell which extraction is taking too long? The data is well-formed, cooked JSON coming from a heavy forwarder, which pulls data down from Akamai. Perhaps I need to convert some fields to index time? There are common ones like "site" which are usually used.
So I went one step further and upped it to 3 pipelines.
Fast Mode
--9,510,314 events in 15.77 seconds
Smart Mode
--9,498,542 events in 130.875 seconds
Overall, we're still exceeding 2 minutes of search time per hour of Akamai data with extractions. I'd like to get that down closer to 1 minute per hour of data. Even at that rate, our user would wait 24 minutes for just a day of data. Some reading suggested I might want to try INDEXED_EXTRACTIONS = json, so I applied that to one of my two heavy forwarders that process the Akamai data and let it bake for an hour. Fast Mode time went up to 17 seconds and Smart Mode time went up to 160 seconds.
Given the clear overall decrease in performance from index-time extractions, I disabled that immediately. I am using the Splunk Add-on for Akamai from Splunkbase for props.conf:
https://splunkbase.splunk.com/app/3030/
Overall I am not seeing much CPU usage on the search head: 1.9% to 3% during the search. So I am not sure how to get the field extraction process on the search head to use all the idle resources.
 
So let me make sure I understand this ... You're ingesting some data, and then running a search that is just:
index=akamai
and then comparing the speed in smart versus fast mode? And you're looking at the total system CPU usage and only seeing a small amount of CPU in use on your search head? I think several fundamental concepts need to be reinforced.
First, a search of the form index=xxx is one of the densest searches you can possibly run. You are asking Splunk to bring back ALL of the events in the index for the time range, without any statistics or reporting commands being run. This is guaranteed to saturate indexer CPU core(s) with decompression. And, because you're asking Splunk to return events in a table view, most of the batch mode optimizations cannot take effect (which I will try to cover briefly).
Second, what are you going to do with these 9+ million records? Scan through them by hand, using eyeballs? Even if you can scan a page of 100 events per second, Splunk is still outrunning you.
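To put rough numbers on that point, here is a back-of-the-envelope sketch. It assumes the 100 events per second reading rate mentioned above and borrows the Smart Mode event count and runtime from the question; the variable names are only illustrative.

```python
# Back-of-the-envelope comparison of human reading time versus search time.
events = 9_571_210          # Smart Mode event count reported in the question
read_rate = 100             # events a person might skim per second (assumed figure)
search_seconds = 243.244    # Smart Mode runtime reported in the question

human_seconds = events / read_rate
human_hours = human_seconds / 3600

print(f"Reading time: about {human_hours:.1f} hours")
print(f"Search time:  about {search_seconds / 60:.1f} minutes")
```

Even at that optimistic reading rate, a person would need more than a day to get through what the search returns in about four minutes.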
Third, you should not expect to see heavy CPU utilization for field extraction at your search head, but rather at your indexer(s).
Fourth, I don't see why you would expect INDEXED_EXTRACTIONS to speed up a super dense search that does not have any search terms built into it.  
So let's try to build some concepts and work from there.
Performance of a search depends on the type of search you're trying to do (dense versus sparse, "reporting" versus not, and your fields of interest). A super dense search without a reporting command in smart mode is going to perform very differently from the same search with a reporting command added, such as ... | stats count.
When you write a search and ask Splunk to execute it, the search head dispatches the job in parallel to all of your indexers. Each indexer allocates a minimum of one single-threaded search process for your search.
In batch mode - where possible, up to your search parallelization limit - additional single-threaded search processes will be started.  The first key here is that batch mode is not guaranteed - Splunk knows that for certain types of searches the order the events get returned in does not matter.  For example, a reporting search of | stats count by field1, field2, field3 has no strict ordering requirement because the stats command can count things without requiring a strict time ordering.  But, if you leave off the stats command (or another reporting command) then Splunk realizes that you're piping a table of raw events to a user, who expects them to be sorted in a certain time order ... reducing the effectiveness of batch mode.
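The ordering point can be illustrated with a small sketch. This is not how Splunk implements stats or batch mode; it only shows, with made-up events and hypothetical helper names, why a count-by-field result is the same no matter which chunk of events is processed first, while a raw event listing would still have to be sorted by time.

```python
from collections import Counter

# Hypothetical events; "site" is a field named in the question, the values are made up.
events = [
    {"_time": 1, "site": "a.example.com"},
    {"_time": 2, "site": "b.example.com"},
    {"_time": 3, "site": "a.example.com"},
    {"_time": 4, "site": "a.example.com"},
]

def count_by_site(chunk):
    """Partial count for one chunk, like one worker's share of the events."""
    return Counter(event["site"] for event in chunk)

def merge(partials):
    """Combine partial counts; addition does not care about arrival order."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

# Split the events into two chunks and merge them in both possible orders.
first, second = events[:2], events[2:]
in_order = merge([count_by_site(first), count_by_site(second)])
reversed_order = merge([count_by_site(second), count_by_site(first)])

print(in_order == reversed_order)
```

Because the merged counts are identical in either order, independent workers can process chunks as they become available, which is the property a reporting command gives batch mode to work with.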
Field extractions happen on the indexer. No amount of field extraction will affect CPU usage on the search head dramatically. The search head sends its configuration bundle to the indexers via the bundle replication process, and the indexers use that bundle as the configuration for search processes launched in support of that search head.
All searches begin by converting your specified search into a "literal search" (see litsearch in the Job Inspector), along with a remoteSearch (the search to be run at the indexers) and a reportingSearch (if applicable). The search at the indexers generally begins with picking out "in scope" buckets by looking first at your selected indexes and selected time range. Any buckets that fall outside of the selected indexes or do not overlap the search time range are considered out of scope. In-scope buckets are then checked via the tsidx lexicon against the LISPY that Splunk generates from everything up to the first | character of the remoteSearch (ignoring some push-left activity in 6.5 that attempts to push terms as far left as possible). Events that match the LISPY expression are located in the raw data, decompressed, and then passed along for field extraction.
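The bucket scoping step described above can be sketched as a simple overlap test. This is a conceptual illustration only, not Splunk's code: the Bucket class, the in_scope function, and the epoch values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    index: str
    earliest: int  # epoch seconds of the oldest event in the bucket
    latest: int    # epoch seconds of the newest event in the bucket

def in_scope(bucket, indexes, search_earliest, search_latest):
    """True when the bucket belongs to a searched index and overlaps the time range."""
    if bucket.index not in indexes:
        return False
    return bucket.earliest <= search_latest and bucket.latest >= search_earliest

# Made-up buckets: two overlap the search window, one is too new, one is in another index.
buckets = [
    Bucket("akamai", 1000, 1999),
    Bucket("akamai", 2000, 2999),
    Bucket("akamai", 5000, 5999),
    Bucket("other", 2000, 2999),
]

selected = [b for b in buckets if in_scope(b, {"akamai"}, 1500, 2500)]
print(selected)
```

Only the buckets that survive this filter need their lexicon checked, which is why a narrower time range or more selective search terms reduce the work before any field extraction happens.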
There's more to consider here, and part of your overall perceived problem may be the amount of time required to do field extraction. But, I think you need to have a more realistic search that does what your user is trying to actually do. Don't automatically assume that "simpler searches can be used to performance model more complex ones".
 
This is an awesome reply.
 
40k EPS for the single search pipeline, 73k EPS for two search pipelines seems fine to me.
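For reference, throughput figures like these can be derived from the Smart Mode numbers in the question by dividing events by elapsed seconds. The sketch below does just that; the run labels are my own shorthand for the three measurements, not terms from the original post.

```python
# (label, events, seconds) for the three Smart Mode runs reported in the question
runs = [
    ("baseline", 9_571_210, 243.244),
    ("parallelized", 9_453_800, 134.911),
    ("3 pipelines", 9_498_542, 130.875),
]

def events_per_second(events, seconds):
    """Average throughput over the whole search."""
    return events / seconds

eps = {label: events_per_second(events, seconds) for label, events, seconds in runs}
for label, rate in eps.items():
    print(f"{label}: {rate:,.0f} events per second")
```

This works out to roughly 39k, 70k, and 73k events per second for the three runs.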
 
As @ssievert said, seeing the Search Job Inspector data and the related info is very important.
But also, we need to see your search. There are many ways to optimize a specific search, in addition to optimizing the overall search process.
 
Double Ditto.
 
What kind of searches are you running? Are they dense searches, creating some sort of statistic over the hour? Or are they needle-in-a-haystack searches looking for very specific key/value combinations?
The searches you list simply retrieve all 9MM events for the timeframe. Surely that is not what your users will do!?
What does your infrastructure look like, how many SH (# of cores?), how many indexers (# of cores?), what is your daily ingest in GB?
Can you post a Job Inspector output from your search?
