So let me make sure I understand this ... You're ingesting some data, and then running a search that is just:
index=akamai
and then comparing the speed in smart versus fast mode? And you're looking at the total system CPU usage and only seeing a small amount of CPU in use on your search head? I think several fundamental concepts need to be reinforced.
First, a search of the form index=xxx is one of the densest searches you can possibly do. You are asking Splunk to bring back ALL of the events in the index for the time range, without any statistics or reporting commands being run. This is guaranteed to saturate indexer CPU core(s) with decompression. And, because you're asking Splunk to return events in a table view, most of the batch mode optimizations cannot take effect (which I will try to cover briefly below).
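To make the "dense" point concrete, here's a rough sketch (the akamai index is from your post; the status field and value are purely hypothetical):

    index=akamai               every event in the time range has to be decompressed and returned
    index=akamai status=404    sparser - the tsidx lexicon can rule events out before decompression

The first search gives Splunk nothing to filter on, so every event is paid for in full.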
Second, what are you going to do with these 9+ million records? Scan through them by hand, using eyeballs? Even if you could scan a page of 100 events per second, Splunk would still be outrunning you.
Third, you should not expect to see heavy CPU utilization for field extraction at your search head, but rather at your indexer(s).
Fourth, I don't see why you would expect INDEXED_EXTRACTIONS to speed up a super dense search that does not have any search terms built into it.
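For what it's worth, indexed extractions only really pay off when you search the indexed field directly with the field::value syntax, because that match can be resolved in the tsidx lexicon before any raw data is decompressed. A hedged sketch, with a sourcetype and field name that are purely examples:

    # props.conf on the forwarder doing the structured ingestion
    [akamai:json]
    INDEXED_EXTRACTIONS = json

    benefits from indexed fields:    index=akamai status::404 | stats count
    does not benefit:                index=akamai

A bare index=akamai never consults those indexed fields, so you would not expect any speedup there.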
So let's try to build some concepts and work from there.
Performance of a search depends on the type of search you're trying to do (dense versus sparse, "reporting" versus not, and your fields of interest). A super dense search without a reporting command in smart mode is going to perform very differently than the same search would if you simply appended a reporting command like ... | stats count .
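Roughly speaking, smart mode behaves like fast mode when the search contains a transforming/reporting command and like verbose mode when it doesn't, so these two land on very different code paths even though they scan the same data (the second is just your search with a reporting command appended):

    index=akamai                   no reporting command - raw events streamed back in time order, field discovery on
    index=akamai | stats count     reporting command - batch-mode eligible, only an aggregated count comes back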
When you write a search and ask Splunk to execute it, the search head dispatches the job in parallel to all of your indexers. Each indexer allocates a minimum of one single-threaded search process for your search.
In batch mode - where possible, up to your search parallelization limit - additional single-threaded search processes will be started. The first key here is that batch mode is not guaranteed - Splunk only uses it when it knows that, for the type of search you're running, the order the events come back in does not matter. For example, a reporting search of | stats count by field1, field2, field3 has no strict ordering requirement, because the stats command can count things without requiring a strict time ordering. But if you leave off the stats command (or another reporting command), then Splunk realizes you're piping a table of raw events to a user who expects them sorted in time order ... which reduces the effectiveness of batch mode.
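If you want to check whether batch mode actually kicked in, and what the parallelization limit is, this is roughly where to look. Treat the setting name and value as a sketch from memory and verify against the limits.conf spec for your version:

    # limits.conf on the indexers
    [search]
    batch_search_max_pipeline = 2    # allow a second search pipeline per batch-mode search (default is 1)

In the Job Inspector, the search job properties include isBatchModeSearch, which tells you whether a given job ran in batch mode.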
Field extractions happen on the indexer. No amount of field extraction will affect CPU usage on the search head dramatically. The search head sends its configuration bundle to the indexers via the bundle replication process, and the indexers use that bundle as the configuration for the search processes they launch on behalf of that search head.
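In other words, the knowledge objects you define on the search head travel to the indexers, and the CPU cost lands there. A hedged example, where the sourcetype and field name are made up:

    # props.conf in an app on the search head
    [akamai:access]
    EXTRACT-client_ip = ^(?<client_ip>\d{1,3}(?:\.\d{1,3}){3})

This stanza rides along in the replicated bundle, and the regex work happens in the search processes on each indexer at search time, not on the search head.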
All searches begin by converting your specified search into a "literal search" (see litsearch in the Job Inspector), along with a remoteSearch (the search to be run at the indexers) and a reportingSearch (if applicable). The search at the indexers generally begins by picking out "in scope" buckets, looking first at your selected indexes and time range. Any buckets that fall outside the selected indexes or do not overlap the search time range are considered out of scope. In-scope buckets are then checked via the tsidx lexicon against the LISPY expression Splunk generates from everything up to the first | character of the remoteSearch. (I'm ignoring some push-left activity in 6.5 that attempts to push terms as far left as possible.) Events that match the LISPY expression are then seeked to in the raw data, decompressed, and passed along for field extraction.
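You can watch most of this in the Job Inspector: the job properties include litsearch and remoteSearch, and the attached search.log normally shows the LISPY that was handed to the indexers. A rough example of what I would expect for a search that actually has a term in it (the 404 is hypothetical):

    search:      index=akamai 404 | stats count
    litsearch:   something like  litsearch index=akamai 404 | addinfo ... | prestats count
    search.log:  search it for "lispy" - you should see something along the lines of  base lispy: [ AND 404 index::akamai ]

The exact formatting varies by version, but the point is that 404 becomes a lexicon term the tsidx scan can use, while a bare index=akamai gives the lexicon nothing to filter on.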
There's more to consider here, and part of your overall perceived problem may be the amount of time required to do field extraction. But I think you need a more realistic search, one that does what your user is actually trying to do. Don't automatically assume that "simpler searches can be used to performance-model more complex ones".