<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: slow "command.search.kv" phase in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200352#M39588</link>
    <description>&lt;P&gt;The problem is that this index is 80 times larger than any other on this indexer.&lt;BR /&gt;
Still, the following search takes 10 seconds on this index and 1 second on another:&lt;BR /&gt;
index="foo" | head 10000&lt;/P&gt;</description>
    <pubDate>Thu, 28 Jan 2016 16:50:26 GMT</pubDate>
    <dc:creator>lraynal</dc:creator>
    <dc:date>2016-01-28T16:50:26Z</dc:date>
    <item>
      <title>slow "command.search.kv" phase</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200350#M39586</link>
      <description>&lt;P&gt;Searches are slow on one particular index, which receives Apache access.log files. &lt;BR /&gt;
When I inspect my jobs, I see a very long "command.search.kv" phase. &lt;BR /&gt;
I suspect I made a rookie mistake in the regular expressions.&lt;BR /&gt;
The log format is set by others, so I can't change it. It contains lines like:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1.2.3.4 - - [23/Dec/2015:14:44:33 +0100] "GET http://1.2.3.4/ABCDEF/Pop.do HTTP/1.0" 200 886 16352 "-" "check_http/v1.4.15 (nagios-plugins 1.4.15)"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;or &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1.2.3.4 - - [23/Dec/2015:14:54:08 +0100] "GET /ABCDEF/Pop.do HTTP/1.1" 200 10738 18287 "http://172.18.56.35/ABCDEF/Map.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(Note that the request sometimes contains the complete URL with the hostname, and sometimes only the path.)&lt;/P&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[web_access_log:myapp]
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 50
TIME_PREFIX = \[
TIME_FORMAT =
KV_MODE = none
category = app
EXTRACT-fields = ^(?P&amp;lt;c_ip&amp;gt;[^ ]+)[^\[\n]*\[(?P&amp;lt;date&amp;gt;[^:]+):(?P&amp;lt;time&amp;gt;[^ ]+)[^ \n]* (?P&amp;lt;timezone&amp;gt;\+\d+)\]\s+"(?P&amp;lt;cs_method&amp;gt;\w+)[^ \n]* (?P&amp;lt;cs_uri&amp;gt;[^ ]+)\s+(?P&amp;lt;protocol&amp;gt;[^"]+)[^ \n]* (?P&amp;lt;sc_status&amp;gt;\d+)\s+(?P&amp;lt;sc_bytes&amp;gt;\d+)\s+(?P&amp;lt;final_time_taken&amp;gt;\d+)\s+"(?P&amp;lt;cs_referer&amp;gt;[^"]+)"\s+"(?P&amp;lt;cs_useragent&amp;gt;[^"]+)
EXTRACT-cs_uri_query = ^(?:[^ ]+)[^\[\n]*\[(?:[^:]+):(?:[^ ]+)[^ \n]* (?:\+\d+)\]\s++"(?:\w+)[^ \n]* (?:[^\/]*)(?:\/{2})?(?:[^\/]*)(?P&amp;lt;cs_uri_query&amp;gt;[^ ]+)\s
EXTRACT-cs_hostname = ^(?:[^ ]+)[^\[\n]*\[(?:[^:]+):(?:[^ ]+)[^ \n]* (?:\+\d+)\]\s++"(?:\w+)[^ \n]* (?:[^\/]*)(?:\/{2})?(?:[^\/]*)\/(?P&amp;lt;cs_hostname&amp;gt;\w*)\/
EVAL-cs_hostname = if(isnull(cs_hostname),host,cs_hostname)
EVAL-final_time_taken = final_time_taken/1000000
EVAL-cs_uri_stem = cs_uri_query
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 23 Dec 2015 14:05:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200350#M39586</guid>
      <dc:creator>lraynal</dc:creator>
      <dc:date>2015-12-23T14:05:13Z</dc:date>
    </item>
    <item>
      <title>Re: slow "command.search.kv" phase</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200351#M39587</link>
      <description>&lt;P&gt;Hello Laurent, can you provide the complete props list (CLI: splunk btool props list), as well as the transforms?&lt;BR /&gt;
Can you confirm that searches are faster with another sourcetype on this same indexer? (Check the job run times through REST.)&lt;BR /&gt;
I would also specify TIME_FORMAT.&lt;BR /&gt;
Also, have you read this?&lt;BR /&gt;
&lt;A href="https://wiki.splunk.com/Community:PerformanceTroubleshooting"&gt;https://wiki.splunk.com/Community:PerformanceTroubleshooting&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 06:39:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200351#M39587</guid>
      <dc:creator>sduchene_splunk</dc:creator>
      <dc:date>2016-01-19T06:39:39Z</dc:date>
    </item>
    <item>
      <title>Re: slow "command.search.kv" phase</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200352#M39588</link>
      <description>&lt;P&gt;The problem is that this index is 80 times larger than any other on this indexer.&lt;BR /&gt;
Still, the following search takes 10 seconds on this index and 1 second on another:&lt;BR /&gt;
index="foo" | head 10000&lt;/P&gt;

&lt;P&gt;PerformanceTroubleshooting: I have read it, but found nothing I could detect in the system checks (I'm not a Linux superstar). &lt;BR /&gt;
I do have an anti-pattern in an eventtype (too many NOTs), but I want to be sure there is no other way before resorting to index-time field extraction.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[slm_request]
color =
description =
disabled = 0
priority = 1
search = sourcetype="web_access_log:foo" NOT ( cs_uri_stem="*.css*" OR cs_uri_stem="*.gif*" OR cs_uri_stem="*.ico*" OR cs_uri_stem="*.jpg*" OR cs_uri_stem="*.js" OR cs_uri_stem="*.js?*" OR cs_uri_stem="*.png*")
tags =
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I haven't set up a transforms.conf for this app.&lt;BR /&gt;
props list:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[web_access_log:foo]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE = 
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
EVAL-cs_hostname = if(isnull(cs_hostname),host,cs_hostname)
EVAL-cs_uri_stem = cs_uri_query
EVAL-final_time_taken = final_time_taken/1000000
EXTRACT-cs_hostname = ^(?:[^ ]+)[^\[\n]*\[(?:[^:]+):(?:[^ ]+)[^ \n]* (?:\+\d+)\]\s+(?:\d+)\s+"(?:\w+)[^ \n]* (?:[^\/]*)(?:\/{2})?(?P&amp;lt;cs_hostname&amp;gt;[^\/]*)\/
EXTRACT-cs_uri_query = ^(?:[^ ]+)[^\[\n]*\[(?:[^:]+):(?:[^ ]+)[^ \n]* (?:\+\d+)\]\s+(?:\d+)\s+"(?:\w+)[^ \n]* (?:[^\/]*)(?:\/{2})?(?:[^\/]*)(?P&amp;lt;cs_uri_query&amp;gt;[^ ]+)\s
EXTRACT-fields = ^(?P&amp;lt;c_ip&amp;gt;[^ ]+)[^\[\n]*\[(?P&amp;lt;date&amp;gt;[^:]+):(?P&amp;lt;time&amp;gt;[^ ]+)[^ \n]* (?P&amp;lt;timezone&amp;gt;\+\d+)\]\s+(?P&amp;lt;final_time_taken&amp;gt;\d+)\s+"(?P&amp;lt;cs_method&amp;gt;\w+)[^ \n]* (?P&amp;lt;cs_uri&amp;gt;[^ ]+)\s+(?P&amp;lt;protocol&amp;gt;[^"]+)[^ \n]* (?P&amp;lt;sc_status&amp;gt;\d+)\s+(?P&amp;lt;sc_bytes&amp;gt;\d+)\s+"(?P&amp;lt;cs_referer&amp;gt;[^"]+)"\s+"(?P&amp;lt;cs_useragent&amp;gt;[^"]+)
HEADER_MODE = 
KV_MODE = none
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 31
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 50
MUST_BREAK_AFTER = 
MUST_NOT_BREAK_AFTER = 
MUST_NOT_BREAK_BEFORE = 
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TIME_FORMAT = 
TIME_PREFIX = \[
TRANSFORMS = 
TRUNCATE = 10000
category = app
description = foo
detect_trailing_nulls = false
maxDist = 100
priority = 
sourcetype = 
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 28 Jan 2016 16:50:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200352#M39588</guid>
      <dc:creator>lraynal</dc:creator>
      <dc:date>2016-01-28T16:50:26Z</dc:date>
    </item>
    <item>
      <title>Re: slow "command.search.kv" phase</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200353#M39589</link>
      <description>&lt;P&gt;I set TIME_FORMAT in props.conf, but saw no noticeable difference in performance.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jan 2016 16:50:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200353#M39589</guid>
      <dc:creator>lraynal</dc:creator>
      <dc:date>2016-01-28T16:50:37Z</dc:date>
    </item>
    <item>
      <title>Re: slow "command.search.kv" phase</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200354#M39590</link>
      <description>&lt;P&gt;Why not create an accelerated data model and use it in your search? This should solve your performance issue. Can you give this idea a try?&lt;/P&gt;
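
&lt;P&gt;For illustration only, a search against an accelerated data model could look like the sketch below. The data model name web_access_foo and its root dataset requests are hypothetical, as is the assumption that sc_status and cs_uri_stem are defined as fields of that dataset:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats summariesonly=true count from datamodel=web_access_foo where requests.sc_status=200 by requests.cs_uri_stem
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With summariesonly=true, tstats reads only from the acceleration summaries instead of extracting fields from raw events at search time, so the "command.search.kv" cost should largely disappear for searches written this way.&lt;/P&gt;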

&lt;P&gt;I'm sure you know how to do this. For readers who don't: &lt;BR /&gt;
&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutdatamodels"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutdatamodels&lt;/A&gt; &lt;/P&gt;

&lt;P&gt;And then accelerate it: &lt;BR /&gt;
&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Acceleratedatamodels"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Acceleratedatamodels&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Feb 2016 09:31:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200354#M39590</guid>
      <dc:creator>sduchene_splunk</dc:creator>
      <dc:date>2016-02-01T09:31:31Z</dc:date>
    </item>
    <item>
      <title>Re: slow "command.search.kv" phase</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200355#M39591</link>
      <description>&lt;P&gt;I ended up doing index-time field extraction for the field used in the event type that I query a lot in my reports.&lt;BR /&gt;
The transition is a bit of a pain when you already have a lot of data in your index, because events indexed before the change do not get the new field, but it seemed easier than using a data model in this case.&lt;/P&gt;

&lt;P&gt;transforms.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[web_access_log_asset]
REGEX = (?:GET|POST)\s(?:[^ ]*(?&amp;lt;asset&amp;gt;\.css|\.gif|\.jpg|\.js(?!p)|\.ico|\.png))
FORMAT = asset::$1
WRITE_META = true
&lt;/CODE&gt;&lt;/PRE&gt;
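
&lt;P&gt;Note: a transform like this only takes effect when a props.conf stanza references it. The thread does not show that stanza; a minimal sketch, assuming the sourcetype web_access_log:foo from the earlier props list and an arbitrary class name "asset", might be:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf (assumed; not shown in the original post)
[web_access_log:foo]
TRANSFORMS-asset = web_access_log_asset
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Once fields.conf marks the field as indexed, the eventtype could in principle replace its wildcard list with something like sourcetype="web_access_log:foo" NOT asset=* (an untested suggestion). This only filters correctly for events indexed after the transform was deployed.&lt;/P&gt;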

&lt;P&gt;fields.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[asset]
INDEXED=true
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 12 Feb 2016 10:21:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/slow-quot-command-search-kv-quot-phase/m-p/200355#M39591</guid>
      <dc:creator>lraynal</dc:creator>
      <dc:date>2016-02-12T10:21:49Z</dc:date>
    </item>
  </channel>
</rss>

