Re: slow "command.search.kv" phase

lraynal · ‎12-23-2015

I have slow searches on one particular index, which is receiving apache access.log files.
When I inspect my jobs, I see a very long "command.search.kv" phase.
I guess I made a rookie mistake on the regular expressions.
The log format is set by others, I can't change it. It contains stuff like :

1.2.3.4 - - [23/Dec/2015:14:44:33 +0100] "GET http://1.2.3.4/ABCDEF/Pop.do HTTP/1.0" 200 886 16352 "-" "check_http/v1.4.15 (nagios-plugins 1.4.15)"

or

1.2.3.4 - - [23/Dec/2015:14:54:08 +0100] "GET /ABCDEF/Pop.do HTTP/1.1" 200 10738 18287 "http://172.18.56.35/ABCDEF/Map.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"

(notice sometimes complete hostname, sometimes not)

props.conf

[web_access_log:myapp]
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 50
TIME_PREFIX = \[
TIME_FORMAT =
KV_MODE = none
category = app
EXTRACT-fields = ^(?P<c_ip>[^ ]+)[^\[\n]*\[(?P<date>[^:]+):(?P<time>[^ ]+)[^ \n]* (?P<timezone>\+\d+)\]\s+"(?P<cs_method>\w+)[^ \n]* (?P<cs_uri>[^ ]+)\s+(?P<protocol>[^"]+)[^ \n]* (?P<sc_status>\d+)\s+(?P<sc_bytes>\d+)\s+(?P<final_time_taken>\d+)\s+"(?P<cs_referer>[^"]+)"\s+"(?P<cs_useragent>[^"]+)
EXTRACT-cs_uri_query = ^(?:[^ ]+)[^\[\n]*\[(?:[^:]+):(?:[^ ]+)[^ \n]* (?:\+\d+)\]\s++"(?:\w+)[^ \n]* (?:[^\/]*)(?:\/{2})?(?:[^\/]*)(?P<cs_uri_query>[^ ]+)\s
EXTRACT-cs_hostname = ^(?:[^ ]+)[^\[\n]*\[(?:[^:]+):(?:[^ ]+)[^ \n]* (?:\+\d+)\]\s++"(?:\w+)[^ \n]* (?:[^\/]*)(?:\/{2})?(?:[^\/]*)\/(?P<cs_hostname>\w*)\/
EVAL-cs_hostname = if(isnull(cs_hostname),host,cs_hostname)
EVAL-final_time_taken = final_time_taken/1000000
EVAL-cs_uri_stem = cs_uri_query

lraynal · ‎02-12-2016

I ended up doing index-time field extraction, for the field used in the event type I was querying a lot in my reports.
The transition is a bit of a pain to handle when you have a lot of data in your index already, but it seems easier than using a datamodel in this case..

transforms.conf:

[web_access_log_asset]
REGEX = (?:GET|POST)\s(?:[^ ]*(?<asset>\.css|\.gif|\.jpg|\.js[^p]|\.ico|\.css|\.png))
FORMAT = asset::$1
WRITE_META = true

fields.conf:

[asset]
INDEXED=true

sduchene_splunk · ‎02-01-2016

Why not creating an accelerated datamodel, and use this datamodel in your search ? this would solve your performance issue. Can you gives this idea a try ?

I'm sure you know how to do this. for readers who don't :
http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutdatamodels

and then accelerate it :
http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Acceleratedatamodels

lraynal · ‎01-28-2016

Problem is this index is 80 times larger than any other on this indexer ..
Anyway, a search will take 10 seconds on this index and 1 second on another
index="foo" | head 10000

PerformanceTroubleshooting: I saw it; nothing I can detect on systems check (not a linux superstar..)
I have an anti-pattern on an eventtype (too many NOTs), but I want to be sure there is no other way before going to index-time field extraction

[slm_request]
color =
description =
disabled = 0
priority = 1
search = sourcetype="web_access_log:foo" NOT ( cs_uri_stem="*.css*" OR cs_uri_stem="*.gif*" OR cs_uri_stem="*.ico*" OR cs_uri_stem="*.jpg*" OR cs_uri_stem="*.js" OR cs_uri_stem="*.js?*" OR cs_uri_stem="*.png*")
tags =

I didn't set a transforms.conf on this app
props list :

[web_access_log:foo]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE = 
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
EVAL-cs_hostname = if(isnull(cs_hostname),host,cs_hostname)
EVAL-cs_uri_stem = cs_uri_query
EVAL-final_time_taken = final_time_taken/1000000
EXTRACT-cs_hostname = ^(?:[^ ]+)[^\[\n]*\[(?:[^:]+):(?:[^ ]+)[^ \n]* (?:\+\d+)\]\s+(?:\d+)\s+"(?:\w+)[^ \n]* (?:[^\/]*)(?:\/{2})?(?P<cs_hostname>[^\/]*)\/
EXTRACT-cs_uri_query = ^(?:[^ ]+)[^\[\n]*\[(?:[^:]+):(?:[^ ]+)[^ \n]* (?:\+\d+)\]\s+(?:\d+)\s+"(?:\w+)[^ \n]* (?:[^\/]*)(?:\/{2})?(?:[^\/]*)(?P<cs_uri_query>[^ ]+)\s
EXTRACT-fields = ^(?P<c_ip>[^ ]+)[^\[\n]*\[(?P<date>[^:]+):(?P<time>[^ ]+)[^ \n]* (?P<timezone>\+\d+)\]\s+(?P<final_time_taken>\d+)\s+"(?P<cs_method>\w+)[^ \n]* (?P<cs_uri>[^ ]+)\s+(?P<protocol>[^"]+)[^ \n]* (?P<sc_status>\d+)\s+(?P<sc_bytes>\d+)\s+"(?P<cs_referer>[^"]+)"\s+"(?P<cs_useragent>[^"]+)
HEADER_MODE = 
KV_MODE = none
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 31
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 50
MUST_BREAK_AFTER = 
MUST_NOT_BREAK_AFTER = 
MUST_NOT_BREAK_BEFORE = 
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TIME_FORMAT = 
TIME_PREFIX = \[
TRANSFORMS = 
TRUNCATE = 10000
category = app
description = foo
detect_trailing_nulls = false
maxDist = 100
priority = 
sourcetype =

lraynal · ‎01-28-2016

I set the TIME_FORMAT in props.conf, no noticeable difference in performance.

sduchene_splunk · ‎01-18-2016

Hello Laurent can you provide a complete props list? (Cli : SPLUNK btool props list) as well as transforms?
Can you confirm its faster with other source type on this same indexer? (See jobs running time through rest)
I would specify time format.
Also have you read this ?
https://wiki.splunk.com/Community:PerformanceTroubleshooting

slow "command.search.kv" phase

Enterprise Security Content Update (ESCU) | New Releases

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

Index This | What are the 12 Days of Splunk-mas?