Splunk deployment - ad-hoc queries have been slow for the past few months.
1- We upgraded our system from 2 cores to 12 cores on a single server.
2- We upgraded from Splunk 5 to 6 (not a fresh install)
The system now has 16GB of RAM and the disk is 84% full.
I have followed the monitoring advice:
http://wiki.splunk.com/Community:PerformanceTroubleshooting
There is no IO bottleneck. When queries are run, there is only a sporadic spike of activity (iotop).
While the search is running, one process sits at 100% CPU the whole time.
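For reference, this is roughly what I mean by watching IO and CPU (a sketch, assuming a Linux box; iotop needs root, and splunkd is the main Splunk daemon process):
# show only processes currently doing disk IO, refreshed every 2 seconds
iotop -o -d 2
# watch per-thread CPU for one splunkd process
top -H -p $(pgrep -x splunkd | head -1)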
I queried the _internal index for CPU usage:
Splunk > index=_internal source=*metrics.log group=pipeline | timechart sum(cpu_seconds) by name
The index pipeline occasionally spikes at 5.455; all searches are below 1, whatever that means. The link above mentions abnormal usage when that figure is over 30.
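A variant of the same metrics.log search, purely as a sketch, using max() instead of sum() so the occasional spikes stand out against that threshold of 30:
Splunk > index=_internal source=*metrics.log group=pipeline | timechart max(cpu_seconds) by name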
Memory was at a constant 81% usage on the box. After restarting Splunk it dropped to 15%, but performance remained the same.
To test, I created a brand-new index and ingested 745 log4j events.
There is no data model (I later set up a data model and accessed it through Pivot, but it was slow too).
Basic default settings.
Performed a very simple search:
index="my_test_index" | head 1
and ran it from the command line.
It takes 2 minutes 14 seconds to return.
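For reference, the CLI test looks roughly like this (a sketch, assuming the default $SPLUNK_HOME and placeholder credentials):
# run the same search from the CLI and time it
time $SPLUNK_HOME/bin/splunk search 'index=my_test_index | head 1' -auth admin:changeme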
On the same box, other indexes totaling 360M events (a few GB of data) are slow too.
Bottom line: it is constantly slow, every query.
The Job Inspector says it spent 96% (120s) of the time in Dispatch.evaluate.search;
all other categories are below 1 second, most under 0.5 seconds.
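To see how long recent searches actually took, something like this works (a sketch, assuming audit logging to the standard _audit index is enabled):
Splunk > index=_audit action=search info=completed | table _time user total_run_time search | sort - total_run_time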
I just found what the problem was...
The issue was a bloated props.conf: the one at $SPLUNK_HOME/etc/apps/learned/local/ contained 50,000+ sourcetypes.
It was caused by having the sourcetype set to automatic on one of our inputs. For each CSV file that got indexed, a new sourcetype was created, leading to a bloated props.conf. I also got rid of unused entries in transforms.conf.
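A quick way to check whether the learned app has blown up like this (a sketch; every stanza header in props.conf starts with a bracket):
# count sourcetype stanzas accumulated by automatic sourcetyping
grep -c '^\[' $SPLUNK_HOME/etc/apps/learned/local/props.conf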
So it looks like the job-parsing phase reads props.conf/transforms.conf before starting to search.
That explains why the "Parsing job..." message would stay on for 2 minutes before the search itself completed in 1 second.
Hello, we are facing the same issue now; could you advise how to resolve it? Thanks.
The problem is caused by an input whose sourcetype is set to automatic, most likely a file or directory input.
http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Bypassautomaticsourcetypeassignment
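The gist, as I understand it, is to give the input an explicit sourcetype so Splunk stops learning a new one per file. A minimal inputs.conf sketch, with a placeholder monitor path and sourcetype name:
# an explicit sourcetype prevents the learned app from creating one per file
[monitor:///var/log/myapp/*.csv]
sourcetype = myapp_csv
index = main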
However, in 6.2+ I am not seeing a way to set the sourcetype to automatic when setting up new data inputs. The best thing I can suggest is to do two things:
This was successful for me today. Hope this helps - Good Luck!