I am working with access_combined_wcookie data (essentially Nginx log files) in Splunk. An example of a record is below:
5/25/14 2:44:08.000 AM xxx.xxx.xxx.xxx - - [25/May/2014:02:44:08 -0500] "GET /somepath/ HTTP/1.1" 200 9696 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +bingbot.htm)" "-"
I'm specifically interested in running queries on the GET URI, so that I can issue the following, for instance:
/somepath/ | chart count by date_mday
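To pin down the result I expect from that search, here is a rough Python sketch of the same count computed directly from raw log lines. It is not how Splunk evaluates the search (Splunk does a keyword match against its index, while this compares the parsed URI exactly). The regex, the field labels, and the `count_by_mday` helper are my own illustrative names, written against the sample record above.

```python
import re
from collections import Counter

# Matches the timestamp and request portion of the sample record above.
# Group names (day, method, uri) are illustrative labels, not Splunk field names.
LOG_RE = re.compile(
    r'\[(?P<day>\d{2})/(?P<mon>\w{3})/(?P<year>\d{4}):[^\]]+\]\s+'
    r'"(?P<method>[A-Z]+)\s+(?P<uri>\S+)\s+[^"]*"'
)

def count_by_mday(lines, uri):
    """Count GET requests for an exact URI, grouped by day of month."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group("method") == "GET" and m.group("uri") == uri:
            counts[int(m.group("day"))] += 1
    return counts

sample = (
    'xxx.xxx.xxx.xxx - - [25/May/2014:02:44:08 -0500] '
    '"GET /somepath/ HTTP/1.1" 200 9696 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +bingbot.htm)" "-"'
)

print(count_by_mday([sample], "/somepath/"))
```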
I have now loaded over 60M records into Splunk and keep adding more each day.
I see that sometimes Splunk automatically indexes the "/somepath/" value, and when I enter the query I get an immediate answer. But when I enter the following query:
/some-path/ | chart count by date_mday
(essentially the same as above, but with a hyphen in the path), I have to wait a while for Splunk to generate the results. The difference seems to be volume-related: on smaller result sets I get the answer immediately, almost as if it were cached, whereas larger result sets take a disproportionately long time.
Is there any way for me to control that behavior? Are the smaller result sets cached in memory, so that adding more RAM to the machine (a VM in this case) would give faster results?