I have noticed that when users leave browsers open on the summary dashboard of the search app (http://host.example.com:8000/en-US/app/search/dashboard_live), the server where Splunk is installed eventually runs out of memory!
What is going on here? How can this be prevented?
In Splunk 4.2.4, 4.2.5 and 4.3 there is a memory leak affecting real-time metadata searches such as those that the search app's summary dashboard runs. This problem has been filed as bug SPL-45901 and will be fixed in release 4.3.1. This bug also affects historical metadata searches, but those tend to have a lesser impact as they are ephemeral and run in-process (in the context of the main splunkd process).
If you want to check that this is what's happening to you, you can install the Splunk on Splunk app on the affected instance, turn on the ps_sos.sh scripted input, and use the "Splunk CPU/Memory Usage" view after an occurrence of this problem to correlate high memory usage with Splunk searches.
If you find that any of the following searches are consuming high amounts of memory (typically, several GBs) and growing linearly, then you are hitting SPL-45901:
metadata type=sources | search totalCount>0 | rename totalCount as Count recentTime as "Last Update" | table source Count "Last Update" | fieldformat Count=tostring(Count, "commas") | fieldformat "Last Update"=strftime('Last Update', "%c")
metadata type=hosts | search totalCount>0 | rename totalCount as Count recentTime as "Last Update" | table host Count "Last Update" | fieldformat Count=tostring(Count, "commas") | fieldformat "Last Update"=strftime('Last Update', "%c")
metadata type=sourcetypes | search totalCount>0 | rename totalCount as Count recentTime as "Last Update" | table sourcetype Count "Last Update" | fieldformat Count=tostring(Count, "commas") | fieldformat "Last Update"=strftime('Last Update', "%c")
As the baseline memory consumption is proportional to the cardinality of the metadata field queried, it's typically the first of these three searches (metadata type=sources) that eats up the most memory and causes the biggest problem.
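If you don't have S.o.S installed, a quick shell-side check can also surface the offending search processes. This is only a rough sketch using Linux procps syntax; the memory columns are in KB and the exact command line of Splunk search processes varies by version:
# list splunkd-related processes sorted by resident memory, largest first
ps -eo pid,rss,vsz,args --sort=-rss | grep '[s]plunkd' | head -20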
Upgrade Splunk to version 4.3.1.
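You can confirm which release you are running before and after the upgrade with:
$SPLUNK_HOME/bin/splunk version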
If you are unable to install Splunk 4.3.1, the recommended work-around is to prevent this from happening by modifying a local copy of the search app's nav bar. Copy $SPLUNK_HOME/etc/apps/search/default/data/ui/nav/default.xml to $SPLUNK_HOME/etc/apps/search/local/data/ui/nav/ (create the target directory if it doesn't exist). Then edit $SPLUNK_HOME/etc/apps/search/local/data/ui/nav/default.xml to make flashtimeline the default view (an illustrative sketch of the edited nav entries appears after the options below):
(...)
In addition, you can delete the following line if you want to remove the live summary dashboard link from the nav bar entirely, ensuring that no one triggers the problem by manually opening that view:
(...)
(...)
...or you could modify it so that the link points to the pre-4.2 dashboard, which only ran historical metadata searches:
(...)
(...)
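For illustration, here is a hedged sketch of what the relevant entries in the local default.xml might look like after these edits. The exact view names and surrounding entries in your shipped nav file may differ, and "dashboard" as the name of the pre-4.2 summary view is an assumption; adapt this to what you actually find in the file:
<nav>
  <!-- make the search view the default instead of the live summary dashboard -->
  <view name="flashtimeline" default="true" />
  <!-- either delete the live summary dashboard entry... -->
  <!-- <view name="dashboard_live" /> -->
  <!-- ...or point the summary link at the historical (pre-4.2) dashboard instead -->
  <view name="dashboard" />
  ...
</nav>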
Don't forget to remove $SPLUNK_HOME/etc/apps/search/local/data/ui/nav/default.xml when you install 4.3.1, in order to back out this work-around!
An alternate approach (which can be used in conjunction with the nav-bar change) is to use the Unix ulimit facility to force Splunk searches over a certain size to fail. For example, prior to starting Splunk on a search head:
# limit vsize to approximately 1.5GB
ulimit -v 1572864
(This is the built-in ulimit for bash; Bourne shell users on older Unixes may need alternate syntax.)
Local environments may have legitimate usage patterns that require a higher value, so adjust as needed, especially after checking the memory footprint of your main splunkd process with top, ps, or similar tools.
This will force the runaway searches to exit if they grow larger than 1.5GB. Typically the sources search is by far the largest and will hit this ceiling and exit before the hosts or sourcetypes searches grow terribly large, putting an expected ceiling on this cost of somewhere around 3GB per user leaving a browser open on the summary dashboard.
On a standalone indexer this value will likely have to be set higher, since the limit also applies to splunk-optimize; 2GB is probably enough, but I've not tested that at this time :-(.
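Putting this together, a minimal sketch of how the limit might be applied on a search head, assuming Splunk runs as the splunk user and is started from a shell that will pass this limit on to splunkd and its child search processes:
# run as the splunk user in the shell that will start Splunk
ulimit -v 1572864   # cap per-process virtual memory at ~1.5GB (value is in 1KB blocks)
ulimit -v           # verify: should print 1572864
$SPLUNK_HOME/bin/splunk start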
To make this a permanent change, if your operating system has an /etc/security/limits.conf file and honors that syntax (please check your local documentation to confirm), edit /etc/security/limits.conf and append the line
splunk - as 1572864
to the config file. This will set both the hard and soft limit for the splunk user.
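To confirm that the new limit is picked up for the splunk account (assuming a Linux system where pam_limits applies limits.conf at login), you can check with:
su - splunk -c 'ulimit -v'   # should print 1572864 once limits.conf is in effect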
I see something like this with 5.0.4, build 172409. Same fix? My specifics at:
http://answers.splunk.com/answers/103589/search-summary-page-running-real-time-searches
I agree with jrodman about making flashtimeline the default view. When you are new to Splunk, the summary dashboard is helpful for understanding the available sources, etc. But experienced users quit looking at the dashboard, and the dashboard load time is not worth it, especially on a busy system.
Making flashtimeline the default view is probably the best place to start. The summary page is typically of interest to splunk administrators more than splunk users.
Thanks for the write-up on this, hexx!