I am using the [Network Toolkit] app to monitor remote devices. Under Data inputs > Ping, each input is set to 5 pings in 60 seconds, and I show the results in a dashboard visualization. Below is my search:
index="pingstatus" "dest=192.168.0.210" | chart avg(packet_loss)
I added about 20 devices, but after a few minutes the dashboard shows "No results found" instead of search results.
Below are my time range and refresh settings:
<search>
  <query>index="pingstatus" "dest=192.168.0.12" | chart avg(packet_loss)</query>
  <earliest>-60s</earliest>
  <latest>now</latest>
  <sampleRatio>1</sampleRatio>
  <refresh>30s</refresh>
  <refreshType>delay</refreshType>
</search>
This warning message appears on the page:
The instance is approaching the maximum number of historical searches that can be run concurrently.
I also checked $SPLUNK_HOME/var/log/splunk/splunkd.log, and it contains these warning messages:
10-22-2020 14:18:56.436 +0800 WARN DispatchManager - The instance is approaching the maximum number of historical searches that can be run concurrently.
10-22-2020 14:19:31.376 +0800 WARN DispatchSearchMetadata - could not read metadata file: /opt/splunk/var/run/splunk/dispatch/admin__admin__search__search4_1603347571.892/metadata.csv
Splunk version: Splunk Enterprise 8.0.6
Network Toolkit version: 1.4.3
Operating system: Ubuntu 18.04 (64-bit)
Physical memory capacity (MB): 1489
Can anyone help me? Or does anyone have other ideas for monitoring remote devices?
Thanks a lot!
From a dashboard development perspective, I always consider the following (it took me some time to learn to approach the problem in a way that is optimal for each use case):
1. What data points are needed for each panel? Do the panel's required sources have anything in common with the other panels to be developed?
2. What sample query returns the data needed? How many panels are to be developed?
3. How many users does the Splunk environment need to support? How does it need to scale? How many CPU cores, how much memory, how many background searches? With a finite amount of resources, every dashboard panel search always competes with other searches for those resources.
In your case you have 20 different searches. It is like 20 different people each going to a well for a sip of water: if you found out how many people wanted water and then went to the well once with one bucket, the line would not be so long and you would bring back enough for everyone. This is the problem with queued searches. I learned this over time as I built dashboards.
The one thing I haven't confirmed works with this method, but should, is the refresh. The main thing to consider is that there has to be enough compute (CPU cores) dedicated to run all the searches in the time window between refreshes. What is the impact if you refresh less often and/or search over a longer time range? Would you not be better off searching the last 5 or 15 minutes and charting only the devices over some threshold? Are you going to make decisions on data that changes every 30 seconds? What value is gleaned from a single point in time, and how would you make a decision based on that data or the insights gained, versus just creating eye candy?
At the end of the day, you are going to the same index to perform the same calculation on a bunch of devices.
I would start by establishing a base search right after the title row of the page, outside the row/panel structure of the XML.
Example use of a base search; this is my go-to method for dashboards with many panels to reduce search queue wait times:
<!-- your base search: runs once and computes the average loss for every device -->
<search id="pingstatus">
  <query>index="pingstatus" | stats avg(packet_loss) as avgLoss by dest</query>
  <earliest>-60s</earliest>
  <latest>now</latest>
  <sampleRatio>1</sampleRatio>
  <refresh>30s</refresh>
  <refreshType>delay</refreshType>
</search>
<!-- in your device panel: post-process search that filters the base results by the dest field -->
<search base="pingstatus">
  <query>| search dest="192.168.0.12" | table avgLoss</query>
</search>
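To illustrate the longer-window idea raised above, here is a minimal sketch of a single table panel that lists only the devices showing loss, instead of one panel per device. The 15-minute window, the 5-minute refresh, and the avgLoss > 0 threshold are assumptions to adjust for your environment; the index and field names are the ones used in the searches in this thread. It is untested against this data.

```xml
<row>
  <panel>
    <!-- Title text is illustrative only -->
    <title>Devices with packet loss (last 15 minutes)</title>
    <table>
      <search>
        <!-- One search for all devices; keeps only those above the assumed threshold of 0 -->
        <query>index="pingstatus" | stats avg(packet_loss) as avgLoss by dest | where avgLoss &gt; 0 | sort - avgLoss</query>
        <!-- Assumed window and refresh interval; tune to how quickly you need to react -->
        <earliest>-15m</earliest>
        <latest>now</latest>
        <refresh>5m</refresh>
        <refreshType>delay</refreshType>
      </search>
    </table>
  </panel>
</row>
```

This runs one search every 5 minutes rather than one search per device every 30 seconds, which should put far less pressure on the concurrent search limit.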
Thank you for your quick reply to my question.
1. I think each panel only needs the "packet_loss" data point.
The panels share a common source:
index="pingstatus" "dest=192.168.x.x" | chart avg(packet_loss)
Only the destination host is different.
2. A search on index="pingstatus" returns the data.
I need to monitor about 70 hosts.
3. Because I have about 70 hosts to monitor, I don't know which deployment architecture to use. Currently I use a single-instance (all-in-one) deployment with the Network Toolkit add-on app for Splunk Enterprise. I don't know how many users are needed; I am currently just testing the Splunk environment.
It has a single-core CPU and 1.5 GB of memory. Each dashboard panel needs its own background search, and there are already about 70 of them.
I want to monitor in real time whether the equipment and servers are operating normally, and to see clearly which equipment is not responding. What CPU core count and how much memory would you suggest? Or is the interval between refreshes too short, and is that what is consuming the performance?
Or do you have a better way to monitor these devices?
Thank you very much.