We use Splunk to monitor our LDAP Cluster which receives millions of requests per day. We use Splunk searches and Splunk dashboards to monitor the systems in real time, and to check historic events. The trouble is, these searches are very, very slow.
We use these searches and dashboards multiple times per day, and the slowness is frustrating. Is there any way to accelerate a dashboard that makes use of the transaction command?
LDAP requires the use of transactions, because one request will always span multiple lines, like this:
slapd[9876]: conn=123456 fd=48 ACCEPT from IP=192.168.1.100:38958 (IP=0.0.0.0:636)
slapd[9876]: conn=123456 fd=48 TLS established tls_ssf=128 ssf=128
slapd[9876]: conn=123456 op=0 BIND dn="" method=128
slapd[9876]: conn=123456 op=0 RESULT tag=97 err=0 text=
slapd[9876]: conn=123456 op=1 SRCH base="ou=Group,ou=system,ou=Host,o=ldapsvc,dc=example,dc=org" scope=2 deref=0 filter="(uid=stefanl)"
...
slapd[9876]: conn=123456 op=2 UNBIND
slapd[9876]: conn=123456 fd=48 closed
Therefore, any useful search needs to use a transaction, like this:
host=192.168.5.55/24 process=slapd | transaction conn startswith="ACCEPT from" endswith="closed" maxspan=10m
No, you can only accelerate a search if it uses "streaming commands" and "transforming commands." The transaction command does not fit in either category.
Read more about search acceleration in the Splunk docs. @musskopf has a good suggestion about using summary indexes, but you can do other things as well:
One way to make the dashboards load faster is to schedule the searches to run in the background. Then, when the dashboard loads, it will pick up the most recent cached results. The resulting data will potentially be a little stale, but that may be fine depending on what the users need. (For example, if you schedule the searches to run once per hour, the dashboard data will be at most one hour old.)
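As a sketch of that approach (the stanza name and schedule here are assumptions, not from the original post), a scheduled background search could be defined in savedsearches.conf like this:

```
# savedsearches.conf -- stanza name "LDAP Transaction Summary" is hypothetical
[LDAP Transaction Summary]
search = host=192.168.5.55/24 | transaction conn startswith="ACCEPT from" endswith="closed" maxspan=10m | stats count by LDAP_SRC_IP
enableSched = 1
# run at the top of every hour
cron_schedule = 0 * * * *
dispatch.earliest_time = -4h
dispatch.latest_time = now
```

A dashboard panel can then reference the saved search by name (e.g. `<search ref="LDAP Transaction Summary"/>` in Simple XML); because the search is scheduled, the panel displays the cached results from the last run instead of re-running the slow search on every page load.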
Here is a different idea: you will find that searches using transaction may run dramatically faster if you (1) narrow the time range being searched and (2) filter the events down before they reach the transaction command. As an example of the second point: imagine that the user is only interested in the start and end of the LDAP transactions, not the details in between. Use this search:
host=192.168.5.55/24 (ACCEPT OR closed) | transaction conn startswith="ACCEPT from" endswith="closed" maxspan=10m
Finally, many searches can be run across multi-line events without using transaction at all. For example, if the user is interested only in "what is the duration of the LDAP transactions?", you can run:
host=192.168.5.55/24 | stats range(_time) as duration by conn
This is not quite the same as using transaction, because this search might include LDAP transactions that have not yet completed. But with a little work and finesse, you can work around this problem as well. If you have millions of events, you will find this search amazingly fast compared to transaction.
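One way to work around the incomplete-transaction problem (a sketch, assuming the raw event text is still searchable) is to count, per connection, how many events contain the "closed" marker, and keep only connections that have one:

```
host=192.168.5.55/24
| stats range(_time) as duration,
        sum(eval(if(searchmatch("closed"), 1, 0))) as has_close
        by conn
| where has_close > 0
```

This stays entirely within stats (so it remains fast and parallelizable) while discarding conn values whose closing event has not arrived yet.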
I hope this gives you some food for thought and some ideas to try.
Thanks @Iguinn. One of the reasons I used the transaction command was that it kept the IP address in the data, which I use later on. For example, this tells me the hosts which connect to our LDAP servers the most:
host=192.168.5.55/24 | transaction conn startswith="ACCEPT from" endswith="closed" maxspan=10m | top LDAP_SRC_IP
Can I do that with your second example using stats?
However, we were hoping to adapt this so we can find servers with long-running queries, and servers which perform many activities within a session (all of which contain the same conn=1234 number).
host=192.168.5.55/24
| stats range(_time) as duration count as NumEntries first(LDAP_SRC_IP) as LDAP_SRC_IP by conn
OR
host=192.168.5.55/24
| stats first(LDAP_SRC_IP) as LDAP_SRC_IP by conn
| top LDAP_SRC_IP
are some searches that might be useful too.
Second question: you mention "schedule the searches to run in the background." I would like to see these LDAP statistics for the last 4 hours, but I would also like the graph to update in real time afterwards. Can I do that with a search scheduled to run in the background? I want a single graph that provides an instant snapshot of the recent and current health of the LDAP systems: the last 4 hours of activity, with the graphs updating in real time.
If you use scheduled searches, you will not be able to get real-time updates. What you want is great, but if performance is really an issue, I would have separate panels in the dashboard: one set of historical panels based on scheduled searches, and one set of real-time panels.
I would suggest going with summary indexes, where you can customize the output and behavior completely. I have never had a good experience using accelerated searches for anything more complicated than search | stats count by group.
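A minimal summary-index sketch (the index name ldap_summary is hypothetical and must be created before use): schedule a search that condenses each connection into a single summary event, then point the dashboards at the small summary index instead of the raw data:

```
host=192.168.5.55/24
| stats range(_time) as duration, count as NumEntries,
        first(LDAP_SRC_IP) as LDAP_SRC_IP by conn
| collect index=ldap_summary
```

The dashboards then search index=ldap_summary, which holds one event per conn rather than millions of raw slapd log lines, so panels such as timechart avg(duration) or top LDAP_SRC_IP return almost instantly.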