Load difference: indexed real-time search versus real-time search

FritzWittwer_ol
Contributor

Does anyone have real-world experience with the difference in load on a search head when a real-time search is executed as an indexed real-time search?
Of course the number of possible parallel searches should increase, but how much load do these searches generate?

lguinn2
Legend

Indexed real-time reads the disk instead of watching the data flow between indexing steps in memory. While a search is running, though, it consumes a CPU core, regardless of whether it is a real-time search, an indexed real-time search, or a historical search. You can set different limits on the number of concurrent searches by type (in limits.conf), but using indexed real-time does not inherently increase the number of possible parallel searches.
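As a rough sketch of how those per-type limits are derived: as far as I recall, the limits.conf spec describes the historical search ceiling as max_searches_per_cpu x number of CPUs + base_max_searches, and the real-time ceiling as max_rt_search_multiplier times the historical one. The function name and the sample values below are my own assumptions for illustration; check the limits.conf spec for your version before relying on them.

```python
def concurrent_search_limits(cpu_count,
                             max_searches_per_cpu=1,
                             base_max_searches=6,
                             max_rt_search_multiplier=1):
    """Return (max_historical, max_real_time) concurrent searches.

    Parameter names mirror limits.conf [search] settings; the default
    values here are assumed samples, not guaranteed product defaults.
    """
    max_hist = max_searches_per_cpu * cpu_count + base_max_searches
    max_rt = max_rt_search_multiplier * max_hist
    return max_hist, max_rt


# Hypothetical 16-core search head with the assumed sample values.
hist, rt = concurrent_search_limits(cpu_count=16)
print(f"historical: {hist}, real-time: {rt}")
```

Note that this arithmetic has no term for indexed versus non-indexed real-time, which is consistent with the point above that switching modes does not by itself raise the ceiling.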

While running, an indexed real-time search performs like a historical (normal, not real-time) search at any point in time, in terms of memory use and CPU workload. The difference is that an indexed real-time search never finishes reading the data, as it is always waiting for more data to arrive on disk. However, because it uses the normal OS I/O mechanisms, an indexed real-time search can take advantage of disk caching, concurrent reads, etc., making it more efficient than a real-time search.

A real-time search monitors memory and tends to slow down the indexing of incoming data, which can lead to other problems.

However, these differences occur on the indexers - where the data is being read and indexed - and not really on the search heads (AFAIK). So you might think, "this isn't going to affect my search head at all" - but it does. All searches essentially compete for time on the indexers, so when a less efficient search runs (like any real-time search), it can affect the throughput of all searches running on the search head.

AFAIK, there are no hard numbers, as the cost/impact will depend on the workload mix (indexing vs. searching) on the indexers as well as the workload (number, size, and types of searches) running on the search head. I think the effects will be more obvious with a higher system load; for example, systems with unused cores might show little difference.

But even though there are no hard numbers, I think it means a lot when Splunk says "All real-time searches in Splunk Enterprise Security use the indexed real-time setting to improve indexing performance" in the Enterprise Security Installation and Upgrade Manual under Deployment Planning. But notice it says "indexing performance" not "search performance." I think your main performance improvement happens by not bogging down the indexers, which in turn will make the searches run faster on the search head.

Hope this helps! You might want to benchmark this a bit in your own environment.

I know this answer is not definitive and I would love to hear more from others...

printul77700
Explorer

Getting closer and closer, as I am trying to understand the difference between real-time search and continuous search on tsidx (using tstats with summariesonly=true, so basically on already indexed and accelerated data).
I find the terms themselves, continuous and real-time, very counterintuitive, but that is another topic.
Back to some examples, as they always help:
Take one rule that looks for the same event and raises an alert if it happens x times in an interval of y minutes. Let's call this a fail event, so I need x fails in y minutes and no success event inside those 5 minutes and between the fails.
Let's say I want to put as little pressure on the system as possible, so I search an accelerated data model. I want to be sure I lose no events when reading, so which should I choose?
1. A real-time search which, let's say, looks back 1 hour and runs every minute, where I can set count by _time span=5min inside the search.
2. A continuous search which looks back five minutes and runs every minute with the count by _time span=5min.
The advantage of 1, since this is not a critical app and I want to leave other searches more space and time, is that if it cannot run, it can have 59 failed or canceled runs and still see the events I described above.
If I apply continuous, I understand the run will never be canceled, so it will compete with other searches until the point where the whole system has delayed searches.
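To make the "59 possible fails" arithmetic concrete, here is a minimal Python sketch. It assumes the only goal is gap-free coverage, meaning every minute of data is seen by at least one successful run; it ignores indexing lag and whether a complete 5-minute bucket falls inside a single run. The function name is hypothetical.

```python
def tolerated_skipped_runs(lookback_minutes, interval_minutes):
    """Consecutive scheduled runs that may be skipped while the next
    successful run still covers all data since the last successful one."""
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    if lookback_minutes < interval_minutes:
        raise ValueError("lookback shorter than interval leaves gaps even with no skips")
    # Each skipped run widens the gap by one interval; the lookback
    # window must still reach back to the last successful run.
    return lookback_minutes // interval_minutes - 1


print(tolerated_skipped_runs(60, 1))  # option 1: 1 hour lookback, run every minute
print(tolerated_skipped_runs(5, 1))   # option 2: 5 minute lookback, run every minute
```

Under these assumptions option 1 tolerates 59 skipped runs and option 2 tolerates 4. This only matters for a schedule that is allowed to skip runs; if, as I understand it, continuous runs are never canceled, the tolerance is not needed and the cost shows up as queued searches instead.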

Very confusing overall, looking forward to an answer.
Thank you
