Monitoring Splunk

Expected performance?

reedmohn
Communicator

Is there any way to predetermine what performance to expect from Splunk?
We have just installed Splunk, distributed on two indexers, one of them is also a search head.
What search performance can I expect from the simplest of searches?

The servers are HP DL380s with dual 6-core processors, 12GB RAM, 2x146GB + 8x900GB (two RAID0+1 sets) disks. Currently they receive between 1 and 2 million events per hour in total, and at the moment there is only one user.

I am still a noob here, so this might be a stupid question.
Anyway, I did a simple search from our Windows Event logs:

source="WinEventLog:Security" (host="server1" OR host="server2") | top EventCode

Search was for a timespan of one hour, field discovery is off.
Since volume varies with time, I did a few different hours.
A result that includes 750k matching events takes roughly two minutes. 1.4m events takes about four minutes to complete.

I have no idea whether that is fast or slow. Any tips?

(Note: I expect that this search takes a performance hit since EventCode is not an indexed field, I'm just curious as to whether the result is still realistic).

UPDATE: I see I was a little off in the RAID description; there are just two sets: the 146GB drives are one RAID1 set, and the eight 900GB drives are one RAID0+1 set. The small disks are 15K RPM, the larger 10K RPM. All are SAS drives.
Also, since I am searching across all EventCodes of the Security logs, I expect the result to be based on almost 100% of the data in that source. In total, the Security logs probably make up over 90% of all the data currently collected.

1 Solution

reedmohn
Communicator

I just thought I'd close this off by mentioning that we've now done some more searching and testing. While we don't have the exact math to confirm that we are all moving at superspeed, we do see that the disks are performing within spec, and that searches are within what we reasonably expect.
Mind you, we almost only do pure searching (so far); there is little calculation, statistics, etc., so there are going to be a lot of slow searches grabbing a few thousand rows here and there, then narrowing down from there (i.e., for troubleshooting purposes).

Thanks for the comments and inputs.


dwaddle
SplunkTrust
SplunkTrust

Well, the answer to any performance question is often "it depends". But, I would make some comments and ask some additional questions.

What RPM and physical interface are those drives?

How many random IOPS can that I/O subsystem do? (Use bonnie++ as a test)

Where are your hot buckets being written? To the (2x146, or the 8x900)?

If these are 2.5" 15K SAS drives (equivalent to a Seagate Savvio 15K.3), you can expect (according to the calculations) around 206 IOPS per drive - and therefore a little over 800 IOPS for the 8x900 array. This is within recommendations for hot buckets. If they're being written to the 2x146, though, it is highly unlikely that they could provide enough IOPS to have indexing and search both perform acceptably.

I derived a theoretical 206 IOPS per drive based on the formula 1 / ( (avg-read-seek + avg-write-seek)/2 + avg-latency ). For the Savvio this works out to be 1 / ( ( .0026 + .0031 ) / 2 + .002 ) or 206.1 (See http://www.seagate.com/files/www-content/product-content/savvio-fam/savvio-15k/savvio-15k-3/en-us/do... )
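As a quick sanity check, the same calculation can be run as a short script. The seek and latency figures are the Savvio 15K.3 numbers quoted above (in seconds), and the "a little over 800" array figure assumes that a RAID 0+1 set of 8 drives counts only its 4 mirrored pairs as effective spindles:

```python
# Theoretical random IOPS for one drive, per the formula in the post:
#   IOPS = 1 / ((avg_read_seek + avg_write_seek) / 2 + avg_rotational_latency)

def drive_iops(avg_read_seek_s, avg_write_seek_s, avg_latency_s):
    return 1 / ((avg_read_seek_s + avg_write_seek_s) / 2 + avg_latency_s)

single = drive_iops(0.0026, 0.0031, 0.002)  # ~206 IOPS per drive
# 8 drives in RAID 0+1 = 4 mirrored pairs, so ~4 effective spindles:
array = single * 4                          # ~824 IOPS for the 8x900 set
print(f"per drive: {single:.1f} IOPS, 8x900 array: {array:.1f} IOPS")
```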

As far as your search goes, the "| top EventCode" does not play into indexed versus not indexed. The base search source="WinEventLog:Security" (host="server1" OR host="server2") runs and returns all of the events matching those criteria, which are then post-filtered by top.

More RAM can never hurt. Splunk has no internal caching mechanism (like Oracle / SQL Server / DB2 and their large internal page buffers), but it can take advantage of file system caching in the operating system to improve I/O throughput.

If you can't work your way out of this problem differently, then perhaps summary indexing will help you. You could build a summary index of this data and be able to use it to improve search response time for targeted searches. This is useful for highly responsive dashboards, but is much less useful for optimizing ad-hoc searches.

FYI, you are able to edit your original question to add in any additional information.

reedmohn
Communicator

Thanks. I'm waiting for some performance tests. The 8 data drives are all in one set (see update above). From the info we got when purchasing, we can expect to exceed 800 IOPS, but I'll recheck that with our vendor.

gkanapathy
Splunk Employee
Splunk Employee

That seems like pretty good performance, but the number of results per second/per minute depends not just on the total number of results, but also on the density of the results within the index. A good benchmark top speed for your hardware is about 1 to 2 million results per minute per indexer for the "dense" type of searches that fetch back 100% of the results in an index, e.g., if you search for "*" (or if you do "* | top EventCode"; the top shouldn't make a significant difference). If your query above references about 30% or less of your total data, that indicates you are at this level of performance or better.
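Plugging the timings reported in the question into this benchmark (a rough sketch; it assumes the 750k-in-2-minutes and 1.4M-in-4-minutes figures above, and that the work splits evenly across the two indexers):

```python
# Rough throughput check against the ~1-2M results/min/indexer benchmark.
searches = [(750_000, 2), (1_400_000, 4)]  # (matching events, minutes)
indexers = 2
for events, minutes in searches:
    per_indexer = events / minutes / indexers
    print(f"{events:,} events in {minutes} min -> "
          f"{per_indexer:,.0f} results/min/indexer")
# Both land around 175k-190k results/min/indexer, several times below
# the ~1M/min/indexer dense-search benchmark.
```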

however, it should be noted that for the most part, Splunk performance isn't linearly determined by how many results come back, but rather by how much data is in the time range and within the indexes you are searching over. If your query returns more than about 1/1000 to 1/2000 of the data in the range/index then it should be able to cover around 1 to 2 million events/minute. But if you search for more rare items, such that you return less than 1 in 2000 items in the time range (+index), then you can expect it to be much faster: from 1 million to 100 million events per second or better (based on the total number of events in the searched time range). (And if you search for even more rare items, ones that occur less than 1 in 100,000,000 in the time range, you can expect it to cover about 1 billion to 100 billion events/second or better -- but you need a lot of data for that to show.)
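The density thresholds above can be sketched as a small, hypothetical helper; the cut-off values are taken directly from the paragraph, and the throughput comments repeat its ballpark figures rather than measured numbers:

```python
# Classify a search by result density, using the thresholds described above.

def search_density_class(matching, total_in_range):
    density = matching / total_in_range
    if density >= 1 / 2000:
        return "dense"        # throughput bound: ~1-2M results/min/indexer
    elif density >= 1 / 100_000_000:
        return "sparse"       # scans ~1M-100M events/sec or better
    else:
        return "very sparse"  # scans ~1B+ events/sec (needs a lot of data)

print(search_density_class(750_000, 800_000))  # the question's search: dense
```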

Of course, all the above numbers depend on hardware and on system CPU and I/O load.

Turning on field discovery would slow it down quite a bit, but the timeline display also slows it down. If you run the same search either at the command line or using the "Advanced Charting" view in the UI, it should be a bit faster still.

Having an indexed field for EventCode would make no difference in your results, and generally indexed fields do not help performance in Splunk except in specific, unusual circumstances. (The reason is that everything in Splunk is already indexed.)

reedmohn
Communicator

Thanks for the feedback. This data is pretty much all of that index. All Windows Event Log messages have an EventCode, so even though I haven't checked, I am confident that the result should normally include 100% of the data in the index. So if 1-2 million per minute is expected, I'm a little behind...
