Hi, Splunkers
We have a single instance acting as indexer, search head, and Splunk Enterprise Security (32 GB RAM, 16 vCPU). I know this isn't good practice, and we're planning to add more instances and separate the roles as we move ahead and index more data.
From the beginning we focused on particular use cases and created a new index for each sourcetype. Now, with new requirements and searches (including ES) built over tags, we are seeing very long-running searches.
The question: what is the better approach to handling many distinct types of data, especially for Enterprise Security, with regard to how data is written to indexes?
Does it matter to ES how many indexes it has to look in?
Maybe someone has, or has had, a similar situation and can point out bottlenecks or offer some advice?
Thank you in advance.
Theoretically, the more indexes you have, the more files are available for reading, and the more disk bandwidth you can potentially achieve.
The first issue is that Linux has a default ulimit on the number of files that can be open, but you can raise this setting rather easily.
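For reference, a minimal sketch of raising that limit for the account that runs splunkd (the splunk user name and the 64000 value are assumptions; Splunk's docs commonly recommend at least 64000 open files, but size it to your environment):

    # /etc/security/limits.conf -- raise the open-file limit for the splunk user
    splunk soft nofile 64000
    splunk hard nofile 64000

You can check the effective value with ulimit -n in the splunk user's shell; splunkd also logs the limits it sees at startup in splunkd.log.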
The next issue is that you end up writing some really large searches, or doing the tagging, or perhaps search macros. That can get complicated on the development side.
Generally, and as a best practice, I put all data from common devices into the same index, separated by sourcetype and source. So, for example, I would put Juniper and Cisco firewall logs into one index called firewall_logs; each would have its own sourcetype of juniper or cisco, and each would have its own source of juniper1, juniper2, cisco_primary, cisco_core, etc.
This way, when I'm correlating data, the base search is index=firewall_logs instead of something crazy like index=juniperSSG5 OR index=juniperSSG20 OR index=ciscoXXXX
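As an illustration of how tagging can stay simple with that layout (the stanza and tag names here are purely illustrative, not from this thread), both sourcetypes in the shared index can carry the same tag through eventtypes.conf and tags.conf, so a tag-based search still resolves to a single index:

    # eventtypes.conf -- one eventtype per vendor, both constrained to the shared index
    [juniper_firewall]
    search = index=firewall_logs sourcetype=juniper

    [cisco_firewall]
    search = index=firewall_logs sourcetype=cisco

    # tags.conf -- both eventtypes carry the same tag
    [eventtype=juniper_firewall]
    firewall = enabled

    [eventtype=cisco_firewall]
    firewall = enabled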
Actually, we also keep common things together in dedicated indexes.
Yet searches that consist of a single macro such as `authentication` or `communicate` (or tag=communicate) may run up to 4 minutes for a 1-hour period (over 500k events across 20 indexes). I'm afraid this is too long even for the configuration I've described.
It would be great if you could provide a rough comparison from your side.
Depending on how you've done the tagging, it might be a very sparse search when it runs over 20 indexes. However, 500k events should return in a matter of seconds, unless you mean the search matches 500k out of 500 billion events in the indexes. Still, 4 minutes isn't terrible if you have that kind of volume. Perhaps you need more indexers or some summarization. Have you tried using Bonnie++ to get performance metrics for your disk subsystem? Can you do that and post the results?
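On the summarization point: if the data is CIM-mapped, one option is to move tag-based correlation searches onto accelerated data models and query them with tstats. A sketch, assuming the ES Authentication data model is accelerated (the field names come from CIM; the time range is just an example):

    | tstats summariesonly=true count
        from datamodel=Authentication
        where earliest=-1h
        by Authentication.src Authentication.action

Keep in mind that building the acceleration itself costs indexer CPU, so on a single overloaded box it can simply move the load around.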
Sorry, I can't post results: we had an online session with the tech guys and no performance issue with the disk subsystem was discovered. Instead, we had constant CPU overload. Could that be the main reason for this problem?
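A sketch of how that CPU load can be checked from Splunk's own introspection data (it assumes the standard _introspection resource-usage logs are collected on the host):

    index=_introspection sourcetype=splunk_resource_usage component=Hostwide
    | timechart span=5m avg(data.cpu_user_pct) AS user avg(data.cpu_system_pct) AS system

Sustained high values there, while the disk checks out, would support the CPU-bound theory.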
Also, I see that the default tagging (from some add-ons) is somewhat search-expensive and sometimes strange, so I'll try to customize it.
I'm accepting the answer, as new questions should go to a separate thread.
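For example, one way to do that customization (the add-on, eventtype, and tag names below are purely illustrative) is to override the add-on's tags in its local directory, disabling a tag on an eventtype that drags in far more data than needed:

    # $SPLUNK_HOME/etc/apps/<add-on>/local/tags.conf  (illustrative)
    [eventtype=vendor_noisy_eventtype]
    communicate = disabled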
Thank you!