Anything in particular we should watch out for while upgrading the Splunk App for Enterprise Security in a search head cluster?
We'll be upgrading ES from 3.2.1 to 3.3 in a few weeks. We've read the documentation and know the steps, but are hoping to lean on the experience of others as well if possible. Has anyone upgraded ES in a SHC? Were there any gotchas we should be aware of? Any issues coming out of the upgrade? Any other insight into the process?
Thanks!
Why not upgrade to ES 4.1.x?
This post is over a year old. I think 4.0 was released last year after .conf, so it wasn't available at the time of this post 😉
What was the fix for this?
We still have a ticket open with Splunk and are hoping they get back to us soon. A few issues we're dealing with in our four-server ES SHC...
Incident Review
On one server, we cannot assign incidents and get a generic notable error
On two other servers, if you assign an incident to someone, Splunk assigns all incidents to that person
On one server, the assignments work correctly
Search Errors
We are getting a lot of errors similar to the following, for all searches, in ES or outside of ES. That lookup does exist, but we're not sure how to interpret these messages (a generic way to check is sketched after the examples below).
[x12prd30] Error 'Could not find all of the specified destination fields in the lookup table.' for conf '(?::){0}bro_*' and lookup table 'identity_lookup_expanded'.
[x12prd30] Error 'Could not find all of the specified destination fields in the lookup table.' for conf '(?i)source::....zip(.\d+)?' and lookup table 'identity_lookup_expanded'.
[x12prd30] Error 'Could not find all of the specified destination fields in the lookup table.' for conf 'ActiveDirectory' and lookup table 'identity_lookup_expanded'.
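For anyone comparing notes, a generic way to see what the error might be pointing at (standard SPL and btool, nothing ES-specific) is to compare the fields the lookup file actually contains against the props.conf lookup definitions that reference it:

    | inputlookup identity_lookup_expanded | fieldsummary | fields field

    $SPLUNK_HOME/bin/splunk btool props list --debug | grep -i identity_lookup_expanded

The first shows which fields exist in the lookup; the second shows which props stanzas reference it and which config files those settings come from.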
Configuration Checks
We're seeing consistent messages related to configuration_check.py for a few specific conf checks. All of them report the script exiting with Error Code 3, a generic failure message.
Correlation Searches
We're noticing that the drilldown offsets for correlation searches don't retain the values we specify, but we're not sure if that's related to the overall cluster issues.
We still have a few issues, but our main mistake happened while copying the config from the stage server we upgraded back to the deployer. It appears that scp threw a lot of permission denied errors, which we missed. So essentially, we deployed a mix of 3.2.1 and 3.3.0 files down to our members, which they obviously didn't handle well.
We went back through the process again and upgraded to 3.3.1. We still have lookup errors on search, but will continue to work with Splunk Support to address them.
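If anyone else runs into this, one quick way to spot a mixed deployment like ours is to compare the ES app version on the deployer's staging copy against each member (paths assume the default $SPLUNK_HOME layout, and ES ships a number of supporting DA-/SA- apps, so the main app alone may not tell the whole story):

    # On the deployer
    grep '^version' $SPLUNK_HOME/etc/shcluster/apps/SplunkEnterpriseSecuritySuite/default/app.conf

    # On each search head cluster member
    grep '^version' $SPLUNK_HOME/etc/apps/SplunkEnterpriseSecuritySuite/default/app.conf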
Do note that you should be using the preserve-lookups true flag when deploying. Leaving it out can cause lookup-related issues.
Yes, we did include that flag when applying the bundle, both during the original upgrade and when we went to 3.3.1. We don't normally use it for the ES cluster and had never seen it before reading the upgrade docs.
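For reference, the deployer command looked roughly like this (target member and credentials are placeholders):

    splunk apply shcluster-bundle -target https://<shc_member>:8089 -preserve-lookups true -auth admin:<password>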
Tomorrow is the upgrade. I'll share my experiences here.
We have already staged the upgrade on our test stand-alone instance and performed the remediation steps identified by the upgrade process. The cluster master has also been staged with the appropriate app/index changes.
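For the indexer side, the index and app changes sit under the cluster master's master-apps directory and get pushed with the usual bundle commands (a sketch assuming default paths):

    # On the cluster master, after placing the ES index/app changes under $SPLUNK_HOME/etc/master-apps/
    splunk validate cluster-bundle
    splunk apply cluster-bundle --answer-yes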
Yep, all sorts of problems. They sure don't want to make this an easy app to manage. I'll try to get a list of things we're running into and share here once resolved.