Operationally speaking, what are the best practices for keeping my DevOps users happy given Splunk's otherwise centralized data ingress model?
Specifically, if we control all data inputs centrally, we burden the DevOps teams with paperwork and process that may push them to explore their own solutions (such as their own Splunk deployment, or even a competitor's product). Conversely, if we give them free rein, we risk loading the Splunk system with undefined data, which impacts storage/license usage and platform performance (extra processing is needed to manage undefined sourcetypes).
Is there a happy balance between the two?
(Originally written and shared by @tlagatta_splunk):
For input management, there are generally two approaches (fully centralized control and fully self-service ingestion), and most customers fall somewhere in the middle.
Many large customers are moving toward a blended DevOps approach:
For this to work smoothly, all the changes should happen inside the CI/CD change workflow. Following defined processes ensures that everyone knows what's in production at all times. The only way an app goes into production should be through the build pipeline, which ends in an automated deployment.
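To make that pipeline gate concrete, here is a minimal sketch of a pre-deployment check, assuming a Node.js build step and the conventional default/inputs.conf and default/props.conf app layout. The script name, paths, and parsing conventions are illustrative assumptions, not a Splunk-supplied tool; the idea is simply that the build fails before deployment if the app declares a sourcetype it never defines.

```typescript
// ci-sourcetype-gate.ts (hypothetical pipeline step)
// Fail the build if the app declares a sourcetype in inputs.conf
// that has no matching props.conf stanza.
import { readFileSync } from "node:fs";

const appDir = process.argv[2] ?? ".";

// Collect "sourcetype = <name>" assignments from inputs.conf.
function declaredSourcetypes(confText: string): Set<string> {
  const names = new Set<string>();
  for (const line of confText.split(/\r?\n/)) {
    const m = line.match(/^\s*sourcetype\s*=\s*(\S+)/i);
    if (m) names.add(m[1]);
  }
  return names;
}

// Collect stanza headers ("[my:sourcetype]") from props.conf.
function definedStanzas(confText: string): Set<string> {
  const names = new Set<string>();
  for (const line of confText.split(/\r?\n/)) {
    const m = line.match(/^\s*\[([^\]]+)\]/);
    if (m) names.add(m[1]);
  }
  return names;
}

const inputs = readFileSync(`${appDir}/default/inputs.conf`, "utf8");
const props = readFileSync(`${appDir}/default/props.conf`, "utf8");

const defined = definedStanzas(props);
const missing = [...declaredSourcetypes(inputs)].filter((st) => !defined.has(st));

if (missing.length > 0) {
  console.error(`Undefined sourcetypes (no props.conf stanza): ${missing.join(", ")}`);
  process.exit(1); // Block the automated deployment step.
}
console.log("All declared sourcetypes have props.conf definitions.");
```

A check like this runs in seconds, so it can sit in front of the automated deployment without slowing the DevOps teams down.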
We've seen many customers onboard data blindly, which causes huge issues with performance (for example, poorly defined source types). That's fine if a customer has a reactive process for identifying and fixing undefined source types, but if the customer's approach is to ingest the data without any investigation, that can cause problems. Many customers have built a model where, upon ingest, the Splunk COE validates the data and assists with the creation of the first use cases, which ensures that the internal customer gets value from the new data sources.
Some of my peers have had customers try the fully centralized approach, but it causes problems for their internal DevOps customers. In some cases, DevOps teams have gone so far as to set up their own alternative to Splunk to avoid the centralized dependency and be "more independent". It's important to collaborate closely with DevOps teams and build a process that fits their development ethos (for example, the blended DevOps approach with a CI/CD pipeline described above).
An underlying concern in this whole conversation is the motivation for offloading so much work from the main Splunk team. If you find the team is understaffed, then consider the following:
See the Staffing best practices for a Splunk deployment for additional operational guidance on staffing your Splunk team for proper business continuity.
I will add that, as with all DevOps/SRE efforts, it's extremely important to thoughtfully set and enforce standards. In Splunk's case, that means standards for ingestion patterns, sourcetype names, and log structures.
It is much easier to manage rapidly changing microservices and teams when you have a strict pattern, like a standardized Log4j template -> HEC, or a shared JavaScript library function for posting to HEC, each with sourcetype=stackName:microServiceName.
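As a sketch of that shared-library idea (in TypeScript rather than plain JavaScript), the helper below bakes the stackName:microServiceName convention into one place. The HEC event endpoint (/services/collector/event) and the "Splunk <token>" Authorization header are standard HEC behavior; the module name, config shape, and host are assumptions for illustration.

```typescript
// hecClient.ts (illustrative shared library)
export interface HecConfig {
  url: string;         // e.g. "https://splunk-hec.example.com:8088" (placeholder host)
  token: string;       // HEC token issued by the central Splunk team
  stackName: string;   // first half of the sourcetype convention
  serviceName: string; // second half of the sourcetype convention
  index?: string;      // optional dedicated index, if one is assigned
}

// Post a single event to HEC with the standardized sourcetype applied.
export async function sendEvent(cfg: HecConfig, event: unknown): Promise<void> {
  const payload = {
    event,
    sourcetype: `${cfg.stackName}:${cfg.serviceName}`, // enforced naming standard
    time: Date.now() / 1000,                           // epoch seconds
    ...(cfg.index ? { index: cfg.index } : {}),
  };

  const res = await fetch(`${cfg.url}/services/collector/event`, {
    method: "POST",
    headers: {
      Authorization: `Splunk ${cfg.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });

  if (!res.ok) {
    throw new Error(`HEC post failed: ${res.status} ${await res.text()}`);
  }
}
```

Each microservice then calls sendEvent(...) instead of rolling its own logging plumbing, so the sourcetype convention is enforced in one shared dependency rather than in every team's code.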
Have you checked out the mothership app?
https://splunkbase.splunk.com/app/4646/
Aware of it. I see it as a solution for managing Splunk deployments, not as a solution for striking a balance between CI/CD or DevOps teams and their consumption of Splunk. Would you agree, or is there an aspect of it I am overlooking?