Operationally speaking, what are the best practices for keeping my DevOps users happy given Splunk's otherwise centralized data ingress model?
Specifically, if we control all data inputs centrally, we burden the DevOps teams with paperwork and process that may push them to explore their own solutions (such as their own Splunk deployment, or even a competitor's product). Conversely, if we give them free rein, we risk loading the Splunk system with undefined data, which impacts storage/license usage and platform performance (extra processing is needed to manage undefined sourcetypes).
Is there a happy balance between the two?
(Originally written and shared by @tlagatta_splunk):
For input management, there are generally two approaches (fully centralized control and fully self-service ingestion), and most customers fall somewhere in the middle.
Many large customers are moving toward a blended DevOps approach:
For this to work smoothly, all the changes should happen inside the CI/CD change workflow. Following defined processes ensures that everyone knows what's in production at all times. The only way an app goes into production should be through the build pipeline, which ends in an automated deployment.
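To make that pipeline gate concrete, here is a minimal sketch of a pre-deployment check, assuming a Node.js build step and the conventional default/inputs.conf and default/props.conf app layout. The script name, paths, and parsing conventions are illustrative assumptions, not a Splunk-supplied tool; the idea is simply that the build fails before deployment if the app declares a sourcetype it never defines.

```typescript
// ci-sourcetype-gate.ts (hypothetical pipeline step)
// Fail the build if the app declares a sourcetype in inputs.conf
// that has no matching props.conf stanza.
import { readFileSync } from "node:fs";

const appDir = process.argv[2] ?? ".";

// Collect "sourcetype = <name>" assignments from inputs.conf.
function declaredSourcetypes(confText: string): Set<string> {
  const names = new Set<string>();
  for (const line of confText.split(/\r?\n/)) {
    const m = line.match(/^\s*sourcetype\s*=\s*(\S+)/i);
    if (m) names.add(m[1]);
  }
  return names;
}

// Collect stanza headers ("[my:sourcetype]") from props.conf.
function definedStanzas(confText: string): Set<string> {
  const names = new Set<string>();
  for (const line of confText.split(/\r?\n/)) {
    const m = line.match(/^\s*\[([^\]]+)\]/);
    if (m) names.add(m[1]);
  }
  return names;
}

const inputs = readFileSync(`${appDir}/default/inputs.conf`, "utf8");
const props = readFileSync(`${appDir}/default/props.conf`, "utf8");

const defined = definedStanzas(props);
const missing = [...declaredSourcetypes(inputs)].filter((st) => !defined.has(st));

if (missing.length > 0) {
  console.error(`Undefined sourcetypes (no props.conf stanza): ${missing.join(", ")}`);
  process.exit(1); // Block the automated deployment step.
}
console.log("All declared sourcetypes have props.conf definitions.");
```

A check like this runs in seconds, so it can sit in front of the automated deployment without slowing the DevOps teams down.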
We've seen many customers onboard data blindly, which causes huge issues with performance (for example, poorly defined source types). That's fine if a customer has a reactive process for identifying and fixing undefined source types, but if the customer's approach is to ingest the data without any investigation, that can cause problems. Many customers have built a model where, upon ingest, the Splunk COE validates the data and assists with the creation of the first use cases, which ensures that the internal customer gets value from the new data sources.
Some of my peers have had customers try the fully centralized approach, but it causes problems for their internal DevOps customers. In some cases, DevOps teams have gone so far as to set up their own alternative to Splunk to avoid the centralized dependency and be "more independent". It's important to collaborate closely with DevOps teams and build a process that fits their development ethos (for example, the blended DevOps approach with a CI/CD pipeline described above).
An underlying concern in this whole conversation is the motivation for offloading so much work from the main Splunk team. If you find the team is understaffed, then consider the following:
See the Staffing best practices for a Splunk deployment for additional operational guidance on staffing your Splunk team for proper business continuity.
I will add that, as with all DevOps/SRE efforts, it's extremely important to thoughtfully set and enforce standards. In Splunk's case, that means standards for ingestion patterns, sourcetype names, and log structures.
It is much easier to manage rapidly changing microservices and teams when you have a strict pattern, like a standardized Log4j template -> HEC, or a shared JavaScript library function for posting to HEC, each with sourcetype=stackName:microServiceName.
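As a sketch of that shared-library idea (in TypeScript rather than plain JavaScript), the helper below bakes the stackName:microServiceName convention into one place. The HEC event endpoint (/services/collector/event) and the "Splunk <token>" Authorization header are standard HEC behavior; the module name, config shape, and host are assumptions for illustration.

```typescript
// hecClient.ts (illustrative shared library)
export interface HecConfig {
  url: string;         // e.g. "https://splunk-hec.example.com:8088" (placeholder host)
  token: string;       // HEC token issued by the central Splunk team
  stackName: string;   // first half of the sourcetype convention
  serviceName: string; // second half of the sourcetype convention
  index?: string;      // optional dedicated index, if one is assigned
}

// Post a single event to HEC with the standardized sourcetype applied.
export async function sendEvent(cfg: HecConfig, event: unknown): Promise<void> {
  const payload = {
    event,
    sourcetype: `${cfg.stackName}:${cfg.serviceName}`, // enforced naming standard
    time: Date.now() / 1000,                           // epoch seconds
    ...(cfg.index ? { index: cfg.index } : {}),
  };

  const res = await fetch(`${cfg.url}/services/collector/event`, {
    method: "POST",
    headers: {
      Authorization: `Splunk ${cfg.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });

  if (!res.ok) {
    throw new Error(`HEC post failed: ${res.status} ${await res.text()}`);
  }
}
```

Each microservice then calls sendEvent(...) instead of rolling its own logging plumbing, so the sourcetype convention is enforced in one shared dependency rather than in every team's code.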
Have you checked out the mothership app?
https://splunkbase.splunk.com/app/4646/
Aware of it. I see it as a solution for managing Splunk deployments, not as a solution for striking a balance between CI/CD or DevOps teams and their consumption of Splunk. Would you agree, or is there an aspect of it I am overlooking?