Anyone have experience using Splunk with NiFi, StreamSets or Cribl?

TheFrunkster

Is anyone using NiFi, StreamSets or Cribl as part of their log delivery pipeline? My team is trying to build a more robust pipeline. Before data is sent to Splunk, we would love to clean it up and fix any issues so they never get indexed. I'm looking for experiences, pros and cons for each tool. Anything you can share would be really appreciated.

Regards,
The Frunkster

cbutler_isenpai

Hey Frunkster,

Did you ever get help with this? 

My customer just procured 10TB of Cribl.  

They want to accomplish a few things:

  1. Decouple the forwarding tier from Splunk, which opens up the possibility of replacing Splunk in the future (shame on me for saying that, I know).
  2. Like you said, it is a great way to clean up the data. The TERM() directive in Splunk search is exceedingly powerful, but if the KVP you are looking for isn't bounded by clean segmentation breaks in your data, you can't use this very awesome feature. This is a great use case for Cribl: add KVP structure to the data so that TERM() works, which can speed up your searches by as much as 500,000 times:  https://conf.splunk.com/files/2019/slides/FN1407.pdf
  3. There are two ways to look at this next use case/requirement (again, it won't be popular with your sales rep):
    1. Drastically reduce the ingestion rate (and so licensing costs at renewal) by removing unnecessary "noise" in a much cleaner, easier way than using an HF/HEC with the parsing queue and nullQueue. A reduction of upwards of 50% should be pretty easy to achieve.
    2. Drastically increase the ingestion rate (up to 50%), because you have freed up headroom in your license to onboard more data. This stretches the licensing you already have and avoids increased costs.
  4. If you have an audit mission, you can also leverage the Cribl replay feature: copy the original data to S3 object storage and replay it anytime you need it. Great for going back 5 years to find out what a bad guy did, so you can prosecute them and send them to jail. To do that successfully, you need an unaltered version of the log that is defensible in a courtroom (chain of custody kind of stuff).
  5. Most of us, if not all of us, should be following a true DevOps process when implementing dashboards. This includes a full Dev/Test/Prod development process (with a Splunk instance for each of D/T/P) to avoid people impacting the end-user experience in production. All it takes is a few poorly trained individuals with a little too much ambition to bring down the house. Cribl easily allows you to take copies/samples of your production data and play them into your dev/test environments, avoiding harm to your production environment.
  6. GitOps: a very important part of your DevOps strategy. Cribl natively supports GitOps directly in the tool. You will need to install 'git' on your leader node first, and then it just kind of works right out of the box.
  7. Observability: there is pretty powerful observability into your data ingestion process. Without it, we are routinely told after the fact (sometimes well after the fact) that a data source is missing in Splunk. With a 10TB Splunk system, it is difficult to keep track of hundreds, if not thousands, of data sources from 100,000+ endpoints. Cribl makes this much easier.
  8. Drop-in replacement. This particular feature is what sealed the deal, if you ask me. You can directly replace any existing VM/cloud/bare-metal dedicated HF/HEC/syslog servers with Cribl. You can run them in parallel on the same box, cut over the data sources one by one, and then remove all Splunk forwarding components completely.
  9. Cribl Edge (free) and AppScope (free): I will let you read about those yourself, but Cribl Edge can replace the Splunk UF as well (Linux only today; Windows is coming). If you have to collect Windows data and you want to remove agents altogether, you can use WEC/WEF and have Cribl Edge forward that data directly into Splunk via the Cribl Edge/Stream suite. AppScope grabs some pretty impressive session data from applications, including trace/log/event data, to create telemetry for monitoring mission-critical applications.
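
To make point 2 a bit more concrete, here is a rough Python sketch of why the KVP rewrite matters for TERM(). It is not Cribl or Splunk code. The breaker list is a simplified assumption standing in for Splunk's default major breakers (check segmenters.conf for the real list), and `major_tokens` and `to_kvp` are hypothetical helper names made up for this illustration. The idea is that TERM(user=alice) can only match if `user=alice` survives as a single token between major breakers.

```python
import json
import re

# ASSUMPTION: a simplified subset of Splunk's default major breakers
# (whitespace, brackets, quotes, comma, semicolon, pipe, ...).
# The authoritative list lives in segmenters.conf.
MAJOR_BREAKERS = re.compile(r"[\s\[\]<>(){}|!;,'\"*&?+]+")


def major_tokens(raw: str) -> set[str]:
    """Split a raw event on the assumed major breakers (hypothetical helper)."""
    return {tok for tok in MAJOR_BREAKERS.split(raw) if tok}


def to_kvp(event: dict) -> str:
    """Render fields as space-separated key=value pairs (hypothetical helper).

    Any major breaker inside a value is replaced with "_" so that each
    key=value pair stays a single token.
    """
    parts = []
    for key, value in event.items():
        clean = MAJOR_BREAKERS.sub("_", str(value))
        parts.append(f"{key}={clean}")
    return " ".join(parts)


# A JSON event: the quotes around keys and values are major breakers,
# so "user=alice" never appears as one token.
original = '{"user": "alice", "action": "login", "src": "10.1.2.3"}'

# The same event rewritten in the pipeline before indexing.
rewritten = to_kvp(json.loads(original))

if __name__ == "__main__":
    print(rewritten)
    print("user=alice" in major_tokens(original))
    print("user=alice" in major_tokens(rewritten))
```

With the rewritten form, a search such as `index=main TERM(user=alice)` has a single indexed token to match against (`index=main` is just a placeholder here), which is where the speed-up in the linked talk comes from.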

Good luck.  Give me a shout if you need help.
