Splunk app and best practices for indexing

mlstom
New Member

I am developing a Splunk app and wanted to hear from someone what is considered best practice when it comes to sending events to Splunk to be processed and indexed.

Basically, I am concerned that sending events into Splunk as soon as they are available would take a toll on the indexer, because there would be a constant flow of data every few seconds. On the other hand, waiting for all the data to come in before indexing is not an option: events could keep arriving for days, and I can't wait that long to see the data in the system. So my best guess is to cap the number of events indexed at a time. For example, I would wait for 10,000 events to accumulate and then send them to Splunk for processing. Could someone offer advice on this?

0 Karma

sloshburch
Splunk Employee

You're overthinking it. Splunk indexers are specifically designed to handle a constant stream of incoming data. In fact, if Splunk slows down for some reason, the forwarders sending to it will just queue the data until it catches up.

How are you sending the data to Splunk? I ask because in normal usage of Splunk you should never have to worry about this topic.

If I remember correctly, the indexing process uses about one CPU core, so the other cores remain available for returning search results on that data. If that is not enough for your load, increase the number of indexing pipelines (more cores used) and add indexers (better data distribution).

Also, won't the users be misled if they try to run reports on the data and don't realize that it's incomplete or not currently sending?
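To put a rough number on that delay, here is a minimal sketch. It assumes one event arrives every 3 seconds (an illustrative figure; the question only says "every few seconds") and uses the 10,000-event cap from the question. The function name is hypothetical and this is plain arithmetic, not a Splunk API.

```python
def batch_delay_seconds(batch_size: int, seconds_per_event: float) -> float:
    """Worst-case wait for the first event in a batch.

    The first event has to sit until the remaining (batch_size - 1)
    events have arrived, assuming a steady arrival rate.
    """
    return (batch_size - 1) * seconds_per_event


if __name__ == "__main__":
    # Assumed values: 10,000-event cap (from the question), 3 s between events.
    delay = batch_delay_seconds(10_000, 3.0)
    print(f"Oldest event waits about {delay / 3600:.1f} hours before it is sent")
```

Under those assumptions the oldest event in each batch is more than eight hours stale by the time it becomes searchable, and any report run in the meantime silently misses it.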

I'm not sure if you can tell, but I'm very concerned by the question. I am confident that any means of modulating the data flow will provide a terrible experience with the Splunk platform.

Reply with more info and I'm happy to address any other concerns about this.

0 Karma

richgalloway
SplunkTrust

I don't know about best practices in this area, but IMO, data should be indexed as soon as it's available. Splunk can't act on data it doesn't have.

---
If this reply helps you, Karma would be appreciated.
0 Karma