Solved: How to determine ingestion sizing with a single instance deployment

JMondares · 02-15-2022

Hello,

I'm currently going through a sizing exercise to determine how large a Splunk license I need, and was wondering if anyone could help.

A quick background: I've got a trial license of Splunk Enterprise running on-prem as a single-instance deployment with the InfoSec app, and I'm preparing to deploy Universal Forwarders to a select group of systems that will send security-related events and logs for Splunk to ingest and index. My organization is currently not interested in having Splunk ingest operations-type data, and wants to limit what Splunk ingests and indexes to security-related events only.

I have a specific list of sources, events, and event IDs I want to include in the inputs.conf file. My question is: will my single instance filter out all events that are not on the inputs.conf whitelist, and then report how much data (in GB) was ultimately ingested based on that whitelist? Or do I need to spin up another server running Splunk as a Heavy Forwarder, point the UFs at it, and reconfigure the original Splunk instance as an indexer/search head?
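As a rough mental model of the whitelist question, here is a minimal Python sketch (not Splunk code): only events that pass the filter are written to the index, so only their raw size contributes to the daily total. It assumes events arrive as hypothetical `(event_id, raw_text)` pairs and that `WHITELIST` is a hypothetical set of Windows event IDs; in practice the filter is configured on the input stanza in inputs.conf, not in Python.

```python
from typing import Iterable, Set, Tuple

# Hypothetical set of security-related Windows event IDs to keep.
WHITELIST = {4624, 4625, 4688, 4720}

def filtered_bytes(events: Iterable[Tuple[int, str]], whitelist: Set[int]) -> int:
    """Total UTF-8 size of the raw text of events whose ID is whitelisted.

    Events not on the whitelist are dropped before indexing, so they
    contribute nothing to the total.
    """
    return sum(
        len(raw.encode("utf-8"))
        for event_id, raw in events
        if event_id in whitelist
    )

sample = [
    (4624, "An account was successfully logged on."),
    (7036, "The service entered the running state."),  # dropped: not whitelisted
    (4625, "An account failed to log on."),
]
print(filtered_bytes(sample, WHITELIST))  # 66
```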

It's important for me to get accurate data on how much Splunk ingests so that I can work with Splunk's sales team to get the most accurate pricing for the license size my organization actually needs. I'm familiar with Splunk's workload licensing model, but the initial costs I've been tasked with obtaining are for the ingestion model.

Please let me know if you need any additional information. Thanks in advance for any help you can provide!

Jason

PickleRick · 02-15-2022

If you're dealing with amounts of data that a single server can process, you most probably don't want workload pricing. It only makes sense at large volumes.

Delegating event processing to Heavy Forwarder(s) offloads some system load, but apart from where the processing happens, there's really no significant difference between processing ingested data on an HF and on an indexer. The processing (most of it; the UF does a small part) is simply performed on the first "heavy" component on the event's path from source to index. So unless you overload your indexer, you can do without the HF.

And remember that Splunk license usage is measured by the size of the raw event data written to indexes (not counting Splunk's internal indexes or the stash sourcetype used for summary indexing). It doesn't include indexed fields, replicated buckets, or other overhead; just the data that gets written as the _raw event after all filtering, transformations, and so on.

For metric indexes, each event is counted as 160 bytes.
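The accounting rules above can be modelled with a short Python sketch. It is a simplification under stated assumptions, not Splunk's implementation: `IndexedEvent` and `license_bytes` are hypothetical names, an "internal index" is approximated as any index whose name starts with an underscore, and the flat 160 bytes per metric event is the figure quoted in this reply.

```python
from dataclasses import dataclass
from typing import Iterable

# Flat per-event charge for metric indexes, as quoted in this thread.
METRIC_EVENT_BYTES = 160

@dataclass
class IndexedEvent:
    """Hypothetical record of one event as written to an index."""
    index: str
    sourcetype: str
    raw: str
    is_metric: bool = False

def license_bytes(events: Iterable[IndexedEvent]) -> int:
    """Approximate license usage: size of _raw after filtering, minus exclusions."""
    total = 0
    for ev in events:
        # Assumption: internal indexes are the ones named with a leading underscore.
        if ev.index.startswith("_") or ev.sourcetype == "stash":
            continue  # internal data and summary-index (stash) events are not metered
        if ev.is_metric:
            total += METRIC_EVENT_BYTES  # metric events count as a flat size
        else:
            total += len(ev.raw.encode("utf-8"))  # only the _raw text counts
    return total

events = [
    IndexedEvent("wineventlog", "WinEventLog", "EventCode=4624"),
    IndexedEvent("_internal", "splunkd", "not metered"),
    IndexedEvent("summary", "stash", "not metered either"),
    IndexedEvent("os_metrics", "metrics_csv", "cpu.idle=97", is_metric=True),
]
print(license_bytes(events))  # 174
```

Indexed fields and replicated buckets never appear in this model at all, which mirrors the point that only the _raw text (or the flat metric charge) is metered.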


JMondares · 02-18-2022

I deployed the UF to one test Windows server to test the waters on what to expect when I eventually deploy UFs at scale, and to verify how large each indexed event ultimately is.

I've learned a great deal about sizing so far by having this test instance of Splunk running. I saw that when the UF is first installed, it sends Splunk everything already in the Event Viewer logs, plus any other log files/directories I specified during setup. I'll just say this is a great way to test whether my inputs.conf file(s) are working correctly.

So unless my inputs.conf files are filtering properly, or I clear out all the logs on the server before installing the UF, I should be prepared for a bit of a tidal wave of events ingested at the very beginning from each server with a UF installed, but that's another story for another thread. 😅
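To keep that initial backlog from skewing a daily estimate, one option is to discard the first day (or days) of data from each new UF before averaging. A minimal Python sketch; `steady_state_daily_bytes` is a hypothetical helper and the daily totals are made-up numbers for illustration:

```python
from statistics import mean
from typing import Sequence

def steady_state_daily_bytes(daily_totals: Sequence[int], backfill_days: int = 1) -> float:
    """Average daily bytes after dropping the first `backfill_days` days,
    which include the historical events a freshly installed UF sends."""
    steady = daily_totals[backfill_days:]
    if not steady:
        raise ValueError("need at least one day of data after the backfill period")
    return mean(steady)

# Hypothetical daily totals (bytes) for one test server; day 1 includes the backlog.
daily = [250_000_000, 4_100_000, 3_900_000, 4_000_000]
print(steady_state_daily_bytes(daily))  # 4000000
```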

Your estimate was right on the mark, though: the events indexed from that server's UF averaged between 120 and 157 bytes each across several thousand events, so using 160 bytes as a general per-event size should really help me estimate more accurately how large a license we actually need. I'll clear out the Event Viewer logs on my test server and reinstall the UF immediately afterwards, so I can get a better idea of how much log data a typical server generates on a typical day, and use that as a ballpark until more servers (e.g., domain controllers) get the UF installed.

And by that math, the initial estimate of 5 GB of ingestion a day may well turn out to be supreme overkill for my organization's needs (and rather costly for us), especially since I only want Splunk to ingest a limited scope of events.
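The arithmetic behind that conclusion, as a back-of-the-envelope Python sketch. Every input is a placeholder (the 50,000 events per server per day and the 40 servers are invented), 160 bytes is used as a round-up of the 120 to 157 bytes observed above, and GB is assumed to mean 2^30 bytes. Note that for regular event indexes the license counts each event's actual _raw size (the flat 160 bytes applies to metric indexes), so the measured average is only a planning figure here.

```python
# Assumption: license GB treated as binary gigabytes (2**30 bytes).
GIB = 1024 ** 3

def daily_gb(avg_event_bytes: float, events_per_server_per_day: int, servers: int) -> float:
    """Estimated daily ingestion in GB for a fleet of similar servers."""
    return avg_event_bytes * events_per_server_per_day * servers / GIB

def events_per_day_for_quota(quota_gb: float, avg_event_bytes: float) -> float:
    """How many events of the given average size fit in a daily quota."""
    return quota_gb * GIB / avg_event_bytes

# Hypothetical inputs: replace with measured values from the test servers.
estimate = daily_gb(160, 50_000, 40)
print(f"{estimate:.2f} GB/day vs. a 5 GB/day quota")
print(int(events_per_day_for_quota(5, 160)))
```

With those placeholder inputs the estimate comes to roughly 0.30 GB/day, and a 5 GB/day quota would cover roughly 33.6 million events of that size per day.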

Thanks for your help! 😀

richgalloway · 02-15-2022

IMO, this is the best way to estimate ingestion volume. I've seen attempts to do so using spreadsheets and approximations of event volumes and sizes, but actually doing it and taking measurements is the superior method.
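Splunk records what it meters: license usage is logged to `license_usage.log` under `$SPLUNK_HOME/var/log/splunk/`, with `type=Usage` lines carrying a byte count (`b`) and the destination index (`idx`). The Python sketch below totals those bytes per index. The sample lines are illustrative, modelled on that format with invented values; `usage_by_index` is a hypothetical helper, and the field names should be checked against your own file before relying on it.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# key=value pairs where the value is either "quoted" or a bare token.
KV = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def usage_by_index(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the b= byte counts of type=Usage lines, grouped by idx=."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = {k: quoted or bare for k, quoted, bare in KV.findall(line)}
        if fields.get("type") == "Usage" and "b" in fields:
            totals[fields.get("idx", "unknown")] += int(fields["b"])
    return dict(totals)

# Illustrative lines with invented values.
sample = [
    '02-18-2022 10:00:00.000 -0500 INFO  LicenseUsage - type=Usage s="WinEventLog:Security" '
    'st="WinEventLog" h="TESTSRV01" o="" idx="wineventlog" b=48213',
    '02-18-2022 10:01:00.000 -0500 INFO  LicenseUsage - type=Usage s="WinEventLog:System" '
    'st="WinEventLog" h="TESTSRV01" o="" idx="wineventlog" b=1787',
]
print(usage_by_index(sample))  # {'wineventlog': 50000}
```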

No, you don't need to stand up a heavy forwarder. Your standalone instance is a fully capable Splunk server that will honor the filters you configure.


JMondares · 02-18-2022

Thanks! The 60-day trial really helps buy me some valuable time to collect some sample data to figure out just how big of a license I really need.

Yeah, I quickly discovered that ingestion volume shouldn't be estimated from the size of the Event Viewer log files, or by taking one random event from the Event Viewer and saving it as an exported event, or pasting it into Notepad and saving it as a text file. Splunk seems to do a really good job of stripping excess data and indexing only what's necessary.

Thanks for your help, I appreciate it! 🙂