I am rather new to Splunk, having previously used Event Sentry for a small offline network of VM-based systems on a VDI. Simply put, we moved to Splunk in order to also incorporate logging from Linux systems that are coming soon.
So far I have opted for my company to get a single 1 GB/day license, since the current Event Sentry configuration I use to capture event logs from the Windows systems generates about half a gig a day, and I figured Splunk's data collection would be pretty similar if I collected the same things. But once I actually stood the server up and added my first few servers as data inputs, I found that those few servers, with only the three event logs I care about (System, Security, Application), plus the Splunk server itself, have basically tapped out my 1 GB/day limit.
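If it helps to see where the license is going, Splunk records its own license usage in an internal log that an admin can search. A sketch of such a search (the field names `st`, `h`, and `b` are Splunk's internal abbreviations for sourcetype, host, and bytes):

```
index=_internal source=*license_usage.log type=Usage
| stats sum(b) AS bytes BY st, h
| eval GB = round(bytes/1024/1024/1024, 3)
| sort - GB
```

Running this over the last 24 hours should show which hosts and sourcetypes are eating the quota.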
Am I missing some crucial configuration component here, or did I wildly underestimate how much would be collected? Realistically, I probably should have tried this out before going the licensed route, but I thought the collection would be akin to what I had seen before.
Any details or assistance in finding resources about this would be great. As it stands, I have been searching for details on exactly what gets captured but am not coming up with much.
Typically, before you deploy a Splunk installation, you size it (both in terms of needed hardware/storage and license). For that, you usually set up a test rig with a trial license, ingest data from a representative subset of your sources, and scale up according to the size of your environment.
Splunk license consumption is based on the raw size of events written to indexes, so usage depends on:
1) Sources and their verbosity
2) Whether you filter events or not (filtering is quite common because, depending on your intended use, you most probably won't need all EventIDs; even the Windows add-on does some filtering internally).
3) The format you ingest your events in (for Windows events there are two different formats, classic "text" and XML). As far as I remember, ingesting XML events produces smaller events and thus consumes less license.
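As a sketch of points 2 and 3, both are controlled in inputs.conf on the forwarder; `renderXml` and `blacklist` are real settings, but the EventCodes listed here are just examples of IDs that are often considered noisy, so check them against your own use cases before dropping anything:

```ini
# inputs.conf on the universal forwarder
[WinEventLog://Security]
disabled = 0
# Render events as XML (usually smaller than the classic text format)
renderXml = true
# Example only: drop some EventCodes you may not need
blacklist = 4662,5156,5158

[WinEventLog://System]
disabled = 0
renderXml = true

[WinEventLog://Application]
disabled = 0
renderXml = true
```

Filtering at the input like this means the events never reach the indexer at all, so they don't count against the license.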
So realistically, the only thing my security personnel are concerned with is the event logs of the servers and systems on the network. I will have to tailor this a bit and play around with whatever data it is picking up by default, as I feel I should not be seeing that much of a discrepancy from what I am used to collecting unless Splunk just captures far more detail. I also noticed that having a universal forwarder send the data would be the better way to push things to the Splunk server, so I will stage that first to make sure it works as it should.
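For what it's worth, pointing a universal forwarder at the Splunk server is a small outputs.conf on the forwarder; the hostname and group name below are placeholders, and 9997 is the conventional receiving port, which you enable on the Splunk server under Settings > Forwarding and receiving:

```ini
# outputs.conf on the universal forwarder
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# Placeholder hostname; 9997 is the conventional receiving port
server = splunk-server.example.local:9997
```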
Would you happen to know how long Splunk keeps data by default? I didn't see any age settings for data, but I assume it is kept forever by default.
Hi @Cehunter,
you can define index retention by adding a parameter in indexes.conf: "frozenTimePeriodInSecs".
By default the retention time is 6 years on Splunk Enterprise and 3 months on Splunk Cloud.
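For example, a 90-day retention on an index would look like this in indexes.conf (the index name and paths are placeholders; 7776000 seconds = 90 days):

```ini
# indexes.conf on the indexer (example index name and paths)
[wineventlog]
homePath   = $SPLUNK_DB/wineventlog/db
coldPath   = $SPLUNK_DB/wineventlog/colddb
thawedPath = $SPLUNK_DB/wineventlog/thaweddb
# Buckets older than 90 days are rolled to frozen (deleted by default)
frozenTimePeriodInSecs = 7776000
```

Note that frozen data is deleted unless you also configure a coldToFrozenDir to archive it.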
You can see information about data retention using the Monitoring Console app, which is installed by default.
Ciao.
Giuseppe
Hi @Cehunter,
to analyze your data collection, you should review the logs you're receiving; some of them may not be useful to you. For example, /var/log/secure is probably useful, while shell history probably is not.
The choice depends on the use cases that you need to implement.
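As a sketch for the Linux side, collection is also driven by monitor stanzas in the forwarder's inputs.conf, so you only pick the files that serve your use cases (the index name here is a placeholder you would create first):

```ini
# inputs.conf on a Linux universal forwarder
[monitor:///var/log/secure]
disabled = 0
sourcetype = linux_secure
index = oslogs

[monitor:///var/log/messages]
disabled = 0
sourcetype = syslog
index = oslogs
```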
I don't know Event Sentry, so I don't know how it calculates traffic; Splunk takes all the logs and calculates the indexed volume.
Ciao.
Giuseppe