Getting Data In

Selective Indexing and Parsing in Splunk SIEM – Storing Full Logs but Parsing Only Compliance Fields

amimulahasun
Explorer

Hi Team,

I’m looking for guidance on designing a Splunk SIEM ingestion strategy for the following scenario:

We receive logs from multiple heterogeneous data sources (network devices, applications, servers, cloud services, etc.). Due to storage and licensing constraints, we do not want to fully index and parse all incoming data.

Our requirement is:

  • Only index and parse the fields required for compliance use cases (e.g., specific events and fields)

  • Store the remaining raw log data without parsing

  • Ensure the retained raw data is available for audit or forensic purposes if required later

I would like expert recommendations on the best architectural approach to achieve this.

Specifically:

  1. What is the recommended method to:

    • Filter events before indexing?

    • Route different data streams to separate indexes?

    • Store non-parsed logs efficiently?

  2. Should we use:

    • props.conf and transforms.conf for event filtering?

    • NullQueue routing for unwanted events?

    • Heavy Forwarders for preprocessing?

    • SmartStore for raw data retention?

  3. What is the best practice for:

    • Index-time vs search-time field extraction in this use case?

    • Minimizing indexed data volume while maintaining compliance integrity?

  4. Licensing concern:

    • If we store the full raw data in Splunk but do not parse or extract fields from it, will it still consume license?

    • Is license consumption based on ingestion volume regardless of parsing?

    • Are there supported ways to retain data without impacting license usage?

  5. Has anyone implemented a similar design in a production SIEM environment? What challenges should we expect?

Any architecture guidance, configuration examples, or real-world lessons learned would be greatly appreciated.

Thanks in advance!


PickleRick
SplunkTrust

Wait. Are you aware how Splunk works?

In a typical case, Splunk indexes the raw event, and field extractions are done at search time. Unlike many other solutions on the market, they are not performed during the ingestion process, so they do not consume disk space (with the exception of indexed fields and acceleration techniques such as data model acceleration or report acceleration).

Also, license usage (assuming we're talking about ingest-based licensing) is measured on the raw data written to indexes. So however many indexed fields you had alongside your raw event, they wouldn't consume any additional license.
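To illustrate, a search-time extraction is just a props.conf stanza (deployed to the search head); the sourcetype name, field names, and regex below are hypothetical placeholders, not a recommendation for your data:

```ini
# props.conf (search head) -- search-time field extraction
# Computed at search time only: no extra index space, no extra license.
# "acme:fw" and the field names are placeholders for your own sourcetype.
[acme:fw]
EXTRACT-compliance = src=(?<src_ip>\S+)\s+dst=(?<dest_ip>\S+)\s+action=(?<action>\w+)
```

The fields are re-extracted every time a search runs, which is exactly why they cost nothing at rest.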

amimulahasun
Explorer

@PickleRick Thanks for your reply. Yes, I know how Splunk works, and I get your point.


gcusello
SplunkTrust

Hi @amimulahasun ,

at first a question: is your requirement to reduce license consumption or storage?

If you want to reduce the storage requirements, the approach should be:

  • index all the raw data you need (if there are events that are not useful, filter them out before indexing),
  • parse and save the useful fields in Data Models (CIM or custom),
  • use these data for your searches,
  • move raw data to SmartStore or freeze it after a short time period (e.g. one month),
  • maintain data in Data Models for as long as you need to run searches,
  • maintain raw data in SmartStore or as frozen data for the retention period required for audit or forensic purposes.
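As a sketch of the retention side, here is what the freeze-after-30-days step could look like in indexes.conf; the index name, paths, and periods are examples only, not recommendations:

```ini
# indexes.conf (indexer) -- roll data to frozen after ~30 days
# "compliance_raw" and all paths are hypothetical; adjust to your environment.
[compliance_raw]
homePath   = $SPLUNK_DB/compliance_raw/db
coldPath   = $SPLUNK_DB/compliance_raw/colddb
thawedPath = $SPLUNK_DB/compliance_raw/thaweddb
# 30 days in seconds; buckets older than this are frozen
frozenTimePeriodInSecs = 2592000
# keep frozen buckets here instead of deleting them, so they
# can be thawed back later for audit or forensic searches
coldToFrozenDir = /archive/splunk/compliance_raw
```

With coldToFrozenDir set, frozen buckets keep the rawdata journal and can be thawed back into the thaweddb path when an audit requires them; without it, frozen data is deleted.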

If instead you also want to reduce license consumption, you should:

  • analyze your data to understand whether you need all the events (whether some events can be discarded as not useful) and whether you need the full raw event or only a part of it to extract the useful fields,
  • remove the unneeded events and the unneeded parts of events in props.conf (SEDCMD command),
  • then follow the same approach as above with Data Models and SmartStore or frozen data.
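For the event-removal step, a minimal sketch of nullQueue routing (the sourcetype and pattern are hypothetical; events dropped this way are never indexed and never count against the license):

```ini
# props.conf (heavy forwarder or indexer) -- filter before indexing
# "acme:fw" is a placeholder sourcetype
[acme:fw]
TRANSFORMS-drop_noise = drop_debug_events
```

```ini
# transforms.conf -- send matching events to the nullQueue (discard)
# "level=DEBUG" is a placeholder pattern for your own noise events
[drop_debug_events]
REGEX = level=DEBUG
DEST_KEY = queue
FORMAT = nullQueue
```

Note this must run where parsing happens (heavy forwarder or indexer), not on a universal forwarder.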

Ciao.

Giuseppe

amimulahasun
Explorer

Hi @gcusello ,

First of all, thanks for your reply to my post.
My requirement: I will collect logs from different sources and store them in local storage. For now I have a 50 GB/day license covering 16 types of log sources, and the device count across these 16 types is around 180. My plan is to store all the raw logs to meet audit and compliance requirements, but index only the required security events for these devices. Can you please suggest the best practice for this type of requirement?


gcusello
SplunkTrust

Hi @amimulahasun ,

as I said, to reduce license consumption, you should:

  • analyze your data to understand whether you need all the events (whether some events can be discarded as not useful) and whether you need the full raw event or only a part of it to extract the useful fields,
  • remove the unneeded events and the unneeded parts of events in props.conf (SEDCMD command),
  • then follow the same approach as above with Data Models and SmartStore or frozen data.
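For the "remove unneeded parts of events" step, a hedged SEDCMD sketch (the sourcetype and pattern are placeholders); only the bytes actually written to the index count against the license:

```ini
# props.conf (heavy forwarder or indexer) -- trim events at parse time
# "acme:app" and the payload pattern are hypothetical examples
[acme:app]
SEDCMD-strip_payload = s/payload=\S+/payload=REDACTED/g
```

Be careful: SEDCMD permanently alters the indexed raw event, so make sure the removed parts will never be needed for audit or forensics.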

even if it is difficult to get from 180 GB/day down to 50 GB/day!

Maybe you could try a mixed approach: reduce the volume of events to index as much as possible (using the above process), and increase your license.

Ciao.

Giuseppe

amimulahasun
Explorer

@gcusello Thanks for your reply.
