Getting Data In

Selective Indexing and Parsing in Splunk SIEM – Storing Full Logs but Parsing Only Compliance Fields

amimulahasun
Explorer

Hi Team,

I’m looking for guidance on designing a Splunk SIEM ingestion strategy for the following scenario:

We receive logs from multiple heterogeneous data sources (network devices, applications, servers, cloud services, etc.). Due to storage and licensing constraints, we do not want to fully index and parse all incoming data.

Our requirement is:

  • Only index and parse the fields required for compliance use cases (e.g., specific events and fields)

  • Store the remaining raw log data without parsing

  • Ensure the retained raw data is available for audit or forensic purposes if required later

I would like expert recommendations on the best architectural approach to achieve this.

Specifically:

  1. What is the recommended method to:

    • Filter events before indexing?

    • Route different data streams to separate indexes?

    • Store non-parsed logs efficiently?

  2. Should we use:

    • props.conf and transforms.conf for event filtering?

    • NullQueue routing for unwanted events?

    • Heavy Forwarders for preprocessing?

    • SmartStore for raw data retention?

  3. What is the best practice for:

    • Index-time vs search-time field extraction in this use case?

    • Minimizing indexed data volume while maintaining compliance integrity?

  4. Licensing concern:

    • If we store the full raw data in Splunk but do not parse or extract fields from it, will it still consume license?

    • Is license consumption based on ingestion volume regardless of parsing?

    • Are there supported ways to retain data without impacting license usage?

  5. Has anyone implemented a similar design in a production SIEM environment? What challenges should we expect?

Any architecture guidance, configuration examples, or real-world lessons learned would be greatly appreciated.

Thanks in advance!


PickleRick
SplunkTrust

Wait. Are you aware how Splunk works?

In a typical case, Splunk indexes the raw event, and field extractions are done at search time. Unlike many other solutions on the market, they are not performed during the ingestion process, so they do not consume disk space (with the exception of indexed fields and acceleration techniques such as data model acceleration or report acceleration).

Also, license usage (assuming we're talking about ingest-based licensing) is measured on the raw data written to indexes. So however many indexed fields you had alongside your raw event, they wouldn't consume any additional license.
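To illustrate, a search-time extraction is just a props.conf stanza (deployed to the search head); the sourcetype name, field names, and regex below are hypothetical placeholders, not a recommendation for your data:

```ini
# props.conf (search head) -- search-time field extraction
# Computed at search time only: no extra index space, no extra license.
# "acme:fw" and the field names are placeholders for your own sourcetype.
[acme:fw]
EXTRACT-compliance = src=(?<src_ip>\S+)\s+dst=(?<dest_ip>\S+)\s+action=(?<action>\w+)
```

The fields are re-extracted every time a search runs, which is exactly why they cost nothing at rest.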

amimulahasun
Explorer

@PickleRick Thanks for your reply. Yes, I know how Splunk works, and I get your point.


gcusello
SplunkTrust

Hi @amimulahasun ,

at first a question: is your requirement to reduce license consumption or storage?

If you want to reduce the storage requirements, the approach should be:

  • index all the raw data you need (if there are events that are not useful, filter them out before indexing),
  • parse and save the useful fields in Data Models (CIM or custom),
  • use these data for your searches,
  • move raw data to SmartStore or freeze it after a short time period (e.g. one month),
  • maintain data in Data Models for as long as you need to run searches,
  • maintain raw data in SmartStore or as frozen data for the retention period required for audit or forensic purposes.
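As a sketch of the retention side, here is what the freeze-after-30-days step could look like in indexes.conf; the index name, paths, and periods are examples only, not recommendations:

```ini
# indexes.conf (indexer) -- roll data to frozen after ~30 days
# "compliance_raw" and all paths are hypothetical; adjust to your environment.
[compliance_raw]
homePath   = $SPLUNK_DB/compliance_raw/db
coldPath   = $SPLUNK_DB/compliance_raw/colddb
thawedPath = $SPLUNK_DB/compliance_raw/thaweddb
# 30 days in seconds; buckets older than this are frozen
frozenTimePeriodInSecs = 2592000
# keep frozen buckets here instead of deleting them, so they
# can be thawed back later for audit or forensic searches
coldToFrozenDir = /archive/splunk/compliance_raw
```

With coldToFrozenDir set, frozen buckets keep the rawdata journal and can be thawed back into the thaweddb path when an audit requires them; without it, frozen data is deleted.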

If instead you also want to reduce license consumption, you should:

  • analyze your data to understand whether you need all the events (whether some events can be discarded as not useful) and whether you need the full raw event or only a part of it to extract the useful fields,
  • remove the unneeded events and the unneeded parts of events in props.conf (SEDCMD command),
  • then follow the same approach as above with Data Models and SmartStore or frozen data.
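For the event-removal step, a minimal sketch of nullQueue routing (the sourcetype and pattern are hypothetical; events dropped this way are never indexed and never count against the license):

```ini
# props.conf (heavy forwarder or indexer) -- filter before indexing
# "acme:fw" is a placeholder sourcetype
[acme:fw]
TRANSFORMS-drop_noise = drop_debug_events
```

```ini
# transforms.conf -- send matching events to the nullQueue (discard)
# "level=DEBUG" is a placeholder pattern for your own noise events
[drop_debug_events]
REGEX = level=DEBUG
DEST_KEY = queue
FORMAT = nullQueue
```

Note this must run where parsing happens (heavy forwarder or indexer), not on a universal forwarder.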

Ciao.

Giuseppe

amimulahasun
Explorer

Hi @gcusello ,

First of all, thanks for your reply to my post.
My requirement: I will collect logs from different sources and store them in local storage. For now I have a 50 GB/day license covering 16 types of log sources, and the device count across these 16 types is around 180. My plan is to store all the raw logs to meet audit and compliance requirements, but index only the required security events for these devices. Can you please suggest the best practice for this type of requirement?


gcusello
SplunkTrust

Hi @amimulahasun ,

as I said, to reduce license consumption, you should:

  • analyze your data to understand whether you need all the events (whether some events can be discarded as not useful) and whether you need the full raw event or only a part of it to extract the useful fields,
  • remove the unneeded events and the unneeded parts of events in props.conf (SEDCMD command),
  • then follow the same approach as above with Data Models and SmartStore or frozen data.
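For the "remove unneeded parts of events" step, a hedged SEDCMD sketch (the sourcetype and pattern are placeholders); only the bytes actually written to the index count against the license:

```ini
# props.conf (heavy forwarder or indexer) -- trim events at parse time
# "acme:app" and the payload pattern are hypothetical examples
[acme:app]
SEDCMD-strip_payload = s/payload=\S+/payload=REDACTED/g
```

Be careful: SEDCMD permanently alters the indexed raw event, so make sure the removed parts will never be needed for audit or forensics.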

even if it is difficult to get from 180 GB/day down to 50 GB/day!

Maybe you could try a mixed approach: reduce the volume of events to index as much as possible (using the above process), and increase your license.

Ciao.

Giuseppe

amimulahasun
Explorer

@gcusello Thanks for your reply.
