Hi Team,
I’m looking for guidance on designing a Splunk SIEM ingestion strategy for the following scenario:
We receive logs from multiple heterogeneous data sources (network devices, applications, servers, cloud services, etc.). Due to storage and licensing constraints, we do not want to fully index and parse all incoming data.
Our requirement is:
Only index and parse the fields required for compliance use cases (e.g., specific events and fields)
Store the remaining raw log data without parsing
Ensure the retained raw data is available for audit or forensic purposes if required later
I would like expert recommendations on the best architectural approach to achieve this.
Specifically:
What is the recommended method to:
Filter events before indexing?
Route different data streams to separate indexes?
Store non-parsed logs efficiently?
Should we use:
props.conf and transforms.conf for event filtering?
NullQueue routing for unwanted events?
Heavy Forwarders for preprocessing?
SmartStore for raw data retention?
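For reference, here is a minimal sketch of the props.conf/transforms.conf filtering pattern mentioned above, following the standard "discard everything, then keep what matches" approach. The sourcetype name and the keep-regex are hypothetical placeholders:

```
# props.conf -- applied to a hypothetical sourcetype
[my_firewall_logs]
TRANSFORMS-filter = setnull, setkeep

# transforms.conf
# First transform: send every event to nullQueue (discarded, no license usage)
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# Second transform: events matching the compliance-relevant pattern
# override the first transform and go back to the indexing pipeline
[setkeep]
REGEX = (?i)(login|logout|denied|policy-violation)
DEST_KEY = queue
FORMAT = indexQueue
```

Transform order matters: the last matching transform wins, so the keep-rule must come after the null-rule. Routing to a separate index would use a similar transform with `DEST_KEY = _MetaData:Index`.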
What is the best practice for:
Index-time vs search-time field extraction in this use case?
Minimizing indexed data volume while maintaining compliance integrity?
Licensing concern:
If we store the full raw data in Splunk but do not parse or extract fields from it, will it still consume license?
Is license consumption based on ingestion volume regardless of parsing?
Are there supported ways to retain data without impacting license usage?
Has anyone implemented a similar design in a production SIEM environment? What challenges should we expect?
Any architecture guidance, configuration examples, or real-world lessons learned would be greatly appreciated.
Thanks in advance!
Wait. Are you aware of how Splunk works?
In a typical case, Splunk indexes the raw event, but field extractions are done at search time. Unlike many other solutions on the market, they are not performed during the ingestion process, so they do not consume disk space (with the exception of indexed fields and acceleration techniques such as data model acceleration or report acceleration).
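To illustrate the search-time point: extractions defined with `EXTRACT` in props.conf on the search head are evaluated only when a search runs, so they add nothing to disk or license usage. The sourcetype and field names below are hypothetical:

```
# props.conf on the search head -- search-time field extraction
[my_app_logs]
EXTRACT-user = user=(?<user>\S+)
EXTRACT-action = action=(?<action>\w+)
```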
Also, license usage (assuming we're talking about ingest-based licensing) is measured on the raw data written to indexes. So however many indexed fields you have alongside your raw event, they won't consume any additional license.
@PickleRick Thanks for your reply. Yes, I know how Splunk works, and I get your point.
Hi @amimulahasun ,
first, a question: is your requirement to reduce license consumption or storage?
If you want to reduce the storage requirements, the approach should be:
If instead you also want to reduce the license, you should:
Ciao.
Giuseppe
Hi @gcusello ,
First of all, thanks for your reply to my post.
My requirement is to collect logs from different sources and store them in local storage. For now I have a 50 GB/day license covering 16 types of log sources, with around 180 devices across those types. My plan is to store all raw logs to meet audit and compliance requirements, but to index only the required security events from these devices. Can you please suggest best practices for this type of requirement?
Hi @amimulahasun ,
as I said, to reduce the license consumption, you should:
even if it is difficult to go from 180 GB/day to 50 GB/day!
Maybe you could try a mixed approach: reduce the volume of indexed events as much as possible (using the process below), and enlarge your license.
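To decide where filtering buys the most, one common approach is to query Splunk's internal license usage log and break down daily consumption by sourcetype (the `st` and `b` field names are those used in `license_usage.log`):

```
index=_internal source=*license_usage.log* type=Usage
| stats sum(b) AS bytes BY st
| eval GB = round(bytes/1024/1024/1024, 2)
| sort - GB
```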
Ciao.
Giuseppe
@gcusello Thanks for your Reply.