Hi Team,

I'm looking for guidance on designing a Splunk SIEM ingestion strategy for the following scenario.

We receive logs from multiple heterogeneous data sources (network devices, applications, servers, cloud services, etc.). Due to storage and licensing constraints, we do not want to fully index and parse all incoming data. Our requirements are:

- Index and parse only the fields required for compliance use cases (e.g., specific events and fields)
- Store the remaining raw log data without parsing
- Ensure the retained raw data is available for audit or forensic purposes if required later

I would like expert recommendations on the best architectural approach to achieve this. Specifically:

1. What is the recommended method to:
- Filter events before indexing?
- Route different data streams to separate indexes?
- Store non-parsed logs efficiently?

2. Should we use:
- props.conf and transforms.conf for event filtering?
- nullQueue routing for unwanted events?
- Heavy Forwarders for preprocessing?
- SmartStore for raw data retention?

3. What is the best practice for:
- Index-time vs. search-time field extraction in this use case?
- Minimizing indexed data volume while maintaining compliance integrity?

4. Licensing concerns:
- If we store the full raw data in Splunk but do not parse or extract fields from it, will it still consume license?
- Is license consumption based on ingestion volume regardless of parsing?
- Are there supported ways to retain data without impacting license usage?

Has anyone implemented a similar design in a production SIEM environment? What challenges should we expect? Any architecture guidance, configuration examples, or real-world lessons learned would be greatly appreciated.

Thanks in advance!
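
To make the filtering/routing part of the question concrete, here is the kind of configuration I have in mind on the Heavy Forwarder (an untested sketch; the sourcetype, regexes, and index names are placeholders I made up, not our real values). The idea is the standard "discard everything, then route matching events back" pattern, since transforms run in order:

```
# props.conf (on the Heavy Forwarder / first full parsing tier)
[my:sourcetype]
# Transforms execute left to right: drop everything first,
# then re-queue only compliance-relevant events.
TRANSFORMS-routing = drop_all, keep_compliance, set_compliance_index

# transforms.conf
[drop_all]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keep_compliance]
# Placeholder regex -- would match our compliance-relevant events
REGEX = (?i)(authentication|privilege|policy)
DEST_KEY = queue
FORMAT = indexQueue

[set_compliance_index]
# Route the kept events to a dedicated index (placeholder name)
REGEX = (?i)(authentication|privilege|policy)
DEST_KEY = _MetaData:Index
FORMAT = compliance_idx
```

And for the parsing question, my understanding is that we could avoid index-time extractions entirely and rely on search-time extraction for the few compliance fields, e.g.:

```
# props.conf (search head) -- search-time extraction sketch,
# field names are illustrative only
[my:sourcetype]
EXTRACT-user = user=(?<user>\S+)
EXTRACT-action = action=(?<action>\w+)
```

Is this the right general shape, or is there a better-supported pattern for keeping unparsed raw data on the side?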