Hello Team,
I need to back up my Splunk log data to an AWS S3 bucket, but I first need to confirm which format the logs will be stored in, so that if I need those logs in the future I can convert them into a readable form.
Please note, I really need to know the exact format of the log data stored in Splunk. Please confirm.
Regards,
Asad Nafees
When backing up Splunk logs to AWS S3, the format depends on which method you use for the backup. Here are the available methods and their corresponding formats:
## Ingest Actions
If you're using Ingest Actions to send data to S3, you have three format options:
1. Raw format
a. This preserves the original format of your data exactly as it was ingested
b. Best for maintaining complete fidelity with source data
c. May require additional parsing when retrieving
2. Newline-delimited JSON (ndjson)
a. Each event is a separate JSON object on a new line
b. Includes both the raw event and extracted fields
c. Best option if you plan to use Federated Search with S3 later
d. Easily parseable by many analytics tools
3. JSON format
a. Standard JSON format with events in an array structure
b. Includes metadata and extracted fields
c. Good for interoperability with other systems
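To illustrate why ndjson is so easy to work with downstream, here is a minimal sketch of parsing exported lines back into usable records. The sample data and field names (`_raw`, `sourcetype`, `host`) are invented for the example, though they mirror typical Splunk event metadata:

```python
import json

# Each line of an ndjson export is one complete JSON object (one event).
# The sample lines below are made up for illustration.
sample_ndjson = '\n'.join([
    '{"_raw": "GET /index.html 200", "sourcetype": "access_combined", "host": "web01"}',
    '{"_raw": "POST /login 302", "sourcetype": "access_combined", "host": "web02"}',
])

def parse_ndjson(text):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)

events = list(parse_ndjson(sample_ndjson))
for event in events:
    print(event["host"], event["_raw"])
```

Because every line is an independent JSON object, the file can be streamed, split, and processed line by line without loading the whole export, which is exactly what tools like Federated Search rely on.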
## Edge Processor and Ingest Processor
If using Edge Processor or Ingest Processor:
1. JSON format
a. Default format
b. Structured in HTTP Event Collector (HEC) compatible format
c. Includes event data and metadata
2. Parquet format (Ingest Processor only)
a. Columnar storage format
b. Offers better compression and query performance
c. Excellent for analytical workloads
d. Supported by many big data tools
## Frozen Data Archive
If archiving frozen buckets to S3:
1. Splunk proprietary format
a. Data stored in Splunk's internal format (tsidx and raw files)
b. Requires Splunk to read and interpret
c. Best for data you might want to thaw back into Splunk later
2. Custom format (with cold2frozen scripts)
a. You can customize how data is exported using scripts
b. Can transform to various formats including CSV, JSON, etc.
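For reference, a cold2frozen script is just an executable that Splunk invokes with the bucket path as its argument (configured via `coldToFrozenScript` in indexes.conf). Below is a hedged, hypothetical sketch that keeps only the compressed raw journal; the synthetic bucket, paths, and local archive directory are stand-ins for the example, and a real script would upload the result to S3 (e.g. with the AWS CLI or boto3) instead of copying it locally:

```python
import os
import shutil
import tempfile

def freeze_bucket(bucket_path, archive_dir):
    """Copy the compressed raw journal out of a frozen Splunk bucket.

    Splunk buckets keep the original event data in rawdata/journal.gz;
    the tsidx index files can be dropped if you only need the raw logs.
    """
    journal = os.path.join(bucket_path, "rawdata", "journal.gz")
    if not os.path.isfile(journal):
        raise FileNotFoundError(f"no raw journal in {bucket_path}")
    os.makedirs(archive_dir, exist_ok=True)
    dest = os.path.join(archive_dir,
                        os.path.basename(bucket_path) + "-journal.gz")
    shutil.copy2(journal, dest)
    return dest

# Demo with a synthetic bucket; in a real deployment Splunk passes the
# bucket path to the script as its first command-line argument.
bucket = os.path.join(tempfile.mkdtemp(), "db_1700000000_1690000000_42")
os.makedirs(os.path.join(bucket, "rawdata"))
with open(os.path.join(bucket, "rawdata", "journal.gz"), "wb") as f:
    f.write(b"compressed raw events would live here")

archived = freeze_bucket(bucket, tempfile.mkdtemp())
print("archived:", archived)
```

Keeping only `journal.gz` gives you the smallest archive that still contains every original event, at the cost of Splunk having to rebuild the tsidx files if you ever thaw the bucket.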
## Recommendations
Based on your needs for future retrieval and readability:
1. If you need the data to be easily readable by other systems:
a. Use Ingest Actions with ndjson format
b. Or Ingest Processor with Parquet format for analytical workloads
2. If you might want to re-ingest into Splunk:
a. ndjson format is easiest for re-ingestion
b. Frozen bucket archives can be thawed but only within Splunk
3. If storage efficiency is a priority:
a. Parquet format (via Ingest Processor) offers the best compression
b. ndjson is a good balance between readability and size
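If you take the ndjson route and later want to re-ingest, each exported line can be wrapped into the JSON body that HTTP Event Collector's `/services/collector/event` endpoint expects. A rough sketch, with invented sample data and field names:

```python
import json

# Placeholder sample lines as they might appear in an ndjson export.
ndjson_export = '\n'.join([
    '{"_raw": "error: disk full", "sourcetype": "syslog", "host": "app01"}',
    '{"_raw": "user login ok", "sourcetype": "syslog", "host": "app02"}',
])

def to_hec_payloads(text):
    """Wrap each exported event in an HEC-style event payload."""
    payloads = []
    for line in text.splitlines():
        if not line.strip():
            continue
        exported = json.loads(line)
        payloads.append({
            "event": exported["_raw"],
            "sourcetype": exported.get("sourcetype", "_json"),
            "host": exported.get("host", "unknown"),
        })
    return payloads

for p in to_hec_payloads(ndjson_export):
    print(json.dumps(p))
```

Each payload would then be POSTed to `https://<splunk>:8088/services/collector/event` with an `Authorization: Splunk <token>` header.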
For comprehensive documentation, refer to:
Ingest Actions: https://docs.splunk.com/Documentation/SplunkCloud/latest/IngestActions/S3Destination
Edge Processor: https://docs.splunk.com/Documentation/SplunkCloud/latest/EdgeProcessor/AmazonS3Destination
Ingest Processor: https://docs.splunk.com/Documentation/SplunkCloud/latest/IngestProcessor/AmazonS3Destin
Data Self-Storage: https://docs.splunk.com/Documentation/SplunkCloud/latest/Admin/DataSelfStorage
Please give 👍 for support 😁 happy splunking .... 😎
Hey @asimit
Out of interest, what LLM are you using to generate these responses?
By the way, half of the links you posted are hallucinations.
I assume you are using Ingest Actions to send your data to S3? If that is the case, then you have the option of either raw, newline-delimited JSON, or JSON format. This makes it easier for other tools to read the data, or for re-ingestion in the future, as it is not stored in a proprietary format.
If you are ever looking to use Federated Search for S3 to search your S3 data in the future then this requires newline-delimited JSON (ndjson).
For more information I'd recommend checking out the Ingest Actions Architecture docs.
🌟 Did this answer help you? If so, please consider giving it an upvote: your feedback encourages the volunteers in this community to continue contributing.
Options other than IA are Edge Processor, Ingest Processor and/or frozen buckets.
With IA, EP and IP the output format is JSON, specifically a HEC-compatible format. With IP you can also select Parquet if you want.
If you are running Splunk Enterprise in AWS, you can configure your frozen buckets to be stored in S3 buckets. Depending on your cold2frozen script, you can store only the raw data in S3, or more if you really want.