Splunk Cloud Platform

Confirm Splunk Logs Format

asadnafees138
Loves-to-Learn

Hello Team,

I need to back up my Splunk log data to an AWS S3 bucket, but I need to confirm in which format the logs will be stored, so that if I need those logs in the future I can convert them into a readable form.

Please note, I really need to know the exact format of the log data stored in Splunk. Please confirm.

Regards,
Asad Nafees


asimit
Path Finder

Hi @asadnafees138 

When backing up Splunk logs to AWS S3, the format depends on which method you use for the backup. Here are the available methods and their corresponding formats:

## Ingest Actions

If you're using Ingest Actions to send data to S3, you have three format options:

1. Raw format
a. This preserves the original format of your data exactly as it was ingested
b. Best for maintaining complete fidelity with source data
c. May require additional parsing when retrieving

2. Newline-delimited JSON (ndjson)
a. Each event is a separate JSON object on a new line
b. Includes both the raw event and extracted fields
c. Best option if you plan to use Federated Search with S3 later
d. Easily parseable by many analytics tools

3. JSON format
a. Standard JSON format with events in an array structure
b. Includes metadata and extracted fields
c. Good for interoperability with other systems
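As an illustration of why ndjson is easy to work with, the sketch below parses an ndjson backup back into structured events using only the standard library. The sample lines and their field set (`_raw`, `sourcetype`, `host`) are hypothetical; the exact fields depend on your Ingest Actions configuration:

```python
import json

def parse_ndjson(text):
    """Parse newline-delimited JSON: one event object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Hypothetical sample resembling an ndjson backup object downloaded from S3
sample = (
    '{"_raw": "ERROR db timeout", "sourcetype": "app:log", "host": "web01"}\n'
    '{"_raw": "INFO started", "sourcetype": "app:log", "host": "web02"}\n'
)

events = parse_ndjson(sample)
```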

## Edge Processor and Ingest Processor

If using Edge Processor or Ingest Processor:

1. JSON format
a. Default format
b. Structured in HTTP Event Collector (HEC) compatible format
c. Includes event data and metadata

2. Parquet format (Ingest Processor only)
a. Columnar storage format
b. Offers better compression and query performance
c. Excellent for analytical workloads
d. Supported by many big data tools
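Since the JSON output here is HEC-compatible, each event resembles the standard HTTP Event Collector envelope. A minimal sketch of that envelope (the helper function and sample values are hypothetical; `time`, `host`, `sourcetype`, `index`, and `event` are the standard HEC keys):

```python
import json
import time

def to_hec_event(raw, host, sourcetype, index="main"):
    """Wrap a raw log line in an HEC-style JSON envelope."""
    return {
        "time": time.time(),       # epoch timestamp
        "host": host,
        "sourcetype": sourcetype,
        "index": index,
        "event": raw,              # the original log line
    }

evt = to_hec_event("ERROR db timeout", "web01", "app:log")
line = json.dumps(evt)  # one serialized event
```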

## Frozen Data Archive

If archiving frozen buckets to S3:

1. Splunk proprietary format
a. Data stored in Splunk's internal format (tsidx and raw files)
b. Requires Splunk to read and interpret
c. Best for data you might want to thaw back into Splunk later

2. Custom format (with cold2frozen scripts)
a. You can customize how data is exported using scripts
b. Can transform to various formats including CSV, JSON, etc.
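To illustrate the custom-script option: Splunk invokes a coldToFrozenScript with the frozen bucket's directory as its first argument. The hypothetical sketch below archives only the rawdata journal to a local directory; the archive destination is an assumption, and a production script would typically upload to S3 instead:

```python
import shutil
import sys
from pathlib import Path

def freeze(bucket_dir, archive_root="/mnt/frozen-archive"):
    """Copy the rawdata journal of a frozen bucket to an archive directory.
    Splunk passes the bucket directory path as the script's first argument."""
    src = Path(bucket_dir)
    dest = Path(archive_root) / src.name
    dest.mkdir(parents=True, exist_ok=True)
    for journal in (src / "rawdata").glob("journal.*"):
        shutil.copy2(journal, dest / journal.name)

if __name__ == "__main__" and len(sys.argv) > 1:
    freeze(sys.argv[1])
```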

## Recommendations

Based on your needs for future retrieval and readability:

1. If you need the data to be easily readable by other systems:
a. Use Ingest Actions with ndjson format
b. Or Ingest Processor with Parquet format for analytical workloads

2. If you might want to re-ingest into Splunk:
a. ndjson format is easiest for re-ingestion
b. Frozen bucket archives can be thawed but only within Splunk

3. If storage efficiency is a priority:
a. Parquet format (via Ingest Processor) offers the best compression
b. ndjson is a good balance between readability and size

For comprehensive documentation, refer to:
Ingest Actions: https://docs.splunk.com/Documentation/SplunkCloud/latest/IngestActions/S3Destination
Edge Processor: https://docs.splunk.com/Documentation/SplunkCloud/latest/EdgeProcessor/AmazonS3Destination
Ingest Processor: https://docs.splunk.com/Documentation/SplunkCloud/latest/IngestProcessor/AmazonS3Destin
Data Self-Storage: https://docs.splunk.com/Documentation/SplunkCloud/latest/Admin/DataSelfStorage

Please give 👍 for support 😁 Happy Splunking! 😎


livehybrid
SplunkTrust

Hey @asimit 

Out of interest, what LLM are you using to generate these responses?

By the way, half of the links you posted are hallucinations.

 


livehybrid
SplunkTrust

Hi @asadnafees138 

I assume you are using Ingest Actions to send your data to S3? If that is the case then you have the option of either raw, newline-delimited JSON or JSON format. This makes it easier for other tools or re-ingestion in the future as it is not stored in a proprietary format.

If you are ever looking to use Federated Search for S3 to search your S3 data in the future then this requires newline-delimited JSON (ndjson).

For more information I'd recommend checking out the Ingest Actions Architecture docs.

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


isoutamo
SplunkTrust

Other options than IA are Edge Processor, Ingest Processor and/or frozen buckets.

With IA, EP and IP the output format is JSON, actually a HEC-suitable format. With IP you can also select Parquet if you want.

If you are running your Enterprise in AWS, then you could configure your frozen buckets to be stored in S3 buckets. Depending on your cold2frozen script, you could store only the raw data in S3, or more if you really want it.
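For reference, the frozen-bucket behaviour described here is configured in indexes.conf. A hypothetical stanza (index name and script path are placeholders; only one of the two settings may be set per index):

```ini
[my_index]
# Archive frozen buckets to a directory (an S3 mount, for example) ...
coldToFrozenDir = /mnt/s3-frozen/my_index
# ... or run a custom cold2frozen script instead:
# coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/my_cold2frozen.py"
```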
