Getting Data In

Properly index JSON files with events as elements of an array (using CloudTrail as an example)

daubsi_2
Explorer

I would like to manually import AWS CloudTrail logs that were stored as gzipped JSON files on S3. The files now reside on my local disk, one file per hour, per day, etc. They should not be imported directly from the cloud, e.g. via the Splunk TA for AWS.

I have installed that app anyway, though, to get all the various AWS-specific sourcetypes.

The problem with the data is that each file contains only a single line of data: one huge JSON object whose "Records" array contains all the individual events. This format is apparently not uncommon, so I am referring to AWS CloudTrail only to provide an example of it.

Here is a small, contrived example of such a file, with three events:

{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}

Real CloudTrail events are, of course, much more verbose and deeply nested, but that does not matter for the problem here.
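To make the structure concrete, the following Python sketch (not part of the Splunk setup) decompresses one such file and walks its "Records" array. The helper `load_records` is a hypothetical name, and the .gz file is simulated in memory from the sample above instead of being read from disk.

```python
import gzip
import io
import json

# The single-line sample file from the question: one JSON object whose
# "Records" array holds the individual events.
SAMPLE = (
    '{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}'
)


def load_records(source):
    """Hypothetical helper: read a gzipped file (path or binary file object)
    and return the list stored under its "Records" key."""
    with gzip.open(source, "rt", encoding="utf-8") as fh:
        return json.load(fh)["Records"]


# Simulate one of the .gz files in memory instead of reading it from disk.
records = load_records(io.BytesIO(gzip.compress(SAMPLE.encode("utf-8"))))
for record in records:
    print(record["eventTime"])
```

For the sample, this prints the three `eventTime` values in order, one per event in the array.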

Selecting the sourcetype "aws:cloudtrail" does not split the events properly, so I changed LINE_BREAKER to the following value:

LINE_BREAKER=((\{"Records":\[)*|,*){"eventVersion"
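The effect of this LINE_BREAKER can be approximated in Python. The sketch follows the LINE_BREAKER behaviour described in the props.conf documentation: the text matched by the first capturing group is discarded, the previous event ends where that group starts, and the next event begins where it ends. It is an illustration only, not Splunk's implementation, and `split_events` is a hypothetical name.

```python
import json
import re

# The single-line sample file from the question.
SAMPLE = (
    '{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}'
)

# The LINE_BREAKER from the question; the literal "{" before "eventVersion"
# is escaped here, which does not change what the pattern matches.
LINE_BREAKER = re.compile(r'((\{"Records":\[)*|,*)\{"eventVersion"')


def split_events(raw):
    """Hypothetical helper approximating LINE_BREAKER: the text of capture
    group 1 is discarded, the previous event ends where the group starts
    and the next event begins where it ends."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    # The first break swallows the {"Records":[ header and leaves an empty
    # piece in front of it; drop empty pieces.
    return [event for event in events if event]


events = split_events(SAMPLE)
for event in events:
    try:
        json.loads(event)
        status = "valid JSON"
    except json.JSONDecodeError:
        status = "NOT valid JSON"
    print(f"{status}: {event}")
```

For the sample, the first two events parse as JSON, while the last one is reported as invalid because it still ends in "]}".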

With this I was able to index all the events properly and even get rid of the {"Records":[ header at the start. However, the very last event is still indexed incorrectly: it ends with the "]}" that closes the "Records" array and the wrapping object, so it is not valid JSON.

How can I get rid of that trailing "]}" junk so that the last event is indexed properly as well?

richgalloway
SplunkTrust

Try using SEDCMD to remove the junk characters. Put this in props.conf, in the stanza for the sourcetype you assign to these files:

[aws:cloudtrail:fromS3]
SEDCMD-nojunk = s/]}$//
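The substitution can be checked in Python against the events that the question's LINE_BREAKER produces from the sample. This is a sketch under the assumption that SEDCMD applies the sed-style substitution to each event's raw text; `re.sub` with `count=1` mimics `s/]}$//` without the `g` flag, and the names `EVENTS`, `NOJUNK` and `cleaned` are hypothetical.

```python
import json
import re

# The three events as split by the LINE_BREAKER in the question; only the
# last one still carries the "]}" that closed the "Records" wrapper.
EVENTS = [
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}}',
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}}',
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}',
]

# Python counterpart of s/]}$//: "]}" is removed only at the very end of an
# event. The brackets are escaped for readability; the match is the same.
NOJUNK = re.compile(r"\]\}$")

cleaned = [NOJUNK.sub("", event, count=1) for event in EVENTS]
parsed = [json.loads(event) for event in cleaned]  # all three now parse
for event in parsed:
    print(event["eventTime"])
```

Because the pattern strips a trailing "]}" from every event, not just the last one in a file, it assumes that no complete event itself ends in "]}" (for example, an object whose last field is an array); that holds for the sample above.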

daubsi_2
Explorer

Works perfectly! Awesome! Thanks a lot!
