Properly index JSON files with events as elements of an array (using CloudTrail as an example)

daubsi_2
Explorer

I would like to manually import AWS CloudTrail logs that were stored as gzipped JSON files on S3. The files are now on my local disk, one file per hour, per day, etc. They should not be ingested directly from the cloud, e.g. via the Splunk TA for AWS.

I have installed that app anyway, though, to get all the AWS-specific sourcetypes.

The problem is that each file contains only a single line of data: one huge JSON object whose "Records" array holds all the individual events. This format is apparently not uncommon, so I am referring to AWS CloudTrail only as an example of it.

Here is a small, contrived example of what such a file looks like, with three events:

{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}
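To make the structure concrete: the whole file is a single JSON object, and the events are simply the elements of its "Records" array. A minimal Python sketch (standard library only) that parses the sample line above; the helper `load_records` and its `path` argument are hypothetical, standing in for one of the local gzipped files:

```python
import gzip
import json

# The three-event sample from above; in the real file this is all one line.
SAMPLE = ('{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},'
          '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},'
          '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}')


def load_records(path):
    """Hypothetical helper: read one gzipped CloudTrail file from local disk
    and return the list of events stored under "Records"."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)["Records"]


# There are no newlines anywhere, so breaking on newlines yields one giant "event".
assert "\n" not in SAMPLE

records = json.loads(SAMPLE)["Records"]  # each element is one CloudTrail event
for rec in records:
    print(rec["eventTime"])
```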

Of course, real CloudTrail events are much more verbose and deeply nested, but that does not matter for the problem here.

Selecting the sourcetype "aws:cloudtrail" does not split the events properly. I changed LINE_BREAKER to the following value:

LINE_BREAKER=((\{"Records":\[)*|,*){"eventVersion"

With this I was able to index all the events properly and even get rid of the leading header. However, the very last event is still indexed incorrectly: it ends with the closing "]}" of the wrapping "Records" element, so it is not valid JSON.
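The effect can be reproduced outside Splunk. Per the props.conf spec, the text matched by the first capturing group of LINE_BREAKER is discarded, the previous event ends where that group starts, and the next event begins where it ends. The Python sketch below approximates that rule with `re.finditer` (an assumption about the semantics, not Splunk's actual code; `split_like_line_breaker` and `is_valid_json` are hypothetical helpers) using the pattern from the question unchanged:

```python
import json
import re

SAMPLE = ('{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},'
          '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},'
          '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}')

# LINE_BREAKER from the question. In Python's re (as in PCRE) a "{" that does
# not start a valid repetition such as {2} is matched as a literal brace.
LINE_BREAKER = r'((\{"Records":\[)*|,*){"eventVersion"'


def split_like_line_breaker(raw, pattern):
    """Approximate LINE_BREAKER: drop the text of capture group 1, end the
    previous event at its start and begin the next event at its end."""
    events = []
    start = 0
    for m in re.finditer(pattern, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    # The chunk before the breaker at offset 0 is empty; this sketch drops it.
    return [e for e in events if e]


def is_valid_json(text):
    """Return True if the event text parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


events = split_like_line_breaker(SAMPLE, LINE_BREAKER)
for e in events:
    print(is_valid_json(e), e)
```

Running it prints the first two events as valid JSON objects, while the third still carries the trailing "]}" and fails to parse, which matches the behavior described above.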


How can I get rid of that trailing "]}" junk so that the last event is indexed properly, too?


richgalloway
SplunkTrust

Try using SEDCMD to remove the junk characters. Put this in props.conf:

[aws:cloudtrail:fromS3]
SEDCMD-nojunk = s/]}$//
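This works because SEDCMD is applied to the raw text of each event after line breaking, so `$` anchors at the end of each individual event and only the last one, the only event ending in "]}", is changed. A minimal Python sketch of that effect on the three events produced by the question's LINE_BREAKER; `apply_sedcmd_nojunk` is a hypothetical helper, and Python's `re.sub` only approximates Splunk's sed-style replacement:

```python
import json
import re

# The three events as split by the LINE_BREAKER in the question.
events = [
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}}',
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}}',
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}',
]


def apply_sedcmd_nojunk(raw):
    """Rough equivalent of SEDCMD-nojunk = s/]}$// for one event:
    remove a literal "]}" only when it is at the very end of the event."""
    return re.sub(r"\]\}$", "", raw)


cleaned = [apply_sedcmd_nojunk(e) for e in events]
# json.loads raises an error if any event were still not valid JSON.
parsed = [json.loads(e) for e in cleaned]
print([p["eventTime"] for p in parsed])
```

The first two events pass through unchanged because they end in "}}", not "]}".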

daubsi_2
Explorer

Works perfectly! Awesome! Thanks a lot!
