Properly index JSON files whose events are elements of an array (using CloudTrail as an example)

daubsi_2
Explorer

I would like to manually import AWS CloudTrail logs that were stored on S3 as gzipped JSON files. The files now reside on my local disk, one file per hour, per day, and so on. They should not be imported directly from the cloud, e.g. via the Splunk TA for AWS.

I have installed that app anyway, though, to get all the AWS-specific sourcetypes.

The problem with the data is that each file contains only a single line: one huge JSON object whose "Records" array holds all the individual events. This format is apparently not uncommon, so I refer to AWS CloudTrail only to provide an example of it.

Here is a small, artificially contrived example of what such a file looks like, with 3 events:

{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}
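
Outside Splunk, the structure is easy to confirm: the whole file is one JSON object, and the events are the elements of its "Records" array. Below is a minimal standard-library Python sketch, assuming a gzipped file shaped like the sample above. The file name sample.json.gz and the helper load_records are hypothetical; the sample is written to a temporary directory only so the sketch runs on its own.

```python
import gzip
import json
import os
import tempfile

# The three-event sample from the question, as a single line.
SAMPLE = (
    '{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}'
)


def load_records(path: str) -> list:
    """Hypothetical helper: read one gzipped file and return its "Records" list."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)["Records"]


# Write the sample to a temporary .json.gz file as a stand-in for a real log file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(SAMPLE)
    records = load_records(path)

# Each element of "Records" is one event that Splunk should index separately.
for record in records:
    print(record["eventTime"], record["userIdentity"]["type"])
```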

Of course, real CloudTrail events are much more verbose and deeply nested, but that does not matter for the problem at hand.

Selecting the sourcetype "aws:cloudtrail" does not split the events properly, so I changed the LINE_BREAKER to the following value:

LINE_BREAKER=((\{"Records":\[)*|,*){"eventVersion"

With this, I was able to index all the events properly and even strip the leading header. However, the very last event is still indexed incorrectly: it ends with "]}", which closes the wrapping "Records" array and its enclosing object, so it is not valid JSON.
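
To see why only the last event is affected, here is a minimal Python sketch that approximates how a LINE_BREAKER is applied: the text matched by the first capturing group is discarded, the text before it ends the previous event, and the text after it starts the next one. It runs the question's regex over the three-event sample. The helpers split_events and is_valid_json are hypothetical, and Python's re module is only a stand-in for Splunk's own line breaking, not its actual implementation.

```python
import json
import re

# The three-event sample from the question, as a single line.
RAW = (
    '{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}'
)

# The LINE_BREAKER value from the question. The unescaped '{' before
# "eventVersion" is treated as a literal character by Python's re.
LINE_BREAKER = re.compile(r'((\{"Records":\[)*|,*){"eventVersion"')


def split_events(raw: str, breaker: re.Pattern) -> list:
    """Approximate LINE_BREAKER semantics: drop the text of capture group 1;
    everything before it closes the previous event, everything after opens the next."""
    events = []
    start = 0
    for m in breaker.finditer(raw):
        chunk = raw[start:m.start(1)]
        if chunk:  # the leading {"Records":[ header leaves an empty chunk
            events.append(chunk)
        start = m.end(1)
    tail = raw[start:]  # nothing after the last event triggers a break
    if tail:
        events.append(tail)
    return events


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


events = split_events(RAW, LINE_BREAKER)
for e in events:
    print(is_valid_json(e), e[-8:])
```

The header is consumed by the first match, and every comma between events is consumed by a later one, but nothing follows the final event, so the closing "]}" stays attached to it.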

How can I get rid of that trailing "junk" "]}" so that the last event is indexed properly as well?

richgalloway
SplunkTrust

Try using SEDCMD to remove the junk characters. Put this in props.conf:

[aws:cloudtrail:fromS3]
SEDCMD-nojunk = s/]}$//
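
The substitution can be sanity-checked before deploying it by applying an equivalent replacement to the events the question's LINE_BREAKER produces. This Python sketch assumes re.sub with an end-anchored pattern and a single replacement behaves like the sed-style s/]}$// for this simple case; it is an approximation, not how Splunk runs SEDCMD, and the helper apply_nojunk is hypothetical. Because [aws:cloudtrail:fromS3] is a sourcetype stanza, it only takes effect if the files are indexed with that sourcetype. The pattern also assumes no genuine event ends in "]}", which holds for the sample.

```python
import json
import re

# The three events as produced by the question's LINE_BREAKER;
# only the last one still carries the trailing "]}" of the wrapper.
EVENTS = [
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}}',
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}}',
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}',
]

# Approximation of SEDCMD-nojunk = s/]}$//: remove "]}" only at the very end.
NOJUNK = re.compile(r'\]\}$')


def apply_nojunk(event: str) -> str:
    """Hypothetical helper: one end-anchored replacement, like sed without /g."""
    return NOJUNK.sub('', event, count=1)


cleaned = [apply_nojunk(e) for e in EVENTS]
for e in cleaned:
    record = json.loads(e)  # raises json.JSONDecodeError if the event is not valid JSON
    print(record["eventTime"])
```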

daubsi_2
Explorer

Works perfectly! Awesome! Thanks a lot!
