I would like to manually import AWS CloudTrail logs that were stored as gzipped JSON files on S3. The files reside on my local disk, one file per hour, per day, etc. They should not be imported directly from the cloud, e.g. via the Splunk TA for AWS.
I have installed that app, though, to get all the various AWS-specific sourcetypes.
The problem with the data is that each file contains only a single line: one huge JSON array holding all the individual events. This format is apparently not uncommon, so I am referring to AWS CloudTrail only as an example.
A small, artificially contrived example of what such a file looks like, here with 3 events:
{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}
Of course the real CloudTrail events are much more verbose and nested, but that poses no problem for the case at hand.
Selecting the sourcetype "aws:cloudtrail" does not split the events properly. I changed the LINE_BREAKER to the following value:
LINE_BREAKER = ((\{"Records":\[)*|,*){"eventVersion"
Using this I was able to index all the events properly and even get rid of the leading header. However, the very last event is still indexed incorrectly: it ends with the closing "]}" of the opening/wrapping "Records" element and as such is not valid JSON.
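As a sanity check, the splitting behaviour can be simulated outside Splunk. The sketch below is a rough Python approximation of LINE_BREAKER semantics (Splunk breaks the stream at each match and discards the text captured by the first group), not Splunk itself; it reproduces the trailing "]}" problem on the sample data:

```python
import json
import re

# The LINE_BREAKER regex from above. The first capture group swallows the
# wrapping '{"Records":[' header and the commas between events.
LINE_BREAKER = r'((\{"Records":\[)*|,*)\{"eventVersion"'

# The single-line sample file from above (3 events).
raw = (
    '{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z",'
    '"userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08",'
    '"eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z",'
    '"userIdentity":{"type":"AssumedRole"}}]}'
)

def split_events(data: str, breaker: str) -> list[str]:
    """Rough approximation of Splunk's LINE_BREAKER semantics: each event
    starts right after capture group 1 of one match and ends where capture
    group 1 of the next match begins."""
    matches = list(re.finditer(breaker, data))
    events = []
    for i, m in enumerate(matches):
        start = m.end(1)
        end = matches[i + 1].start(1) if i + 1 < len(matches) else len(data)
        events.append(data[start:end])
    return events

events = split_events(raw, LINE_BREAKER)
print(events[0])  # a clean JSON object
print(events[2])  # still carries the trailing "]}" of the Records array
```

The first two events parse as valid JSON; the last one fails exactly as described, because the closing "]}" of the array sticks to it.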
How can I get rid of that trailing junk "]}" so that the last event also gets indexed properly?
Try using SEDCMD to remove the junk characters. Put this in props.conf.
[aws:cloudtrail:fromS3]
SEDCMD-nojunk = s/]}$//
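For completeness, the two settings together in one stanza might look like the sketch below. Only LINE_BREAKER and SEDCMD-nojunk come from this thread; SHOULD_LINEMERGE = false (the usual companion of a custom LINE_BREAKER) and the timestamp settings are assumptions based on the sample events shown above, so verify them against your data:

```ini
[aws:cloudtrail:fromS3]
SHOULD_LINEMERGE = false
LINE_BREAKER = ((\{"Records":\[)*|,*){"eventVersion"
SEDCMD-nojunk = s/]}$//
# Assumed from the sample data, not from this thread:
TIME_PREFIX = "eventTime":"
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
```

Note that SEDCMD runs after line breaking, so it only sees the individual events, which is why anchoring on the end of the event with $ safely strips the junk from the last one.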
Works perfectly! Awesome! Thanks a lot!