I would like to manually import AWS CloudTrail logs that were stored as gzipped JSON files on S3. The files reside on my local disk, one file per hour, per day, etc. They should not be imported directly from the cloud, e.g. via the Splunk TA for AWS.
I have installed that app, though, to get all the various AWS-specific sourcetypes.
The problem with the data is that each file contains only a single line: one huge JSON array holding all the individual events. This format is apparently not uncommon, so I am referring to AWS CloudTrail only as an example.
A small, artificially contrived example of what such a file looks like, here with 3 events:
{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z","userIdentity":{"type":"AssumedRole"}}]}
Of course the real CloudTrail events are much more verbose and nested, but that poses no problem for the case at hand.
Selecting the sourcetype "aws:cloudtrail" does not split the events properly. I changed the LINE_BREAKER to the following value:
LINE_BREAKER = ((\{"Records":\[)*|,*){"eventVersion"
Using this I was able to index all the events properly and even get rid of the leading header. However, the very last event is still indexed incorrectly: it ends with the closing "]}" of the opening/wrapping "Records" element and as such is not valid JSON.
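As a sanity check, the splitting behaviour can be simulated outside Splunk. The sketch below is a rough Python approximation of LINE_BREAKER semantics (Splunk breaks the stream at each match and discards the text captured by the first group), not Splunk itself; it reproduces the trailing "]}" problem on the sample data:

```python
import json
import re

# The LINE_BREAKER regex from above. The first capture group swallows the
# wrapping '{"Records":[' header and the commas between events.
LINE_BREAKER = r'((\{"Records":\[)*|,*)\{"eventVersion"'

# The single-line sample file from above (3 events).
raw = (
    '{"Records":[{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:01Z",'
    '"userIdentity":{"type":"AssumedRole"}},{"eventVersion":"1.08",'
    '"eventTime":"2022-06-08T22:10:03Z","userIdentity":{"type":"AssumedRole"}},'
    '{"eventVersion":"1.08","eventTime":"2022-06-08T22:10:05Z",'
    '"userIdentity":{"type":"AssumedRole"}}]}'
)

def split_events(data: str, breaker: str) -> list[str]:
    """Rough approximation of Splunk's LINE_BREAKER semantics: each event
    starts right after capture group 1 of one match and ends where capture
    group 1 of the next match begins."""
    matches = list(re.finditer(breaker, data))
    events = []
    for i, m in enumerate(matches):
        start = m.end(1)
        end = matches[i + 1].start(1) if i + 1 < len(matches) else len(data)
        events.append(data[start:end])
    return events

events = split_events(raw, LINE_BREAKER)
print(events[0])  # a clean JSON object
print(events[2])  # still carries the trailing "]}" of the Records array
```

The first two events parse as valid JSON; the last one fails exactly as described, because the closing "]}" of the array sticks to it.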
How can I get rid of that trailing junk "]}" so that the last event also gets indexed properly?
Try using SEDCMD to remove the junk characters. Put this in props.conf.
[aws:cloudtrail:fromS3]
SEDCMD-nojunk = s/]}$//
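For completeness, the two settings together in one stanza might look like the sketch below. Only LINE_BREAKER and SEDCMD-nojunk come from this thread; SHOULD_LINEMERGE = false (the usual companion of a custom LINE_BREAKER) and the timestamp settings are assumptions based on the sample events shown above, so verify them against your data:

```ini
[aws:cloudtrail:fromS3]
SHOULD_LINEMERGE = false
LINE_BREAKER = ((\{"Records":\[)*|,*){"eventVersion"
SEDCMD-nojunk = s/]}$//
# Assumed from the sample data, not from this thread:
TIME_PREFIX = "eventTime":"
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
```

Note that SEDCMD runs after line breaking, so it only sees the individual events, which is why anchoring on the end of the event with $ safely strips the junk from the last one.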
Works perfectly! Awesome! Thanks a lot!