Issue Forwarding Cloud Custodian Logs Using a Splunk Universal Forwarder

gerrykahn
Explorer

I have a developer running Cloud Custodian scans in AWS and dropping the JSON results on a Linux box running a Splunk Universal Forwarder. The results go into a file hierarchy: Out/BU_name/ORG_name/TypeOfScan_name/Results

I installed a Splunk UF on the box and set it up to monitor the Out directory and all of its sub-directories.

The problem is that there are many BUs, each with several ORGs, and each ORG runs 5 different types of scans, so I end up with several hundred files with exactly the same name in hundreds of sub-directories. To make matters worse, the scan reruns every 10 minutes, and the output file goes in the same location with the same name; only the timestamp is updated.
I have tried many configurations and none have worked.

My latest attempted inputs.conf:
[monitor:///home/cloud-user/out/]
disabled = false
index = aws_scan
sourcetype = cloudcustodian
recursive = true
crcSalt =
initCrcLength = 1048576

Has anyone faced a similar issue and found a solution?

masonmorales
Influencer

You can have Splunk recurse through directories by using "..." in the stanza. e.g.:

[monitor:///home/cloud-user/out/.../nameoflogfile.log]
disabled = false
index = aws_scan
sourcetype = cloudcustodian
recursive = true

Make sure that whatever user splunkd is running as has permissions to those files. You might also want to check how many file descriptors are in use and ensure that enough are configured to monitor all of those files.

masonmorales
Influencer

BTW, why do you have initCrcLength set? Do the files have very long headers?

sloshburch
Ultra Champion

Would you elaborate on this:

I end up with several hundred files with exactly the same name in hundreds of sub-directories. And to make matters worse the scan reruns every 10 minutes and the output file goes in the same location and has the same name, just the time stamp is updated.

It could mean a few different things. Do the files show up in Splunk with the same value for source, or is only the filename part of the source the same? What specifically does "exactly the same name" mean?
Is the scan that runs every 10 minutes the process that produces the outputs in these locations? What happens to the files that were already there after the scan runs? Does the scan append to, replace, or roll the existing logs? It sounds like it replaces the file, in which case Splunk's cursor points to a position in a file that no longer exists, because Splunk didn't realize the file is actually new (it assumes the file is appended to).

Clarify those and we'll see where to go next.
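
For the "hundreds of files that look the same" part of the problem, one commonly suggested setting is crcSalt = <SOURCE>. By default the forwarder decides whether it has already seen a file from a CRC of the file's first bytes (256 by default, or initCrcLength if set), so scan outputs that begin identically can be mistaken for one file. The sketch below is a variation on the inputs.conf from the question, not a tested fix. Whether it also causes a file that is rewritten in place every 10 minutes to be re-read depends on the answers to the questions above and should be verified on the box.

```ini
# inputs.conf on the Universal Forwarder (sketch, not verified against this data)
[monitor:///home/cloud-user/out/]
disabled = false
index = aws_scan
sourcetype = cloudcustodian
recursive = true
# The value is the literal string <SOURCE>, not a placeholder to fill in.
# It mixes each file's full path into the CRC, so files with identical
# beginnings in different sub-directories are tracked separately.
crcSalt = <SOURCE>
```

An empty "crcSalt =" line, as in the original attempt, appears to set no salt at all. Note that crcSalt = <SOURCE> is generally discouraged for logs that are rotated by renaming, because a renamed file gets a new path and would be indexed again; here the paths are described as stable, so that caveat may not apply.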

gerrykahn
Explorer

I tried what was suggested, but I still have two issues. First, each line of a JSON file is indexed as an individual event. Second, most of the files are not being picked up; I suspect the UF thinks it has already indexed them. That is why I had been trying settings like "crcSalt = " and "initCrcLength = 1048576". Is there anything else you would suggest I try?
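
For the first issue (one event per line of JSON), one option that can be tried on the forwarder itself is structured-data parsing with INDEXED_EXTRACTIONS in props.conf, which the Universal Forwarder applies before sending data on. This is a minimal sketch, assuming each results file is valid JSON and that the sourcetype is cloudcustodian as in the inputs.conf above. The file location in the comment is an assumption, and how a pretty-printed top-level array is split into events has not been verified here.

```ini
# props.conf on the Universal Forwarder
# (assumed location: $SPLUNK_HOME/etc/system/local/props.conf)
[cloudcustodian]
# Parse the file as JSON on the forwarder instead of breaking it line by line.
INDEXED_EXTRACTIONS = json
```

The forwarder needs a restart to pick up the change, and events that were already indexed line by line stay as they are. If fields then show up twice at search time, setting KV_MODE = none for the same sourcetype on the search head is the usual remedy.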
