For a variety of reasons I'm not able to push all of our syslog data to splunk. I can, however, easily generate daily logwatch reports, which can be placed into a directory, emailed, or whatever is needed to get them over to the splunk server. That part is easy. What I'm not sure of is how to get splunk to ingest each report as a distinct unit and then generate useful reports from the data.
To put it a bit more simply: I get about 100 logwatch reports right now, and that number is only increasing. What I'd like to do is use splunk to process these reports and generate a single unified report with some basic statistics and a "these things are broken/unknown" section, so I can scan one report instead of 100.
Has anyone done this? Any hints on where to begin to get this implemented?
If that were my report format, and this were my problem, I would write a script that cuts up the reports and creates a logfile for each category (e.g. logwatch_pam_auth.log), manually parsing the "Date Range Processed" line from each report and inserting that date before each entry.
At that point getting them as a set of events in splunk is easy.
There are non-script approaches to getting the events cut up in splunk as well, but they might require creating a custom datetime.xml and complex rules for event parsing. However, I suspect giving each data category its own sourcetype will greatly aid in sane field extraction.
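For the one-file-per-category approach, the splunk side can be a sketch like the following (paths and sourcetype names are examples, not anything splunk ships with); inputs.conf monitors the split files and props.conf tells splunk each line is one event with a leading date:

```ini
# inputs.conf -- monitor the directory of split logs (example path)
[monitor:///var/log/logwatch-split/logwatch_pam_auth.log]
sourcetype = logwatch_pam_auth

# props.conf -- one stanza per sourcetype
[logwatch_pam_auth]
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y-%m-%d
MAX_TIMESTAMP_LOOKAHEAD = 10
```

With a distinct sourcetype per category you can then write per-category field extractions instead of one fragile regex over the whole report.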
Mostly, though, you're struggling because you're using a tool designed to report on a single system, not on systems in aggregate. Maybe there are better tools?
Personally I recommend Splunk for this; syslog data rates are usually low enough that it's not a real issue, and any sort of application or appliance data is usually vastly larger. However, there are specialized tools that process log data to produce aggregate information, for example OSSEC (http://ossec.net).