We have an asset management system/database that's at the center of a lot of what we do where I work.
Splunk is at the center of what we do regarding PCI logging.
What we're trying to do is script the collection of information about when each host last reported into Splunk, and use it to populate our asset management DB.
Currently our PCI team logs into Splunk and does ad-hoc searches to see if hosts are sending logs to Splunk. We'd like to just send them to our asset management database front end instead. They would find a host in the database, and part of the information supplied would be a log source, the last time that log was found, and the index it was found in.
We're currently doing this daily via a saved search:
earliest=-1d | stats max(_time) BY host, source, index, sourcetype
then parsing the data from that search and loading the results into the database. Since we have terabytes of data to sift through, though, the search takes hours and hours (currently anywhere from 7 to 10 hours).
We've been trying to think of a faster way to get that search to run in Splunk. As it stands on some days Any ideas?
The metadata command is too limited for what you're trying to do, because it keeps counts and times by host/index/sourcetype independently, but not further broken down. In other words, it's hard to use metadata to figure out the most recent event for a given sourcetype on a given host.
This is one of those places where a summary index may work, but it might be clunky. A better solution might be to use a lookup table to maintain your state. There is a Splunk blog post that covers this technique at http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/ . The basic premise is that a scheduled search (which runs very fast) incrementally updates a lookup table with newer data. Then, you can look directly at the lookup table, which should only be as "old" as the most recent scheduled update.
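As a rough sketch of that lookup-table approach (not taken from the blog post verbatim), a scheduled search along these lines could run every hour. The lookup file name host_last_seen.csv, the field name last_seen, and the one-hour window are all assumptions for illustration; adjust them to your environment.

```
earliest=-1h
| stats max(_time) AS last_seen BY host, source, index, sourcetype
| inputlookup append=true host_last_seen.csv
| stats max(last_seen) AS last_seen BY host, source, index, sourcetype
| outputlookup host_last_seen.csv
```

The first stats only looks at the last hour of events, so it should be far cheaper than a full-day scan. The inputlookup append=true step adds the previously saved rows, the second stats keeps the newest timestamp per host/source/index/sourcetype, and outputlookup writes the merged state back. The lookup has to exist before the first scheduled run, so you would probably seed it once by running the search without the inputlookup line. Your loader script can then read the whole table with something like:

```
| inputlookup host_last_seen.csv
| eval last_seen_readable=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
```

Keep the search window at least as long as the schedule interval (or overlap it a little) so that late-arriving events aren't missed between runs.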