I've been developing fairly complex apps to support fraud and incident investigations. The apps started as auxiliary tools to read massive and varied data sources, gain insight from them and detect anomalies. That is, a typical Splunk use case.
But soon, requirements for investigation lifecycle management arose. Now we need Splunk panels that not only display data but can also generate and store new "investigation" events, something like a ticket system with a lifecycle.
The first (and worst) approach was to implement Advanced XML panels, with queries that generate new sourcetypes for investigation events and store them in a summary index using the collect command. This has serious problems, such as:
Input validation. Really difficult to implement in Advanced XML. The solution, for me, is to switch to the web framework and program it following an MVC model with Java, Python, Django, etc.
Query performance on transactional data models. The investigation data generated and indexed in the summary index usually consists of many events per investigation, where the newest event of each investigation is its current state and, thus, the one the apps actually use. Remember, there is no updating when using collect in Splunk: any change implies a new collected event. That is not bad from an audit trail point of view, but it is a performance pain to retrieve every single event in the query only to pipe it straight into something like | dedup investigation_id sortby -_time. I've been looking for some parameter of the search command (the first one in the query) that would return only ONE event (the most recent) per some criterion. Still no luck... but any idea or alternative would be appreciated.
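To make the "newest event wins" rule concrete, here is a minimal Python sketch of what the dedup step computes. It is only an illustration: the sample events and the status field are made up, and the only names taken from the discussion above are investigation_id and _time. Note that every event still has to be read once, which is exactly the cost described above.

```python
from typing import Dict, Iterable


def latest_state(events: Iterable[dict]) -> Dict[str, dict]:
    """Keep only the newest event per investigation_id.

    Mirrors the intent of "dedup investigation_id sortby -_time":
    every event is scanned, but only the most recent one survives.
    """
    latest: Dict[str, dict] = {}
    for event in events:
        key = event["investigation_id"]
        # Replace the stored event only when this one is strictly newer.
        if key not in latest or event["_time"] > latest[key]["_time"]:
            latest[key] = event
    return latest


# Hypothetical sample data: two investigations, three state changes.
sample_events = [
    {"investigation_id": "INV-1", "_time": 100, "status": "open"},
    {"investigation_id": "INV-2", "_time": 110, "status": "open"},
    {"investigation_id": "INV-1", "_time": 120, "status": "confirmed"},
]

current = latest_state(sample_events)
```

With this sample, current holds one entry per investigation, and the entry for INV-1 is the later "confirmed" event.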
Atomicity of collect queries. Splunk's collect is not intended to be an SQL INSERT, so it does not provide ACID guarantees such as atomicity. The problem is that really complex data generation needs atomicity. For example, imagine that an investigation update generates not only an event that updates the investigation itself, but also an event for another sourcetype, such as confirmed fraud files. I'm implementing this with two consecutive collects, but if the first collect succeeds and the second one fails (or the browser is closed, or whatever), the indexed data for that investigation is left inconsistent. There is no rollback: the first collect cannot be undone.
This last issue got me thinking about better approaches than collect and summary indexes. The second approach, and the most obvious one, is to use a secondary SQL database with DB Connect to store these investigation records. It would provide ACID guarantees, and probably faster responses than SPL queries against Splunk indexes. But before planning to implement and deploy this alternative, I would like to ask you all for opinions on better alternatives or other ways to do it that solve all the issues I stated above.
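As an illustration of the atomicity an SQL store would give, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. It does not use DB Connect or any Splunk API; the table names, columns and the update_investigation helper are hypothetical. Both writes happen inside one transaction, so if the second one fails, the first is rolled back.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE investigation_events ("
        "investigation_id TEXT NOT NULL, status TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE confirmed_fraud_files ("
        "investigation_id TEXT NOT NULL, file_name TEXT NOT NULL)"
    )
    return conn


def update_investigation(conn, investigation_id, status, file_name) -> bool:
    """Write both records atomically; return False if nothing was stored."""
    try:
        # "with conn" commits on success and rolls back on any exception.
        with conn:
            conn.execute(
                "INSERT INTO investigation_events VALUES (?, ?)",
                (investigation_id, status),
            )
            conn.execute(
                "INSERT INTO confirmed_fraud_files VALUES (?, ?)",
                (investigation_id, file_name),
            )
        return True
    except sqlite3.IntegrityError:
        # The second insert failed, so the first one was rolled back too.
        return False


def count_rows(conn, table: str) -> int:
    """Count rows in one of the fixed tables above (table name is trusted)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = open_db()
# A missing file name violates NOT NULL and simulates the failing second write.
failed = update_investigation(conn, "INV-1", "confirmed", None)
rows_after_failure = count_rows(conn, "investigation_events")
succeeded = update_investigation(conn, "INV-1", "confirmed", "report.pdf")
```

After the failed call, investigation_events is still empty, which is the rollback behaviour two consecutive collects cannot provide.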
What do you think? What approach would you take? Is an auxiliary SQL database the way to go?
PS: in my opinion, CSV lookups and the KV store are not good options for this, because they may be unreliable, offer little security, are prone to config bundle size problems, etc.