Solved: DB Connect rising column + status field to handle ...

derekjm · ‎12-09-2018

Hi Peeps,
I'm a newbie trialing Splunk. I have searched the forum. Apologies if this is a duplicate.

I have an application that logs events to a database. It does use an incremental counter as the primary key - making it ideal to use the Rising Column import into Splunk.

However, the events logged have a Status field which is binary - running or completed. The application will log an event as running, and after the event completes (as much as 10 - 15 minutes after commencing) it will update all the events and set the Status field to completed. Unfortunately, the application will also default a "Run Successfully" field to false until it completes and will only set the "Run Successfully" field to true at event completion.

I only want Splunk to retrieve the events that have a Status = completed.

If I just use the Rising Column method - Splunk will capture events not completed, meaning I will be creating false alerts if I raise alerts based on the "Run Successfully" field.

There is a date time field, however after reading articles like https://answers.splunk.com/answers/400221/why-is-using-a-timestamp-column-for-the-rising-col.html and other answers that ask you to "dedup" searches - I dont like the idea of using the date time stamp as the Rising Column and would like to avoid it if possible.

If I use the Status field in the SQL WHERE clause, Splunk will still increment the Rising Column it is looking for and so will miss the update. I did see an article mentioning the use of a lookup, but there were no details on how to do so.

derekjm · ‎12-16-2018

Thanks jcoates and richgalloway for all the suggestions.
In the end, I just needed to review my data carefully and think it through.
I have another field - Run End, which tells me when the long running job finishes. So I now use that as the Rising Column. This means that I now dont pick up the default state "Run Successfully" = false. I now only pick up the true run success state as it updates when it updates the Run End field.
I have also managed to create my first alert, also using the dedup command.
I have a different question now, which I will ask in a new thread.

View solution in original post

derekjm · ‎12-16-2018

Thanks jcoates and richgalloway for all the suggestions.
In the end, I just needed to review my data carefully and think it through.
I have another field - Run End, which tells me when the long running job finishes. So I now use that as the Rising Column. This means that I now dont pick up the default state "Run Successfully" = false. I now only pick up the true run success state as it updates when it updates the Run End field.
I have also managed to create my first alert, also using the dedup command.
I have a different question now, which I will ask in a new thread.

jcoates · ‎12-10-2018

I would start with

Select columns from table where status=complete and date>$risingcol

That requires the status incomplete to status complete transition to be one-way and permanent, but it would allow you to use time for rising relatively safely (given the caveats in the documentation).

If that’s not true, you’ll need to batch the whole table in regularly and use a lookup to manage a view from a couple of Splunk saved searches. In addition to high complexity, this could be a non-starter in terms of license impact if the table is large and collection is frequent. The lookup thing could also hurt performance if you have a lot of volume, as you’re effectively storing the desired subset of the original table in Splunk’s memory or as a KV store table in Splunk.

derekjm · ‎12-10-2018

I was hoping to avoid using the date as a Rising Column.
Batching the entire table is not feasible. The reason we want it in Splunk in the first place is because it is an ever growing log that needs to be queried for health checks and doing so on the live table would hinder performance.
Right now because the Rising Column is the Primary Key we can get away with extracting data regularly. Changing to the datetime column means no longer using an index and a table scan will be used, hindering performance.

richgalloway · ‎12-11-2018

While there are good reasons to avoid using timestamps as the rising column, sometimes it's the best option. It depends on the likelihood of a timestamp collision. Is it possible to add an index on the timestamp column?

---
If this reply helps you, Karma would be appreciated.

derekjm · ‎12-11-2018

Yeah seems no other option.
I cant alter the database as it belongs to a 3rd party vendor.

My issue now will be that I will have duplicate entries, some with "Run Successfully" true, some false. I will now need to write SPL to ignore "Run Successfully" = false IF there exists rows with "Run Successfully" = true at a later timestamp for a given Primary Key Id, and only alert if there are ONLY entries for "Run Successfully" = false. How do I do that?!?

richgalloway · ‎12-14-2018

If you add | dedup PrimaryKeyId to your search, you'll get the most recent event for each Primary Key Id. Of course, you'll need to replace "PrimaryKeyId" with the real name of the field that holds the primary key.

---
If this reply helps you, Karma would be appreciated.

DB Connect rising column + status field to handle UPDATE to existing rows?

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!