I am both new to splunk and not a database expert. So I'm asking for your help...
What should I use, Batch Input or Rising Column? From reading the descriptions, it seems to me like a combination of both are needed. Here is my case:
The database has tons of tables, most of these have a generally fixed number of rows (only adds a row when a new member or application is added for example), and the existing rows are updated/modified frequently. The other tables (less tables, but contain the bulk of the data) have new rows added very frequently, but existing rows are never updated/modified.
What is the best way to get this from my MS SQL Server to splunk using DBConnect v3.0.2?
Thanks in advance for your help.
Depending on the size of the tables that are updated, it may be better to create a lookup table with the contents of those tables rather than index them. The reason for this is because if you index those tables using batch input then you will receive a lot of duplicates. If you index these tables using rising column then you will lose data. If you create a lookup tables using the tables contents then you will get a fresh copy from each pull.
For the tables that have new rows added but old rows are unmodified I would definitely use the rising column method. When you configure this, you can set the ID to initially be 0 and it will pull all of the records sequentially until it is caught up. This makes the batch input unnecessary.
My suggestion would to be use batch input for your first type of tables with fixed number of rows and data being modified. The batch input here would work as snapshot/state of the data at given time.
For tables with bulk of data but new rows getting added and old data doesn't get updated, I would suggest to use monitoring input with rising column as primary key. This way you get static rows just once and get every additions based on last known primary key value.
Depending on the size of the tables that are updated, it may be better to create a lookup table with the contents of those tables rather than index them. The reason for this is because if you index those tables using batch input then you will receive a lot of duplicates. If you index these tables using rising column then you will lose data. If you create a lookup tables using the tables contents then you will get a fresh copy from each pull.
For the tables that have new rows added but old rows are unmodified I would definitely use the rising column method. When you configure this, you can set the ID to initially be 0 and it will pull all of the records sequentially until it is caught up. This makes the batch input unnecessary.
One more question related to this. For the tables that I have which gets new rows added often, you rightfully suggested Rising Column. However, if there is a change to a previous row, if I am using rising column, will that change be then indexed in Splunk? I have a table that has application data. New applications (rows) are submitted at a rate of about 50 per day (not too bad), but I am interested in tracking an "Application Status" field that tracks approvals and denials of these applications, and that will be set at some later date. Things like statistics of Application Status and how long it took from the time we received the application to the time the status was determined is important to us.
In this case, is Rising Column still the best choice?
Thanks for that explanation. Where can I go learn more about how to create a lookup table with the content of the database tables? I am unsure how to do that.
Once you have a select query that retrieves the columns and rows you want, you can simply use the following command in splunk to output the results to a lookup table:
| dbxquery shortnames=1 connection=YOUR_CONNECTION query="SELECT COLUMNS FROM YOUR.TABLE" | fields - _* | outputlookup LOOKUP_TABLE_NAME.csv
Once everything is correct it would probably be best to save this search and then schedule it. Scheduling the search means that the lookup table will be updated periodically which is probably what you want. If this is more of a one-off thing then there is no need to schedule it but probably still a good idea to save it.
I hope this was helpful. Please let me know if there is anything else I need to clear up.
Thank you!
Thanks, that is really helpful. Then once it's done the information can be searched on the same as indexed data? (I'm new to splunk and don't yet know this, but learning everyday).
Going to try the lookup suggestion you mentioned today.
Yes, you can use the command inputlookup to search over the data in the lookup table. For example,
| inputlookup example.csv | search columnName=something
Thanks
The batch input is a bulk "download" of the table entries while the Rising Column is an "incremental" download, that´s why a combination of the two is usually configured. You start y creating a batch input and then the Rising Column one