DBConnect: Batch Input vs Rising Column

sfefcu
Path Finder

I am both new to Splunk and not a database expert, so I'm asking for your help...

What should I use, Batch Input or Rising Column? From reading the descriptions, it seems to me like a combination of both is needed. Here is my case:

The database has tons of tables. Most of them have a generally fixed number of rows (a row is only added when, for example, a new member or application is created), and the existing rows are updated/modified frequently. The other tables (fewer tables, but they contain the bulk of the data) have new rows added very frequently, but existing rows are never updated/modified.

What is the best way to get this from my MS SQL Server into Splunk using DBConnect v3.0.2?

Thanks in advance for your help.

1 Solution

mmeehanacademy
Explorer

Depending on the size of the tables that are updated, it may be better to create a lookup table with the contents of those tables rather than index them. The reason is that if you index those tables using a batch input, you will receive a lot of duplicates, and if you index them using a rising column, you will miss the updates to existing rows. If you instead create a lookup table from the table's contents, you get a fresh copy on each pull.

For the tables that have new rows added but whose old rows are never modified, I would definitely use the rising column method. When you configure the input, you can set the initial checkpoint value to 0 and it will pull all of the records sequentially until it is caught up. This makes a separate batch input unnecessary.
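
To make that concrete, a rising column input runs a query of roughly this shape on each interval, where the ? placeholder is replaced with the last checkpoint value DB Connect saved (starting from the 0 you configured). The table and column names here are made up for illustration:

SELECT * FROM dbo.Applications WHERE ApplicationID > ? ORDER BY ApplicationID ASC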


somesoni2
SplunkTrust

My suggestion would be to use a batch input for your first type of tables, with a fixed number of rows and data being modified. The batch input here works as a snapshot/state of the data at a given time.

For the tables with the bulk of the data, where new rows get added but old data doesn't get updated, I would suggest using a monitoring input with a rising column on the primary key. This way you index the static rows just once and pick up every addition based on the last known primary key value. A search sketch for working with the batch snapshots follows below.
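
One consequence of the snapshot approach: since each batch run indexes a full copy of the table, searches over that data usually need to keep only the most recent event per key. A minimal sketch, where the index, sourcetype, and field names are assumptions for illustration:

index=dbx sourcetype=members | dedup MemberID | table MemberID Status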

sfefcu
Path Finder

One more question related to this. For the tables that get new rows added often, you rightfully suggested a rising column. However, if there is a change to a previous row, will that change be indexed in Splunk if I am using a rising column? I have a table that holds application data. New applications (rows) are submitted at a rate of about 50 per day (not too bad), but I am interested in tracking an "Application Status" field that records approvals and denials of these applications, and that is set at some later date. Statistics on Application Status, and how long it takes from the time we receive an application to the time the status is determined, are important to us.

In this case, is Rising Column still the best choice?

sfefcu
Path Finder

Thanks for that explanation. Where can I go to learn more about how to create a lookup table with the contents of the database tables? I am unsure how to do that.

mmeehanacademy
Explorer

Once you have a SELECT query that retrieves the columns and rows you want, you can simply use the following search in Splunk to output the results to a lookup table:

| dbxquery shortnames=1 connection=YOUR_CONNECTION query="SELECT COLUMNS FROM YOUR.TABLE" | fields - _* | outputlookup LOOKUP_TABLE_NAME.csv

  • dbxquery: This will execute your query on the connection specified and return the results using the short names (This is my preference, feel free to remove the shortnames parameter if you would prefer the DBConnect default).
  • fields: This command removes all fields starting with an underscore, since they are generally unneeded. If you need those fields, remove this part of the search.
  • outputlookup: This command is responsible for writing the results from your query to the file name specified. This will place the lookup table in the lookups directory in the app that you are running the search in.

Once everything is correct, it would probably be best to save this search and then schedule it. Scheduling the search means the lookup table will be updated periodically, which is probably what you want. If this is more of a one-off thing, there is no need to schedule it, but it is probably still a good idea to save it.
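
If you prefer configuration files to the UI, the scheduled search can also be defined in savedsearches.conf. A minimal sketch, where the stanza name, cron schedule, and lookup file name are assumptions for illustration:

[refresh_member_lookup]
search = | dbxquery shortnames=1 connection=YOUR_CONNECTION query="SELECT COLUMNS FROM YOUR.TABLE" | fields - _* | outputlookup LOOKUP_TABLE_NAME.csv
enableSched = 1
cron_schedule = 0 * * * *

The cron expression above would refresh the lookup at the top of every hour.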

I hope this was helpful. Please let me know if there is anything else I need to clear up.

Thank you!

sfefcu
Path Finder

Thanks, that is really helpful. Once it's done, can the information be searched the same as indexed data? (I'm new to Splunk and don't know this yet, but I'm learning every day.)

Going to try the lookup suggestion you mentioned today.

mmeehanacademy
Explorer

Yes, you can use the inputlookup command to search over the data in the lookup table. For example,

| inputlookup example.csv | search columnName=something
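
You can also enrich indexed events with the lookup contents using the lookup command. A quick sketch, where the index, sourcetype, and output field name are assumptions for illustration:

index=main sourcetype=app_logs | lookup example.csv columnName OUTPUT statusColumn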

Thanks

asimagu
Builder

The batch input is a bulk "download" of the table entries, while the rising column is an "incremental" download; that's why a combination of the two is usually configured. You start by creating a batch input and then add the rising column one.
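
To illustrate the difference, the two inputs would run queries of these shapes (table and column names made up for illustration):

-- batch input: the full table on every run
SELECT * FROM dbo.Members

-- rising column input: only rows past the saved checkpoint
SELECT * FROM dbo.Members WHERE MemberID > ? ORDER BY MemberID ASC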
