All Apps and Add-ons

DBConnect: Batch Input vs Rising Column

sfefcu
Path Finder

I am both new to splunk and not a database expert. So I'm asking for your help...

What should I use, Batch Input or Rising Column? From reading the descriptions, it seems to me like a combination of both are needed. Here is my case:

The database has tons of tables, most of these have a generally fixed number of rows (only adds a row when a new member or application is added for example), and the existing rows are updated/modified frequently. The other tables (less tables, but contain the bulk of the data) have new rows added very frequently, but existing rows are never updated/modified.

What is the best way to get this from my MS SQL Server to splunk using DBConnect v3.0.2?

Thanks in advance for your help.

0 Karma
1 Solution

mmeehanacademy
Explorer

Depending on the size of the tables that are updated, it may be better to create a lookup table with the contents of those tables rather than index them. The reason for this is because if you index those tables using batch input then you will receive a lot of duplicates. If you index these tables using rising column then you will lose data. If you create a lookup tables using the tables contents then you will get a fresh copy from each pull.

For the tables that have new rows added but old rows are unmodified I would definitely use the rising column method. When you configure this, you can set the ID to initially be 0 and it will pull all of the records sequentially until it is caught up. This makes the batch input unnecessary.

View solution in original post

somesoni2
Revered Legend

My suggestion would to be use batch input for your first type of tables with fixed number of rows and data being modified. The batch input here would work as snapshot/state of the data at given time.

For tables with bulk of data but new rows getting added and old data doesn't get updated, I would suggest to use monitoring input with rising column as primary key. This way you get static rows just once and get every additions based on last known primary key value.

mmeehanacademy
Explorer

Depending on the size of the tables that are updated, it may be better to create a lookup table with the contents of those tables rather than index them. The reason for this is because if you index those tables using batch input then you will receive a lot of duplicates. If you index these tables using rising column then you will lose data. If you create a lookup tables using the tables contents then you will get a fresh copy from each pull.

For the tables that have new rows added but old rows are unmodified I would definitely use the rising column method. When you configure this, you can set the ID to initially be 0 and it will pull all of the records sequentially until it is caught up. This makes the batch input unnecessary.

sfefcu
Path Finder

One more question related to this. For the tables that I have which gets new rows added often, you rightfully suggested Rising Column. However, if there is a change to a previous row, if I am using rising column, will that change be then indexed in Splunk? I have a table that has application data. New applications (rows) are submitted at a rate of about 50 per day (not too bad), but I am interested in tracking an "Application Status" field that tracks approvals and denials of these applications, and that will be set at some later date. Things like statistics of Application Status and how long it took from the time we received the application to the time the status was determined is important to us.

In this case, is Rising Column still the best choice?

0 Karma

sfefcu
Path Finder

Thanks for that explanation. Where can I go learn more about how to create a lookup table with the content of the database tables? I am unsure how to do that.

0 Karma

mmeehanacademy
Explorer

Once you have a select query that retrieves the columns and rows you want, you can simply use the following command in splunk to output the results to a lookup table:

| dbxquery shortnames=1 connection=YOUR_CONNECTION query="SELECT COLUMNS FROM YOUR.TABLE" | fields - _* | outputlookup LOOKUP_TABLE_NAME.csv

  • dbxquery: This will execute your query on the connection specified and return the results using the short names (This is my preference, feel free to remove the shortnames parameter if you would prefer the DBConnect default).
  • fields: This command is used to remove all fields starting with an underscore since they are generally un-needed. If you need these fields then please remove it from the command.
  • outputlookup: This command is responsible for writing the results from your query to the file name specified. This will place the lookup table in the lookups directory in the app that you are running the search in.

Once everything is correct it would probably be best to save this search and then schedule it. Scheduling the search means that the lookup table will be updated periodically which is probably what you want. If this is more of a one-off thing then there is no need to schedule it but probably still a good idea to save it.

I hope this was helpful. Please let me know if there is anything else I need to clear up.

Thank you!

0 Karma

sfefcu
Path Finder

Thanks, that is really helpful. Then once it's done the information can be searched on the same as indexed data? (I'm new to splunk and don't yet know this, but learning everyday).

Going to try the lookup suggestion you mentioned today.

0 Karma

mmeehanacademy
Explorer

Yes, you can use the command inputlookup to search over the data in the lookup table. For example,

| inputlookup example.csv | search columnName=something

Thanks

0 Karma

asimagu
Builder

The batch input is a bulk "download" of the table entries while the Rising Column is an "incremental" download, that´s why a combination of the two is usually configured. You start y creating a batch input and then the Rising Column one

0 Karma
Get Updates on the Splunk Community!

How to Get Started with Splunk Data Management Pipeline Builders (Edge Processor & ...

If you want to gain full control over your growing data volumes, check out Splunk’s Data Management pipeline ...

Out of the Box to Up And Running - Streamlined Observability for Your Cloud ...

  Tech Talk Streamlined Observability for Your Cloud Environment Register    Out of the Box to Up And Running ...

Splunk Smartness with Brandon Sternfield | Episode 3

Hello and welcome to another episode of "Splunk Smartness," the interview series where we explore the power of ...