I am using Splunk DB Connect version 4.2 and dealing with a table of 28 million entries, with about a million new entries being added every day. While creating the DB input as a batch input, I can choose the table and click Continue to go to the next section, 'Set Parameters'. I am worried that we can't index the whole table of 28 million entries, because under Operations --> Set Parameters it says "Enter an integer between 1 and 10000000.". Does this mean it can only index 10 million rows at a time, or would it slowly catch up on subsequent runs?
This is a MySQL server. Do you think Splunk DB Connect is even the proper way to go here, or should I consider setting up a forwarder instead? I want to make sure I am not hitting the limitations of DB Connect.
I fixed the title. I didn't quite follow you. Regarding the limit "Enter an integer between 1 and 10000000.": is this the limit for a single fetch? If so, then on the next run (depending on the frequency we set) it can fetch the same rows again, so it would never finish the full list (especially in the case of a batch input). A couple of things regarding the database I am dealing with: it is growing at a rate of more than half a million new entries every day. My two main options are batch input vs. rising column input. Let's discuss both of them.
A) Batch Input
A1: One problem is that if we pull 10K rows every 5 minutes, as proposed above, it will take a long time to catch up, and with the rate at which the database is growing (half a million entries per day) this seems even more challenging.
A2: Even if it completes at some point, the data would then start duplicating, since a batch input re-fetches the same rows on every run.
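To put rough numbers on A1, here is a back-of-envelope sketch. The 10K-rows-per-5-minutes figure is the one proposed above, the growth rate is the half-million per day mentioned in this thread, and it optimistically assumes every run fetches only rows that have not been indexed before (which, per A2, a plain batch input does not actually guarantee):

```python
# Back-of-envelope catch-up math for the batch-input scenario discussed above.
# Assumption: each run ingests 10K previously unseen rows (optimistic).
backlog_rows = 28_000_000       # existing table size
rows_per_run = 10_000           # rows fetched per scheduled run
minutes_per_run = 5             # schedule interval
new_rows_per_day = 500_000      # growth rate from this thread

runs_per_day = 24 * 60 // minutes_per_run            # 288 runs/day
ingested_per_day = runs_per_day * rows_per_run       # 2,880,000 rows/day
net_progress_per_day = ingested_per_day - new_rows_per_day

days_to_catch_up = backlog_rows / net_progress_per_day
print(f"runs per day: {runs_per_day}")
print(f"net progress per day: {net_progress_per_day}")
print(f"days to clear the backlog: {days_to_catch_up:.1f}")
```

So even under the best case, clearing the backlog takes on the order of twelve days of continuous polling, which supports the "long time to catch up" concern.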
B) Rising Column Input
B1: When I run the same exact query as a rising column input, it runs for about 7-10 minutes before throwing an error like the one below:
External search command 'dbxquery' returned error code 1. Script output = "RuntimeError: Failed to run query: "SELECT * FROM (SELECT * from
apps) t", params: "None", caused by: Exception(" java.sql.SQLException: Incorrect key file for table 'C:\Windows\TEMP\#sql22c_2b711e_2.MYI'; try to repair it.",). "
B2: Let's say we solve the above issue, I am not sure how the limit of 10 million would affect here with the rate it is growing.
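On the B1 error: "Incorrect key file for table 'C:\Windows\TEMP\#sql...'" typically means MySQL was materializing the derived table `(SELECT * from apps) t` as an on-disk temporary table and ran out of temp space (or the temp file was corrupted). A hedged sketch of one workaround, assuming the table has an indexed, monotonically increasing column (hypothetically called `id` here), is to page through the table with keyset pagination instead of selecting everything in one statement:

```python
def keyset_page_query(table: str, rising_column: str, last_seen: int, page_size: int) -> str:
    """Build one page of a keyset-pagination query.

    All names here (table 'apps', column 'id') are illustrative placeholders,
    not confirmed details of the poster's schema. Only integer values are
    interpolated, to keep the sketch injection-safe.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {rising_column} > {int(last_seen)} "
        f"ORDER BY {rising_column} ASC "
        f"LIMIT {int(page_size)}"
    )

print(keyset_page_query("apps", "id", 0, 10_000))
```

Each page touches only `page_size` rows via the index, so MySQL never needs to spill the whole 28-million-row result set into a temp table.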
What are you trying to do here?
Your subject mentions rising column, yet your post refers to batch input.
Also, Splunk DB Connect V3 has been released, but there is no version 4.2.
If you use a rising column input, it will run on its schedule and pull up to the configured row limit on each run. For example, you could pull 10K rows at a time, running every 5 minutes; at that rate it would take a long time to pull down 28 million rows, so you might want to use a larger number, as per your description.
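The rising-column behavior described above can be simulated end to end. This is a minimal sketch using SQLite in place of MySQL; the table and column names are made up, and real DB Connect checkpointing differs in detail, but the core idea is the same: each scheduled run remembers the highest rising-column value it has seen and fetches only rows beyond it.

```python
import sqlite3

# Toy stand-in for the real table: 25 rows, fetch limit of 10 per "run".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO apps (payload) VALUES (?)",
    [(f"row-{i}",) for i in range(25)],
)

checkpoint = 0      # highest rising-column value indexed so far
fetch_limit = 10    # analogous to the row-limit setting in the UI
seen = []

while True:
    # One scheduled run: fetch only rows past the checkpoint, up to the limit.
    rows = conn.execute(
        "SELECT id, payload FROM apps WHERE id > ? ORDER BY id LIMIT ?",
        (checkpoint, fetch_limit),
    ).fetchall()
    if not rows:
        break
    seen.extend(r[0] for r in rows)
    checkpoint = rows[-1][0]  # advance the checkpoint past the last fetched row

print(len(seen), len(set(seen)))  # prints: 25 25 -> each row fetched exactly once
```

Because the checkpoint advances after every run, the input eventually drains the backlog without re-fetching rows, which is why rising column avoids the duplication problem raised under A2.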
Furthermore, you mention using a forwarder on the MySQL server. What data are you trying to get in? Is it in a file or in a database table?