How to align DB import processing with DBConnect

nishida_tada_ca · ‎02-04-2020

I use DB Connect to retrieve data from multiple tables.
To spread the load on the DB, the start times of the individual imports are staggered.
Because I only want to fetch the rows added since the last run, based on the ID, I set the input type to Rising.
The following SQL is executed:

SELECT * FROM "zaif"."Public"."Table name"
WHERE id>?
ORDER BY id ASC

However, the development side has requested that, even though the import start times are staggered, the data imported on a given day should end at the same point in time for every table.
For example, even if the imports start one after another at 5:10, 5:20, and 5:30,
each of them should contain data only up to 5:00.
On the next day, each import should then pick up from 5:00 of the previous day, so that no updates are missed.
How should I set up the imports to achieve this?

Richfez · ‎02-04-2020

Your question isn't quite as clear as I think you think it is, but that's OK, I think we can answer a few ways and one of these will probably work for you. 🙂

First, clarification:
It feels like the point of moving/shifting the data collection is to prevent slowness in the DB during business hours. Is this correct? Or is there some other reason? Then, to prevent that slowness, the proposed solution is to only collect data after business is closed (or after hours when it's not as busy - whatever). So do you want to collect the data one time every day at 5 PM? Or did I completely misread this?

A couple of thoughts then:

If you look at the spec file for db_inputs.conf you'll see this entry:

interval = <integer|string>
# required
# interval to fetch data from DB and index them in Splunk
# It could be a number of seconds or a cron expression

That last option is one key item. If you want to shift the db input to only running after hours, then you can use a cron schedule.
If you just want to run one time at 5:00 PM to collect everything from the last time the input ran (more on this later) to 5:00 PM, the cron expression could be as simple as 0 17 * * * which would run one time, at 0 minutes past 1700. If instead you want it to run every ten minutes between 5 PM and midnight, use */10 17-23 * * * (the hour field only goes from 0 to 23, so the last run of the day is at 23:50). Google 'cron entries' and look for a cron calculator and some descriptions; they'll help you figure your own out.
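
To check what a schedule like this will actually do before putting it in the input, here is a minimal Python sketch that lists the daily run times of an expression. It assumes standard five-field cron semantics and only interprets the minute and hour fields; the function names expand_field and daily_run_times are made up for this illustration and are not part of DB Connect.

```python
def expand_field(field, low, high):
    """Expand one cron field (e.g. '*/10' or '17-23') into the set of values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_s, end_s = base.split("-")
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(base)
        # Reject values the field cannot hold, e.g. hour 24.
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


def daily_run_times(expression):
    """Return the sorted (hour, minute) pairs at which the expression fires each day.

    Assumption: the day-of-month, month and day-of-week fields are all '*'.
    """
    minute, hour, dom, month, dow = expression.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only handles '* * *' day/month fields")
    minutes = expand_field(minute, 0, 59)
    hours = expand_field(hour, 0, 23)
    return sorted((h, m) for h in hours for m in minutes)


print(daily_run_times("0 17 * * *"))         # [(17, 0)]
evening = daily_run_times("*/10 17-23 * * *")
print(evening[0], evening[-1], len(evening))  # (17, 0) (23, 50) 42
```

An hour range that ends at 24 raises a ValueError in this sketch, because the hour field runs from 0 to 23.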

Now, on to the next question, which is about collecting from only 5, or 5:10, or whatever, yesterday. Well, this has an easy answer. You are using a rising column. When the input runs, it notes the largest value of your rising column, id, and stores it. The next time it runs - whether that be 3 minutes from now, or 3 days from now - it will collect all new data with an id greater than the last id it ingested. That's why the SQL query you are pulling data with has WHERE id>? in the middle.

So to that end, it'll all "just work".
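
If it helps to see the rising column idea in action, here is a small Python sketch that imitates it with an in-memory SQLite table. This is only an illustration of the checkpoint logic, not how DB Connect is implemented; the events table, its columns, and the fetch_new_rows helper are all made up for the example.

```python
import sqlite3


def fetch_new_rows(conn, checkpoint):
    """Return rows with id greater than the checkpoint, plus the updated checkpoint."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id ASC",
        (checkpoint,),
    ).fetchall()
    # Rows are ordered by id, so the last one carries the largest id seen.
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


# Hypothetical source table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

checkpoint = 0
first_run, checkpoint = fetch_new_rows(conn, checkpoint)   # picks up ids 1 to 3

# New rows arrive before the next run, however long that gap is.
conn.executemany("INSERT INTO events VALUES (?, ?)", [(4, "d"), (5, "e")])
second_run, checkpoint = fetch_new_rows(conn, checkpoint)  # picks up ids 4 and 5 only

third_run, checkpoint = fetch_new_rows(conn, checkpoint)   # nothing new, checkpoint stays at 5
```

The time between runs never enters the query. Only the stored id does, which is why a shifted schedule does not cause missed or duplicated rows.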

Hopefully this helps, if it does please mark this as accepted. If it does not, reply back with more specifics and I (or we) can clarify some points, or go over some detail more thoroughly!

Happy Splunking,
Rich
