How to align DB import processing with DBConnect

I use DB Connect to retrieve data from multiple tables.
The inputs are scheduled at staggered times to limit the load on the DB.
I only want to fetch new rows, identified by ID, so I set the input type to Rising.
The following SQL is executed:

SELECT * FROM "zaif"."Public"."Table name"
WHERE id > ?

However, the development team has asked that, even though the import jobs are staggered, every job should stop at the same point in the data for that day.
For example, even if the imports start one after another at 5:10, 5:20, and 5:30,
each should only collect data up to 5:00.
The next day, each import should pick up from 5:00 of the previous day, so that no updates are missed.
How should I set up the imports to achieve this?

Your question isn't quite as clear as I think you think it is, but that's OK, I think we can answer a few ways and one of these will probably work for you. 🙂

First, clarification:
It feels like the point of moving/shifting the data collection is to prevent slowness in the DB during business hours. Is this correct? Or is there some other reason? Then, to prevent that slowness, the proposed solution is to only collect data after business is closed (or after hours when it's not as busy - whatever). So do you want to collect the data one time every day at 5 PM? Or did I completely misread this?

A couple of thoughts then:

If you look at the spec file for db_inputs.conf, you'll see this entry:

interval = <integer|string>
# required
# interval to fetch data from DB and index them in Splunk
# It could be a number of seconds or a cron expression

That last option is one key item. If you want to shift the db input to only running after hours, then you can use a cron schedule.
If you just want to run one time at 5:00 PM to collect everything from the last time the input ran (more on this later) to 5:00 PM, the cron expression could be as simple as 0 17 * * * which runs once a day at 17:00. If instead you want it to run every ten minutes between 5 PM and midnight, use */10 17-23 * * * (cron hours run from 0 to 23, so the last run of the day is at 23:50). Google 'cron entries' and look for a cron calculator and some descriptions; they'll help you figure out your own.
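If you'd like to sanity-check a cron expression before putting it in the input, here is a rough Python sketch that expands the minute and hour fields into the times of day they would fire. It is only an illustration: the helper names expand_field and daily_run_times are made up for this post, it understands just the basic syntax (*, */n, a-b, a single number, and comma lists), and it ignores the day and month fields. It has nothing to do with how DB Connect itself parses the setting. Since cron hours run 0 to 23, the evening window is written as 17-23.

```python
def expand_field(field, lo, hi):
    """Expand one cron field such as '*/10', '17-23', '0' or '1,15' into sorted ints.

    Simplified on purpose: only '*', ranges, single numbers, steps and comma
    lists are handled, which is enough for the examples in this thread.
    """
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if start < lo or end > hi:
            raise ValueError(f"{field!r} is outside the allowed range {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_run_times(cron_expr):
    """Return the HH:MM times at which the expression fires within one day.

    Only the first two fields (minute, hour) are used; the remaining
    fields are assumed to be '*'.
    """
    minute_field, hour_field = cron_expr.split()[:2]
    minutes = expand_field(minute_field, 0, 59)
    hours = expand_field(hour_field, 0, 23)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


print(daily_run_times("0 17 * * *"))            # ['17:00']
print(daily_run_times("*/10 17-23 * * *")[:3])  # ['17:00', '17:10', '17:20']
```

An hour field of 17-24 raises a ValueError in this sketch, which is a handy reminder that 24 is not a valid cron hour.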

Now, on to the next question, which is about collecting from only 5, or 5:10, or whatever, yesterday. Well, this has an easy answer. You are using a rising column. When the input runs, it notes the largest value of your rising column (id) that it has seen and stores it. The next time it runs, whether that be 3 minutes from now or 3 days from now, it will collect all new data with an id greater than the last ingested id. That's why the SQL query you are pulling data with has WHERE id>? in it.

So to that end, it'll all "just work".
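To make the checkpoint idea concrete, here is a small simulation in Python using an in-memory SQLite database. Everything in it is an assumption for illustration only: the trades table, the fetch_new_rows helper, and the checkpoint variable are made up, and DB Connect keeps its own checkpoint internally rather than running code like this. It just shows that the gap between runs doesn't matter, because each run picks up where the previous one stopped.

```python
import sqlite3

# Hypothetical table standing in for the real source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, note TEXT)")


def fetch_new_rows(conn, checkpoint):
    """Run the rising-column query and return (rows, new_checkpoint)."""
    rows = conn.execute(
        "SELECT id, note FROM trades WHERE id > ? ORDER BY id ASC",
        (checkpoint,),
    ).fetchall()
    # The highest id seen becomes the checkpoint for the next run;
    # if nothing new arrived, the checkpoint stays where it was.
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


checkpoint = 0  # nothing ingested yet

# First run: three rows exist.
conn.executemany("INSERT INTO trades (id, note) VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
first_rows, checkpoint = fetch_new_rows(conn, checkpoint)

# Some time later (minutes or days), two more rows have been added.
conn.executemany("INSERT INTO trades (id, note) VALUES (?, ?)",
                 [(4, "d"), (5, "e")])
second_rows, checkpoint = fetch_new_rows(conn, checkpoint)

# A run with no new data returns nothing and leaves the checkpoint alone.
third_rows, checkpoint = fetch_new_rows(conn, checkpoint)

print([r[0] for r in first_rows])   # [1, 2, 3]
print([r[0] for r in second_rows])  # [4, 5]
print(third_rows, checkpoint)       # [] 5
```

One thing to keep in mind: a row that is updated in place keeps its id, so a query like this only picks up newly inserted rows, not changes to rows already collected.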

Hopefully this helps, if it does please mark this as accepted. If it does not, reply back with more specifics and I (or we) can clarify some points, or go over some detail more thoroughly!

Happy Splunking,
