Solved: Splunk DB Connect 2: Is there a way to tell how ma...

bworrellZP · ‎10-14-2015

Using Splunk DB Connect 2 to pull in data from SQL. Noticed a few things wrong with the data, so made some adjustments, which I then noticed other issues.

So issue one, I am importing data from a server that syncs its data from prod. Due to this, I may be missing data. So, is there a way to tell how many records that DB Connect imported per poll? Is there a way to go back and import any missing records?

If that is an error, is there a way to run DB Connect at a certain time of day to pull all new data since yesterday?

Of potential connection, I moved our rising column to use DateofAccess, versus another column. Read that this is a bad idea. The post did not say why though. Is it okay to use that or should I use a different column?

Thanks
Brian

Richfez · ‎10-14-2015

So serverB syncs data from production's DB, then you use DB Connect v2 to grab the data from serverB. But serverB's copy of the data is incomplete? Your description of how it's missing isn't very clear and you may need to clarify if nothing we come up with fixes this, but still, I have a few thoughts.

Fix your syncing. If serverB's copy of the data is complete up to a certain time, this isn't an issue and it shouldn't be an issue for Splunk, either. If, though, serverB's copy of the data has "gaps" in the inside, then there's something wrong with your method of syncing and that should be fixed.

If there are gaps like those described above that you can't fix with better/different syncing, there are two ways I can think of to get around that. Delete all your data each time and have the input into Splunk always grab the entire table each time. This can be brutal to a license, though.

Another thought would be to have the destination tables have a date/time auto-filled column that sets to the time of row creation in the synced DB, then you can use that column as your rising column. You might need to work with your DBA to get that sorted out. The idea being each row, when it gets added to the synced/destination DB, has that field fill in with the current time when that row got added. Then you pull based on that as your rising column so it will, each time, only grab items that are "new" to the synced DB.

Alternatively, why can't you use a read-only connection straight to the production DB? I mean, sometimes you can't, but sometimes the only reason you can't is because you haven't asked. 🙂

View solution in original post

Richfez · ‎10-14-2015

So serverB syncs data from production's DB, then you use DB Connect v2 to grab the data from serverB. But serverB's copy of the data is incomplete? Your description of how it's missing isn't very clear and you may need to clarify if nothing we come up with fixes this, but still, I have a few thoughts.

Fix your syncing. If serverB's copy of the data is complete up to a certain time, this isn't an issue and it shouldn't be an issue for Splunk, either. If, though, serverB's copy of the data has "gaps" in the inside, then there's something wrong with your method of syncing and that should be fixed.

If there are gaps like those described above that you can't fix with better/different syncing, there are two ways I can think of to get around that. Delete all your data each time and have the input into Splunk always grab the entire table each time. This can be brutal to a license, though.

Another thought would be to have the destination tables have a date/time auto-filled column that sets to the time of row creation in the synced DB, then you can use that column as your rising column. You might need to work with your DBA to get that sorted out. The idea being each row, when it gets added to the synced/destination DB, has that field fill in with the current time when that row got added. Then you pull based on that as your rising column so it will, each time, only grab items that are "new" to the synced DB.

Alternatively, why can't you use a read-only connection straight to the production DB? I mean, sometimes you can't, but sometimes the only reason you can't is because you haven't asked. 🙂

bworrellZP · ‎10-15-2015

Good points. So with the Prod server, its a Data warehouse, doing a lot of things, pulling data from lots of other places, jobs, etc. With that load, they want to sync data to the Read Only Data warehouse. My gut tells me the sync job does not replicate in date time order but in some other blocks. They sync job seems to take different lengths of time to complete.

Thinking I should change my import to a cron job, and pull data with query that is d-1. then I should not have an issue.

Richfez · ‎10-20-2015

If you don't need absolutely up to the moment data, doing "laggy" import like that could certainly do the trick.

Let us know what you finally figure out!

bworrellZP · ‎10-21-2015

Did testing yesterday and had a job run this morning via Cron. Appears that I may always get some duplicate data, due to how its pulled to the warehouse. Talking maybe 20 to 1000 records out of 250k pulled in per night. That said since there is a unique key, I can just dedup it on the searches and dashboards. The Cron worked though.

Splunk DB Connect 2: Is there a way to tell how many records were imported per poll, and can we go back an import missing records?

CX Day is Coming!

Strengthen Your Future: A Look Back at Splunk 10 Innovations and .conf25 Highlights!

Now Offering the AI Assistant Usage Dashboard in Cloud Monitoring Console

Are you a member of the Splunk Community?

Splunk DB Connect 2: Is there a way to tell how many records were imported per poll, and can we go back an import missing records?

CX Day is Coming!

Strengthen Your Future: A Look Back at Splunk 10 Innovations and .conf25 Highlights!

Now Offering the AI Assistant Usage Dashboard in Cloud Monitoring Console