Hi Everyone,
I am sending data to Snowflake every 15 minutes. In the DB Connect output search I add an extra 5 minutes of overlap so that late-arriving data is not missed. During peak hours, when a batch is more than 2000 rows, I see data missing in Snowflake. I use an upsert to remove the resulting duplicates, since our TransactionId is unique. Our query produces timestamps in the format 05/06/2026 12:01:57.786 AM EDT, and I convert them to 2026-05-06 00:01:57.786 -0400 to meet Snowflake's timestamp requirements. Here is the transformation I use:
| eval DateTime_ET_clean = DateTime_ET
| eval DateTime_ET_clean = replace(DateTime_ET_clean, " EDT$", " -04:00")
| eval DateTime_ET_clean = replace(DateTime_ET_clean, " EST$", " -05:00")
| eval DateTime_ET_epoch = strptime(DateTime_ET_clean, "%m/%d/%Y %I:%M:%S.%3N %p %z")
| eval DateTime_ET_SF = strftime(DateTime_ET_epoch, "%Y-%m-%d %H:%M:%S.%3N %z")
After all this, I still see 20-30% data loss, and we are not sure how to make sure all of the data is propagated. Our Snowflake warehouse is XS in size.
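For reference, the upsert is conceptually a Snowflake MERGE keyed on TransactionId, roughly like the sketch below. The table, stage, and column names are simplified placeholders, not our real objects:

-- Sketch only: EVENTS is the target table, EVENTS_STAGE holds the latest
-- batch from DB Connect. All object names below are placeholders.
MERGE INTO MY_DB.MY_SCHEMA.EVENTS AS tgt
USING MY_DB.MY_SCHEMA.EVENTS_STAGE AS src
    ON tgt.TRANSACTION_ID = src.TRANSACTION_ID
WHEN MATCHED THEN UPDATE SET
    tgt.DATETIME_ET = src.DATETIME_ET,
    tgt.PAYLOAD     = src.PAYLOAD
WHEN NOT MATCHED THEN INSERT (TRANSACTION_ID, DATETIME_ET, PAYLOAD)
    VALUES (src.TRANSACTION_ID, src.DATETIME_ET, src.PAYLOAD);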
Are you able to increase the size of your Snowflake deployment to rule this out? It sounds like perhaps Snowflake isn't processing the events being sent. Are there any logs, either in the Snowflake environment or in _internal in Splunk, which suggest any errors sending the data?
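On the Snowflake side, something along these lines can surface recent failed statements (a sketch only; it assumes a role with access to SNOWFLAKE.ACCOUNT_USAGE, and you may want to adjust the time window):

-- Look for statements that errored in roughly the last day.
SELECT start_time, query_text, error_code, error_message
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE error_code IS NOT NULL
  AND start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;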
@livehybrid We have a cloud instance, so no changes can be made there. As for Snowflake, we are not observing any errors.
This is happening only when we 'Upsert'. Tested without 'Upsert' and there have been no drops so far. We have an XS warehouse and it is pretty fast. The only issue is that we now need to write a script to remove duplicates in Snowflake.
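In case it helps anyone else, the dedup we have in mind is roughly the following, using QUALIFY to keep one row per TransactionId (table and column names are placeholders for our actual schema, so adjust before running anything like this):

-- Rebuild the table keeping a single row per TRANSACTION_ID,
-- preferring the most recent DATETIME_ET. Object names are placeholders.
CREATE OR REPLACE TABLE MY_DB.MY_SCHEMA.EVENTS AS
SELECT *
FROM MY_DB.MY_SCHEMA.EVENTS
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY TRANSACTION_ID
    ORDER BY DATETIME_ET DESC
) = 1;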