Splunk Enterprise

Splunk DB connect issue

uagraw01
Motivator

Hello Splunkers!!

I am facing one issue while data getting ingested from DB connect plugin to Splunk. I have mentioned scenarios below. I need your help to fixing it.

In DB connect, I obtain this value at the latest with the STATUS value "FINISHED".

uagraw01_0-1732594711272.png

However, when the events come into Splunk, getting the values with the STATUS value "RELEASED" without latest timestamp (UPDATED)

uagraw01_1-1732594711278.png

What I am doing so far:

I am using rising column method to get the data into Splunk to avoid duplicate in ingestion.

uagraw01_2-1732594734816.png

 

uagraw01_3-1732594745903.png

 

Labels (1)
0 Karma

uagraw01
Motivator

My database contains two types of events, and I want to ensure that only the latest row for each unique TASKID is ingested into Splunk with the following requirements:

  1. Latest Status: Only the most recent status for each TASKID should be captured, determined by the UPDATED timestamp field.
  2. Latest Date: The row with the most recent UPDATED timestamp for each TASKID should be ingested into Splunk.
  3. Single Count: Each TASKID should appear only once in Splunk, with no duplicates or older rows included.


Please help me achieve this requirement. Currently method I am using is "Rising column update" method. But still splunk is not ingesting a row with the latest status.

I am using below query in SQL input under DB connect.

SELECT *
FROM "DB"."KSF_OVERVIEW"
WHERE TASKIDUPDATED > ?
ORDER BY TASKIDUPDATED ASC

 

Below are the sample events from the database.

=====Status "FINISHED"

2024-12-06 11:50:22.984, TASKID="11933815411", TASKLABEL="11933815411", TASKIDUPDATED="11933815411 2024/12/05 19:40:47", TASKTYPEKEY="PACKGROUP", CREATED="2024-12-05 14:18:18", UPDATED="2024-12-05 19:40:47", STATUSTEXTKEY="Dynamic|TaskStatus.key{FINISHED}.textKey", CONTROLLERSTATUSTEXTKEY="Dynamic|TaskControllerStatus.taskTypeKey{PACKGROUP},key{EXECUTED}.textKey", STATUS="FINISHED", CONTROLLERSTATUS="EXECUTED", REQUIREDFINISHTIME="2024-12-06 00:00:00", STATION="PAL/Pal02", REQUIRESCUBING="0", REQUIRESQUALITYCONTROL="0", PICKINGSUBTASKCOUNT="40", TASKTYPETEXTKEY="Dynamic|TaskType.Key{PACKGROUP}.textKey", OPERATOR="1", MARSHALLINGTIME="2024-12-06 06:30:00", TSU="340447278164799274", FMBARCODE="WMC000000000341785", TSUTYPE="KKP", TOURNUMBER="2820007682", TYPE="DELIVERY", DELIVERYNUMBER="17620759", DELIVERYORDERNUMBER="3372948211", SVSSTATUS="DE_FINISHED", STORENUMBER="0000002590", STACK="11933816382", POSITION="Bottom", LCTRAINID="11935892717", MARSHALLINGAREA="WAB"

=====Status "RELEASED"

2024-12-05 14:20:13.290, TASKID="11933815411", TASKLABEL="11933815411", TASKIDUPDATED="11933815411 2024/12/05 14:18:20", TASKTYPEKEY="PACKGROUP", CREATED="2024-12-05 14:18:18", UPDATED="2024-12-05 14:18:20", STATUSTEXTKEY="Dynamic|TaskStatus.key{RELEASED}.textKey", CONTROLLERSTATUSTEXTKEY="Dynamic|TaskControllerStatus.taskTypeKey{PACKGROUP},key{CREATED}.textKey", STATUS="RELEASED", CONTROLLERSTATUS="CREATED", REQUIREDFINISHTIME="2024-12-06 00:00:00", REQUIRESCUBING="0", REQUIRESQUALITYCONTROL="0", PICKINGSUBTASKCOUNT="40", TASKTYPETEXTKEY="Dynamic|TaskType.Key{PACKGROUP}.textKey", OPERATOR="1", MARSHALLINGTIME="2024-12-06 06:30:00", TSUTYPE="KKP", TOURNUMBER="2820007682", TYPE="DELIVERY", DELIVERYNUMBER="17620759", DELIVERYORDERNUMBER="3372948211", SVSSTATUS="DE_CREATED", STORENUMBER="0000002590", STACK="11933816382", POSITION="Bottom", MARSHALLINGAREA="WAB"

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Again - there is no way to update an existing event within Splunk. So you can't have only the latest status. As simple as that.

You can try to walk around that by maybe ingesting the state periodically and hold the state in a lookup or something similar but this approach doesn't scale well.

0 Karma

uagraw01
Motivator

@PickleRick I am using field name "TASKIDUPDATED" which is the combination of TASKID and UPDATED column and it is always dynamic in nature. I have given this field in the rising column and this field is changing in every run. Even after this, duplicate data is being ingested.

0 Karma

uagraw01
Motivator

@PickleRick I mean to say. The value of the TASKIDUPDATED field is always unique value after applying checkpoint value event should be ingested only once and not multiple times.  

Below is the setting I am currently using for db connect. 

connection = VIn
disabled = 0
index = group_data
index_time_mode = current
interval = */10 * * * *
max_rows = 0
mode = rising
query = SELECT * FROM "WMCDB"."KLDGSF_ROUPOVERVIEW"\
WHERE TASKIDUPDATED < ?\
ORDER BY TASKIDUPDATED DESC
query_timeout = 30
sourcetype = overview_packgroup
tail_rising_column_init_ckpt_value = {"value":null,"columnType":null}
tail_rising_column_name = TASKIDUPDATED
tail_rising_column_number = 3
input_timestamp_column_number = 10
input_timestamp_format =
0 Karma

PickleRick
SplunkTrust
SplunkTrust

OK. Let's back up a little.

You have a record with

TASKID=1
UPDATED=1
VALUE="A"
TASKIDUPDATED="1-1"

You update the VALUE and the UPDATED field and the TASKIDUPDATED field is updated as well so you have

TASKID=1
UPDATED=2
VALUE="B"
TASKIDUPDATED="1-2"

From Splunk's point of view it's a completely different entity since your TASKIDUPDATED changed (even though from your database point of view it can still be the same record). Splunk doesn't care about state of your database. It just fetches some results from database query.

You can - to some extent - compare it to the file monitor input. If you have a log file which Splunk is monitoring and you change some sequence of bytes in the middle of that file to a different sequence, Splunk has no way of knowing that something changed - the event which had been read from that position and ingested into Splunk stays the same. (of course there can be issues when Splunk notices file that file has been truncated and decides to reread whole file or just stops reading from the file because it decides it reached the end of the file but these are beside the main point).

BTW, remember that setting a non-numeric column to bee your rising column may yield unpredictible results due to quirkness of sorting.

EDIT warning - previous version of this reply mistakenly used the same field name twice.

PickleRick
SplunkTrust
SplunkTrust

Whay do you mean by "duplicate" in this context? Two different values for the same TASKID? That's expected.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Most probably your DB query initially returned one status which got ingested from the input but later something within your DB changed the status. But since the TASKID is the primary identifier for the ingested records, the same TASKID will not be ingested again. Hence the discrepancy between the DB contents and the indexed data.

0 Karma

uagraw01
Motivator

Hi @PickleRickIf I replace the TASKID column with UPDATED column to rising column method, will it make a difference? 

FYI : I also increased the checkpoint value from 1 to 2 and even after the second time STATUS change is RELEASED to FINISHED, that row is not ingested in splunk.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

It's not about a field but more about the general layout and variability of data in your DB. Splunk works differently - once you ingest an event, it's immutable whereas the contents of a particular row in DB can change. So regardless of how you decide that one row of your results has already been ingested, it won't be ingested again even if some "secondary" fields change their values.

I don't know your data, I don't know what it represents. If you reconfigure your DB data onboarding process to ingest both states of your DB record (or whatever result set you're getting), you'll have in Splunk two separate  partly duplicated events and will have to handle it somehow in search-time.

uagraw01
Motivator

Is someone can support me on this topic ?

0 Karma
Get Updates on the Splunk Community!

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...

At .conf24, we shared that we were in the process of integrating Cisco Talos threat intelligence into Splunk ...

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector

Agent Saturation What and Whys In application performance monitoring, saturation is defined as the total load ...