
Splunk DB Connect - Tail input not updating

ezempel
Path Finder

I'm connecting to an Oracle database using a tail input. I've gotten it all working; however, Splunk shows that the last update was yesterday afternoon, when I initially set it up. I know there are new rows out there. Looking at the debug log, I found this:

    monsch1:INFO:Scheduler - Execution of input=[dbmon-tail://Scap Monitor/Scap Tail monitor with convert] finished in duration=32 ms with resultCount=0 success=true continueMonitoring=true

So it looks like it is checking for updates but not finding any. I'm using the PROCESS_ID column as my rising column; this column is populated by a sequential primary key.
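For reference, my understanding is that on each interval the tail input re-runs the query with the last checkpointed value substituted in, i.e. effectively something like this (column and table names are from my input):

    -- What the tail input effectively executes each interval
    -- (? is replaced by the highest PROCESS_ID seen so far):
    SELECT process_id, process_name, start_dt, end_dt, status, mesg
      FROM ais_process_log
     WHERE process_id > ?;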

1 Solution

ezempel
Path Finder

I managed to get around this, though it may be more of a workaround than a fix. I modified my query to look only at events with a PROCESS_ID greater than 100000, which yields a smaller result set that Splunk handles without issue. I believe this is a bug: Splunk seems able to process only 99999 events at a time. The working stanza is as follows:

[dbmon-tail://Scap Monitor/Scap Tail monitor with convert]
    host = host-113.subnet-110.med.umich.edu
    index = default
    interval = auto
    output.format = kv
    output.timestamp = 0
    output.timestamp.column = START_DT
    output.timestamp.format = Dy DD-Mon-YYYY HH24:MI:SS
    query = select process_ID, process_name, to_char(start_dt, 'Dy DD-Mon-YYYY HH24:MI:SS') AS starttime, to_char(end_dt, 'Dy DD-Mon-YYYY HH24:MI:SS') AS endtime, STATUS, mesg from AIS_PROCESS_LOG {{WHERE $rising_column$ > ?}} AND PROCESS_ID > 100000
    table = AIS_PROCESS_LOG
    tail.rising.column = PROCESS_ID
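A quick sanity check that the added filter still leaves rows to ingest (a sketch against the same table):

    -- Rows the filtered input should still pick up:
    SELECT COUNT(*) AS remaining
      FROM ais_process_log
     WHERE process_id > 100000;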


rettops
Path Finder

One thing that may be complicating your debugging (it did for me) is that the 'rising column' value is stored in $SPLUNK_DB/persistentstorage/dbx/<input hash>/state.xml. That state.xml file is not reset when you clean the index, or even if you delete and recreate the index. If you don't manually go in and remove that file, dbx will refuse to index anything with a rising-column value lower than the number recorded in state.xml.
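A quick way to see whether that stored checkpoint has fallen behind the table (using the table and column from this thread) is to compare it against the current maximum:

    -- Compare this against the value recorded in state.xml:
    SELECT MAX(process_id) AS max_id
      FROM ais_process_log;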


Dan
Splunk Employee

Perhaps your rising column does not increment in an orderly way. It looks like the table has a row with process_id 99999. Do the most recent entries have a process_id higher than that, or lower?

You might consider using end_dt as the rising column.
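For example, something like this (plain Oracle, using ROWNUM) shows the ids of the most recent rows:

    -- The five most recently started rows and their ids:
    SELECT process_id, start_dt
      FROM (SELECT process_id, start_dt
              FROM ais_process_log
             ORDER BY start_dt DESC)
     WHERE ROWNUM <= 5;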


ezempel
Path Finder

I think I'm narrowing in on an answer. I'm having difficulty using START_DT or END_DT as a rising column. These are Oracle DATE fields, which need to be converted to a string to get the time information out (otherwise the value is displayed as DD-MON-YY; see http://infolab.stanford.edu/~ullman/fcdb/oracle/or-time.html). This causes Splunk to loop and keep pulling in the same data over and over for the most recent day.

To get around this, I do a to_char conversion and expose it as the column "starttime". If I use that as the rising column, I get a one-time dump, but subsequent checks give an ORA-00904 (invalid identifier) error, presumably because an alias defined in the SELECT list can't be referenced in the WHERE clause.
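To illustrate the difference (the exact output of the first query depends on your NLS_DATE_FORMAT setting):

    -- Default DATE rendering loses the time of day, e.g. 22-APR-13:
    SELECT start_dt FROM ais_process_log WHERE ROWNUM = 1;

    -- Explicit conversion keeps it, e.g. Mon 22-Apr-2013 12:02:28:
    SELECT TO_CHAR(start_dt, 'Dy DD-Mon-YYYY HH24:MI:SS') AS starttime
      FROM ais_process_log
     WHERE ROWNUM = 1;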


Dan
Splunk Employee

What is the data type of PROCESS_ID in the database? You can see this from Splunk by using dbinfo.
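If it's easier, the same information is available straight from the Oracle data dictionary:

    -- Data types of every column in the monitored table:
    SELECT column_name, data_type
      FROM all_tab_columns
     WHERE table_name = 'AIS_PROCESS_LOG';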


ezempel
Path Finder

Thanks for the suggestion, but unfortunately no duplicates 😞


Dan
Splunk Employee

Let's check one more thing. Sometimes changes in the manager UI create duplicate configuration files in different app contexts, and those configs can conflict. The fastest way to check is from the command line:

    $SPLUNK_HOME/bin/splunk cmd btool --debug-print=app inputs list | grep dbmon-tail

The output should look like this:

    dbx [dbmon-tail://cmdb/cmdb_audit_log]
    search [dbmon-tail://cmdb/cmdb_audit_log]

Check the output for duplicate entries like the above (the same stanza listed under more than one app).


ezempel
Path Finder

I've done some more playing around with this today, and at this point I believe it is a bug.

I do have process_ids higher than 99999. I manually edited state.xml to set latest.process_id = 103677, then watched the dbx_debug log.

It starts out with:

    2013-04-22 12:02:28.378 dbx5947:INFO:TailDatabaseMonitor - Applying latest tail.rising.column value=103677

It then processes, but stops at 99999 again, and after the run state.xml shows 99999 again. So is there a limit of 100k rows, or is this a bug?
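One theory (just a guess at this point): if the stored checkpoint is compared as a string rather than a number, '99999' sorts above every six-digit id, which would explain both the stall at 99999 and why filtering on PROCESS_ID > 100000 works:

    -- String comparison goes character by character, so '1...' < '9...':
    SELECT CASE WHEN '103677' > '99999' THEN 'seen' ELSE 'missed' END AS as_string,
           CASE WHEN  103677  >  99999  THEN 'seen' ELSE 'missed' END AS as_number
      FROM dual;
    -- Returns as_string = 'missed', as_number = 'seen'.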


ezempel
Path Finder

Interesting... there are more than 100k records in this table - somewhere around 103,000.


ezempel
Path Finder

Ok, here is the state.xml file associated with the input (rather than the global one I posted earlier). The stored latest.process_id value is:

    99999


bigtyma
Communicator

You might consider making a note of the settings, then deleting the monitor and re-creating it; I have had to do that in some instances.


ezempel
Path Finder

Yeah, I've done that a few times already; this is the third or fourth attempt. It pulls in all the data up to the point in time when I create the new input, but doesn't update after that. It behaves more like a one-time dump than a tail.


ezempel
Path Finder

Just to provide a little contextual information here: the table I'm pulling from uses an Oracle DATE field for START_DT and END_DT. If I pull that into Splunk as-is, I only get an epoch value for the date with no time-of-day information (in other words, all events show the correct date but display a time of 12:00 am). By converting it to a character field I get the full date plus hours, minutes, and seconds. Not sure why that is; it seems to be specific to how Oracle renders DATE values.


ezempel
Path Finder

state.xml (the stored value):

    1366374395757


ezempel
Path Finder

[dbmon-tail://Scap Monitor/Scap Tail monitor with convert]
    host = host-113.subnet-110.med.umich.edu
    index = default
    interval = auto
    output.format = kv
    output.timestamp = 0
    output.timestamp.column = START_DT
    output.timestamp.format = Dy DD-Mon-YYYY HH24:MI:SS
    query = select process_ID, process_name, to_char(start_dt, 'Dy DD-Mon-YYYY HH24:MI:SS') AS starttime, to_char(end_dt, 'Dy DD-Mon-YYYY HH24:MI:SS') AS endtime, STATUS, mesg from AIS_PROCESS_LOG {{WHERE $rising_column$ > ?}}
    table = AIS_PROCESS_LOG
    tail.rising.column = PROCESS_ID


Dan
Splunk Employee

Can you post your dbmon-tail stanza and your state.xml file, which is in $SPLUNK_DB/persistentstorage/dbx? Each input has its own directory, which is a hash of its name (i.e. a 32-character hex string). This directory typically contains two files:

manifest.properties: contains meta-information, such as the name of the input
state.xml: contains the actual state in XML format

The state file looks something like this:

    2012-12-07 04:22:25.703
