The reason we installed DB Connect 2 is to export data out of Splunk to Oracle. We followed the steps outlined in the
Splunk documentation and created an output to an Oracle database. The output attempts to insert data into a table with one Oracle DATE column and five VARCHAR columns. After scheduling the output and looking at the health table, I noticed that the output operation is failing every time with the following error: MESSAGE="ORA-01861: literal does not match format string"
2015-07-29 12:15:02 INFO HealthLogger: ... FUNCTION=RollbackTransaction LABEL=JP MAX_MEMORY=1037959168 MESSAGE="ORA-01861: literal does not match format string" PROTOCOL=HTTP SQL="INSERT INTO \"PROD\".\"SUBSCRIPTIONS\" (START_TIME,USERNAME,ACCT_SESSION_ID,NAS_IP,NAS_IDENTIFIER,SUBSCRIPTION_NAME) VALUES (?,?,?,?,?,?)" SQL_CODE=1861 SQL_STATE=22008 STATE=completed
After some research I realized this error has to do with the Oracle DATE column. I am formatting the first column as a
date/time field in my Splunk query using strftime(_time,"%F %T"). Here is the Splunk query:
index=main sourcetype=exportRecord | eval dateTime=strftime(_time,"%F %T")
However, Oracle does not accept this date/time format by default; the format has to be specified using the Oracle TO_DATE function. DB Connect inserts data into the table using parameterized SQL, which I cannot change to add the TO_DATE function. I cannot find any documentation on how to control this from Splunk. Any help is greatly appreciated.
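For illustration, this is roughly how the two forms look on the Oracle side (the values and the second column are made up; DB Connect's generated parameterized INSERT corresponds to the first, failing form):

-- Fails with ORA-01861 when the string literal does not match the session's NLS_DATE_FORMAT
INSERT INTO PROD.SUBSCRIPTIONS (START_TIME, USERNAME) VALUES ('2015-07-29 12:15:02', 'jsmith');

-- Works, because TO_DATE names the format explicitly -- but there is no place to add this
-- wrapper to the parameterized SQL that DB Connect builds for the output
INSERT INTO PROD.SUBSCRIPTIONS (START_TIME, USERNAME) VALUES (TO_DATE('2015-07-29 12:15:02', 'YYYY-MM-DD HH24:MI:SS'), 'jsmith');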
My second question is about the frequency and control of the data that is exported. Assuming the output operation is fixed so that it works, I could not find any documentation on how Splunk keeps track of what data it has already exported across runs so that it does not re-export the same data. For example, if this output job runs once every hour, how does Splunk DB Connect export only the data between its last successful invocation and the time the current DB Output job runs? Referring again to the Splunk documentation, I found the following section:
Finalize your output
In the last step, finalize the output by entering the following settings, and then click Save:
Execution Frequency: Enter the number of seconds (or a valid cron expression) between output executions. For example, entering 120 instructs DB Connect to wait two minutes after it's done sending data to your database before doing it again. Be aware that only new events will be sent at each execution.
I am trying to see whether I have to do anything special to make sure Splunk only outputs new events at each execution, per the statement above.
After many hours of debugging and going back and forth with Splunk Support, here is what I did.
First question: Controlling the date/time format of the data being written to the database as part of a DB Output operation.
The short answer is that this cannot be controlled from Splunk unless you follow the suggestions from other users on this question and modify inputs.conf directly, instead of going through the DB Output UI to update or modify your output. If you do not have access to inputs.conf, then you need to change your column in Oracle to be of type TIMESTAMP WITH TIME ZONE and update the Splunk query to output the date/time in the exact format Oracle expects, which depends on your installation. To find the default Oracle format, run the following SQL in Oracle:
select * from nls_database_parameters where parameter='NLS_TIMESTAMP_TZ_FORMAT'
The SQL above gave me the default date/time format for my version of Oracle as "DD-MON-RR HH.MI.SSXFF AM TZR". I then used the Splunk function strftime(_time,"%d-%b-%y %I.%M.%S.%3N %p %z") to output the date/time in the exact format Oracle is expecting.
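Before wiring this into the output, it can help to check on the Oracle side what a value in that default mask looks like, and that a string shaped like your Splunk output parses back cleanly (the second literal below is made up; adjust it to whatever your strftime expression actually emits):

-- Show what Oracle itself produces with the default TIMESTAMP WITH TIME ZONE mask
SELECT TO_CHAR(SYSTIMESTAMP, 'DD-MON-RR HH.MI.SSXFF AM TZR') AS sample_out FROM dual;
-- Hypothetical reverse check; if this raises an error, tweak the strftime format until the two shapes agree
SELECT TO_TIMESTAMP_TZ('29-JUL-15 12.15.02.123 PM +00:00', 'DD-MON-RR HH.MI.SSXFF AM TZR') AS parsed_in FROM dual;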
Second question: You are tasked with exporting specific data out of Splunk to a database on a regular basis, and you have to make sure you do not export duplicates or miss records because Splunk was down for maintenance or the DB Output job did not run at its scheduled time for any reason. How can you ensure that the next time the DB Output operation runs, it outputs all records since the last export to the database?
I looked around, and the only solution I found that seemed reliable was to query the database for the date/time stamp of the last exported record and then export all records with a later time stamp. After some research I came up with the following Splunk query to do this:
index=main sourcetype=radius
[| dbxquery connection="bidb" maxrows=1 shortnames=true query="SELECT to_char(MAX(START_TIME), 'mm/dd/yyyy:HH24:MI:SS') AS EARLIESTTIME FROM SUBSCRIPTIONS" | eval earliest=if(EARLIESTTIME=="NULL","-3mon@mon",EARLIESTTIME) | eval latest="now" | fields earliest, latest | format "(" "(" "" ")" "OR" ")"]
Referring to the query above: the inner subsearch runs once and uses dbxquery to find the date/time of the last record exported to the database. That value is then used to set "earliest" in the outer search, with "latest" set to "now". This way the search can be scheduled to run at whatever interval you like, since we no longer rely on a specific duration, nor on the output job running at an exact time every time.
This approach worked great in the Splunk UI while I went through the DB Output wizard and set up the operation. But once scheduled, it failed every time with an HTTP 400 error in the logs. After spending hours trying to debug it and going back and forth with Splunk Support, it turns out you cannot use dbxquery in a DB Output operation, even though it works while you are setting the operation up. During setup the query runs under the DB Connect app context and succeeds, but scheduled jobs run through the Python script and are not run under the DB Connect app context; they run under the default Search & Reporting app context, which does not know what dbxquery is, so the search fails.
My conclusion is that the DB Connect v2 Output operation is NOT currently a reliable method of exporting data out of Splunk on a regular basis. We decided to write our own Java app that uses the Splunk REST API to pull data out of Splunk and store it in the database.
I had the same issue. Setting the interval correctly is insufficient; there is too much room for error to either duplicate or miss a record, and I needed a more solid solution. I changed the query in inputs.conf to call a function that handles the duplicate checking, so that any overlap is handled by the function.
This is based on DB Connect v2 with PostgreSQL, but it should work for MySQL as well.
In inputs.conf, change from:
query = INSERT INTO <fully qualified table name>(_field1,_field2,_field3) VALUES (?,?,?)
to
query = update <fully qualified table name> set <_field1>=<_field1> where 1=(SELECT <fully qualified function name>(?,?,?))
Sadly, I found the update wrapper necessary because the DB output routine expects to be calling INSERT and expects no result. If you call SELECT directly, an empty result set is returned by the function, which causes DB output to throw an error.
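For reference, here is a minimal sketch of what such a deduplicating function might look like in PostgreSQL. The table, column names, function name, and the convention of returning 0 are all assumptions, since the original function body is not shown:

-- Hypothetical PL/pgSQL sketch: insert the row only if it is not already present.
-- Returning 0 makes the wrapping "update ... where 1=(SELECT ...)" match no rows,
-- so the call behaves like a guarded insert with no visible result set.
CREATE OR REPLACE FUNCTION insert_subscription_dedup(p_start_time text, p_username text, p_session_id text)
RETURNS integer AS $$
BEGIN
    INSERT INTO subscriptions (start_time, username, acct_session_id)
    SELECT p_start_time::timestamp, p_username, p_session_id
    WHERE NOT EXISTS (
        SELECT 1 FROM subscriptions WHERE acct_session_id = p_session_id
    );
    RETURN 0;
END;
$$ LANGUAGE plpgsql;

The three ? placeholders in the inputs.conf query then map to the function's three parameters.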
I've read in other posts that using the DB Connect GUI may overwrite inputs.conf. I have not experienced this yet, but I would watch out for it.
Also, once the edit is made, restart DB Connect with:
sudo su - splunk
cd /opt/splunk/bin
./splunk _internal call /services/apps/local/splunk_app_db_connect/_reload -auth $ADMIN_LOGIN:$ADMIN_PWD
Another, perhaps cleaner, option is to block the duplicates with a rule (trigger) as described here: http://dba.stackexchange.com/questions/30499/optimal-way-to-ignore-duplicate-inserts
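A minimal sketch of that rule-based approach, again with assumed table and column names (the uniqueness key would be whatever identifies a duplicate in your data):

-- Hypothetical PostgreSQL rule: silently drop inserts whose session id already exists
CREATE RULE ignore_duplicate_inserts AS
    ON INSERT TO subscriptions
    WHERE EXISTS (SELECT 1 FROM subscriptions WHERE acct_session_id = NEW.acct_session_id)
    DO INSTEAD NOTHING;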
You can alter the parameterized SQL via the query parameter in inputs.conf:
[mi_output://test_largeOutput]
connection = mySQL
customized_mappings = source:source,sourcetype:sourcetype,host:host,index:indexer
interval = 2400
is_saved_search = 0
query = INSERT INTO `test`.`testlargeoutput` (source,sourcetype,host,indexer) VALUES (?,?,?,?)
search = index=_internal | head 50000 | table host source sourcetype index
time_out = 6000
transactional = 0
It is up to you to ensure that you do not output duplicate events. The easiest way to do so is to make sure that the delta between the earliest and latest times of your search matches the interval you specify for the output.