All Apps and Add-ons

Can Splunk index .gzip'd XMLs from a SQL database?

nick405060
Motivator

We have a SQL table where a binary_log_content field contains gzipped XML for the logs we'd like to ingest.

How can I ingest these logs, short of using Python to query and save to a disk location that Splunk then monitors and auto-extracts?

0 Karma
1 Solution

nick405060
Motivator

How can I ingest these logs, short of using Python to query and save to a disk location that Splunk then monitors and auto-extracts?

Okay, ignoring the "short of" part... this is how you do it. Despite living in the db_connect directory, this does not use DB Connect. You can move things to system/local or another directory you think is more appropriate, if desired.

crontab -e

* * * * * python3 /opt/splunk/etc/apps/splunk_app_db_connect/HOSTNAME_query.py

/opt/splunk/etc/apps/splunk_app_db_connect/HOSTNAME_query.py

#!/usr/bin/env python3
import pymssql

# Pull the last day of gzipped XML blobs and drop them where Splunk is watching.
conn = pymssql.connect(server='HOSTNAME:PORT', user='USERNAME', password='PASSWORD', database='DATABASE')
cursor = conn.cursor()
cursor.execute('SELECT MYFIELD, log_date FROM DATABASE.dbo.log_history WHERE log_date > DATEADD(day, -1, GETDATE())')

row = cursor.fetchone()
while row:
    # Name each file after the date portion of log_date; rows sharing a date overwrite the same file.
    date = str(row[1])[:str(row[1]).index(' ')]
    with open('/opt/splunk/var/run/splunk/xmlballs/' + date + '.gz', 'wb') as f:
        f.write(row[0])
    row = cursor.fetchone()

conn.close()
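If you would rather write plain .xml files than rely on Splunk decompressing the .gz files itself, the write step in the loop can unzip the blob first. This is only a sketch: write_decompressed is a hypothetical helper, it assumes MYFIELD holds a complete gzip stream, and the path is carried over from the script above.

import gzip

def write_decompressed(blob, date):
    # gzip.decompress() turns the stored gzip blob back into the original XML bytes.
    xml_bytes = gzip.decompress(blob)
    with open('/opt/splunk/var/run/splunk/xmlballs/' + date + '.xml', 'wb') as f:
        f.write(xml_bytes)

Call write_decompressed(row[0], date) in place of the with open(...) block and the monitor stanza below picks up plain XML instead.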

/opt/splunk/etc/apps/splunk_app_db_connect/local/inputs.conf

[monitor:///opt/splunk/var/run/splunk/xmlballs]
disabled = 0
sourcetype = compressed_xml
index = MYINDEX
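Nothing above defines the compressed_xml sourcetype, so Splunk will fall back to automatic line breaking once it has decompressed the files. If you want each XML document treated as a whole event, a minimal props.conf sketch could sit next to the inputs.conf above; the settings below are illustrative (not from the original post) and assume one XML document per file:

/opt/splunk/etc/apps/splunk_app_db_connect/local/props.conf

[compressed_xml]
# Start a new event at each XML declaration and never truncate the document.
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = <\?xml
TRUNCATE = 0
# Extract XML fields at search time.
KV_MODE = xml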


0 Karma


nick405060
Motivator

You definitely cannot do it using DB Connect. If you try, you get the "non-displayable column type: image" message:

https://answers.splunk.com/answers/587501/splunk-db-connect-checkpoint-set-to-null.html

0 Karma

woodcock
Esteemed Legend

Splunk already understands how to eat gzipped files, so you do not have to decompress anything yourself. Because you already have this data in a database, there is no reason to archive the extracts, so I suggest something like this so that Splunk deletes the files after it forwards them:

[batch:///path/to/your/files/*.xml.gz]
move_policy = sinkhole
sourcetype = foo
index = bar
host_segment = bat
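If you go the batch route, one quick sanity check (a generic Splunk command, not part of the original answer) is to ask btool whether the stanza is actually being read:

/opt/splunk/bin/splunk btool inputs list batch --debug

The --debug flag shows which file each setting comes from, which helps when the stanza lives in an app's local directory.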

nick405060
Motivator

I don't think this actually answers my question @woodcock

0 Karma

woodcock
Esteemed Legend

You are correct, it was not immediately clear that it is a field in the table that is gzipped. #cribl can do this.

0 Karma