Getting Data In

What is the most effective ways to poll databases such as MS SQL, Sybase, Oracle?

clyde772
Communicator

Has anyone tried to dump logs that are in database live to file to import to splunk? Is this a good way to handle sucking up events that are in the database?

I ended up writing scripts that would dump the table every minute to txt file using ODBC. I used linux freetds stuff. I could't get the DB polling work with the current per.

Is there a better way to suck database inserted data to splunk?

Tags (2)

clyde772
Communicator

To cleanly solve this issue, since splunk natively don't support DB interface in great range, I have developed a java application to poll any types of database.

  1. Oracle
  2. MS-SQL (sql server)
  3. MySQL

I figure it would be best to use java since it provides a good proven DB interface support, with out using messy stuff like Free TDS, which don't support 2 byte characters. We are investing a little more to make this database polling utility to be more flexible and powerful.

Let me know, if there are any others who may be interested in investing a little more fund into getting this done. Currently we got it ready where I can pretty much poll any database, any tables and fields.

Young.

0 Karma

j666gak
Communicator

Hi clyde772,

How did you get on with this, is it released?

0 Karma

kkalmbach
Path Finder

Lowell, Thanks for the info. I know it's been awhile, but any progress on making this an app (or cleaning up the code so you could release it)?

Thanks -Kevin

0 Karma

Lowell
Super Champion

It's probably never going to be released. Sorry. As much as I'm not a fan of Java, it may be the better way to go here.

0 Karma

southeringtonp
Motivator

Another option is to use a third-party product to poll the database for you and convert it to a flat file or syslog output. Adiscon makes one, and there are almost certainly others out there.

Buying additional software is far from ideal, but may be preferable for people without a strong scripting background.

0 Karma

clyde772
Communicator

Lowell,

Thanks for your help and response. It's been a while, and we are trying to get this resolved. We have concluded the following.

Since the UNIX database drivers sucks, like odbc drivers, freetds etc.

We are now deciced to develop DB polling in Java program, since java provide universal DB drivers and connectors to Databases. and I think this would be simpler solution to all the others who are trying to get this done.

YhC.

0 Karma

maverick
Splunk Employee
Splunk Employee

Here is a link to a perl-based scripted input to pull data from db tables via dbi. It contains switching logic inside to poll the database(s) you mention:

http://www.splunkbase.com/apps/All/3.x/app:Script+for+database+inputs#

It says it's for Splunk v3.4, but it's still just a scripted input, so it should still work.

esachs
Splunk Employee
Splunk Employee

Yes, it is helpful - and I'm pulling it into the documentation. (http://www.splunk.com/base/Documentation/4.1/AppManagement/DataSources#Example_of_tailing_database_i...). Do you have a version of your script you'd be willing to post?

Lowell
Super Champion

esachs, are you asking me if I'm willing to post a sample script? The db-loading python module I came up with isn't as small as I was remembering. It's been a while, so I took a look again and it contains 3,000 lines of code. It has consistent state-file handling logic, a command line interface for handling setup, it uses INI config files for storing connection parameters, and SqlAlchemy for some database abstraction... (so it's not something I can just post here). BTW, all this does make keep the individual db-to-splunk script very short and cookie-cutter... Is this still of interests?

clyde772
Communicator

Lowell,

Thanks for the Tip. Let me ask you a few more questions.

  1. Lunix / Unix splunk Host connecting to MS SQL : What unix DB interface did you use? For me, I have spend quote sometime getting Freetds to connect to MS Sql. I initially wanted to use Perl to poll data, but I could get the Perl to work with FreeTDS ODBC driver. So, I ended up writing a Python script to connect and poll data. Python seem to provide quite universal DB interfaces when the right drivers are installed on the server. This whole process was very tidious and difficult in the sense that it requires quote amount of twiaks for each different DB and tables. Didn't you have a this sort of similar issue?

  2. Also the dbpolling PERL script from Splunk download site, Does it work? Did you have a chance to try it out? It would be great if there a generic database polling scripts that could support different types of DB and tables.

  3. Instead of scripting, I thought about a DB export tool to dump to files perdiotically. What do you think about this idea?

YhC.

0 Karma

Lowell
Super Champion

(1) I'm pulling data from both MSSQL and Oracle databases at the moment using python. (2) I know I looked at this script but never actually used it. I got the gist of how it works and then built my own script in python (which I prefer). It's really quite simple to whip up a script to pull a few rows, reformat them in a text format, and write them out to stdout. (3) you may have difficulty with this. How do you plan on keeping track of what which rows were already exported? If your tool has an incremental mode, it might work. Also, is the export format human-readable or too verbose?

Lowell
Super Champion

I use scripted inputs that do database polling. I have a simple state file that I store the last accessed rowid (or some unique and incrementing value). Each time the script runs, it first loads that rowid (aka. the the last known position) which is then fed into the WHERE clause so that only newer (previously unseen) events are loaded. As the rows are pulled from the database, I write out (stdout) the event in a textual format which splunk indexes. Sometimes I'll tweak values or do a secondary lookup in the database to pull associated info into the log event text (for example, to resolve some id into a human-readable form). After all the records have been read, I update the state file with the last read rowid value. Next time the script runs the whole process starts over again.

I wrote a little python module to consolidate some common components of this process, since I have 5 or so different tables in various databases that I pull from within our company.

Keep in mind that I am only talking about situations where an application is essentially use a database table rather than writing out log file. If you table contains data that is changing or being updated after it's initially written, than you have to use a different approach. From splunk's perspective it's all the same thing. The different would be in how you write your script to pull the events and how to convert them into a textual format for splunk to index.

I've also written small scripts that pull id/name type data and store them in a CSV file to use with the lookup feature. This is another type of simple database integration that you can do with splunk. You have lots of open-ended integration options. 😉

Lowell
Super Champion

I started working on this, but unfortunately there's some embedded sensitive stuff that needs to be weeded out, which could take some time. My goal is to release both the python module as it's own project, as well as a make a packaged splunk app. This way, anyone want to build upon what I've done (or fix bugs) has a channel to do so, and for anyone wanting to just jump in, there is a quick way means of doing that by grabbing the app. I think such an approach would be most beneficial for all parties involved. (Not sure if any other apps take that approach, but I'm willing to give it a try.)

0 Karma

Justin_Grant
Contributor

Hi @Lowell - any chance you'd be post a generalized version of your your scripted-input solution on Splunkbase.com? I suspect others could benefit from the work you've done on this.

0 Karma

Justin_Grant
Contributor

@Lowell - great answer!

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...