Has anyone tried dumping logs that live in a database out to a file so they can be imported into Splunk? Is this a good way to handle pulling in events that are stored in a database?
I ended up writing scripts that dump the table to a text file every minute using ODBC (the Linux FreeTDS stack). I couldn't get DB polling to work with the current Perl script.
Is there a better way to get database-inserted data into Splunk?
I use scripted inputs that do database polling. I keep a simple state file that stores the last-accessed rowid (or some other unique, incrementing value). Each time the script runs, it first loads that rowid (the last known position), which is then fed into the WHERE clause so that only newer (previously unseen) events are loaded. As the rows are pulled from the database, I write each event to stdout in a textual format that Splunk indexes. Sometimes I'll tweak values or do a secondary lookup in the database to pull associated info into the log event text (for example, to resolve some id into a human-readable form). After all the records have been read, I update the state file with the last-read rowid value. The next time the script runs, the whole process starts over again.
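A minimal sketch of that state-file polling loop might look like the following. This is not my actual module; it uses sqlite3 (from the standard library) purely for illustration, and the table, column, and file paths are all hypothetical. Swap in your own driver (pyodbc, cx_Oracle, etc.) and schema.

```python
#!/usr/bin/env python
"""Hypothetical sketch of a state-file-based DB polling scripted input."""
import os
import sqlite3

STATE_FILE = "/tmp/myapp_events.rowid"   # assumed state file location
DB_PATH = "/tmp/myapp.db"                # assumed database

def load_last_rowid():
    # Default to 0 on the very first run (no state file yet)
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return int(f.read().strip())
    return 0

def save_last_rowid(rowid):
    with open(STATE_FILE, "w") as f:
        f.write(str(rowid))

def poll():
    last = load_last_rowid()
    conn = sqlite3.connect(DB_PATH)
    # Only fetch rows newer than the last known position
    cur = conn.execute(
        "SELECT id, ts, message FROM events WHERE id > ? ORDER BY id",
        (last,))
    for rowid, ts, message in cur:
        # One textual event per row, written to stdout for Splunk to index
        print(f"{ts} id={rowid} message={message}")
        last = rowid
    conn.close()
    save_last_rowid(last)

if __name__ == "__main__":
    poll()
```

Configured as a scripted input, Splunk runs this on an interval and indexes whatever lands on stdout; the state file guarantees each row is emitted only once.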
I wrote a little python module to consolidate some common components of this process, since I have 5 or so different tables in various databases that I pull from within our company.
Keep in mind that I am only talking about situations where an application essentially uses a database table instead of writing out a log file. If your table contains data that changes or is updated after it's initially written, then you have to use a different approach. From Splunk's perspective it's all the same thing; the difference is in how you write your script to pull the events and how you convert them into a textual format for Splunk to index.
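As a sketch of what that row-to-text conversion could look like (field names here are made up, not from any real schema), a key="value" format works well because Splunk extracts those fields automatically at search time:

```python
# Hypothetical helper: format one DB row (as a dict) into a single
# timestamp-prefixed key="value" event line for Splunk to index.
def format_event(row, ts_field="timestamp"):
    row = dict(row)  # copy so we don't mutate the caller's dict
    ts = row.pop(ts_field)
    # Sort keys for stable, diff-friendly output
    fields = " ".join(f'{k}="{v}"' for k, v in sorted(row.items()))
    return f"{ts} {fields}"
```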
I've also written small scripts that pull id/name type data and store it in a CSV file for use with the lookup feature. This is another type of simple database integration you can do with Splunk. You have lots of open-ended integration options. 😉
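That kind of lookup-file generator can be very short. Here's a hedged sketch, again using sqlite3 and invented table/column names for illustration only; the CSV header row is what Splunk uses for the lookup field names:

```python
"""Hypothetical sketch: dump id/name pairs from a table into a CSV file
suitable for Splunk's lookup feature. Table and column names are assumed."""
import csv
import sqlite3

def write_lookup(db_path, out_csv):
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    conn.close()
    with open(out_csv, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["user_id", "user_name"])  # header = Splunk lookup fields
        w.writerows(rows)
```

Run it on a cron schedule (or as a scripted input that writes the file and emits nothing), then point a lookup definition at the CSV.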
Hi @Lowell - any chance you'd post a generalized version of your scripted-input solution on Splunkbase.com? I suspect others could benefit from the work you've done on this.
I started working on this, but unfortunately there's some embedded sensitive stuff that needs to be weeded out, which could take some time. My goal is to release the Python module as its own project, as well as a packaged Splunk app. That way, anyone wanting to build on what I've done (or fix bugs) has a channel to do so, and anyone who just wants to jump in has a quick means of doing that by grabbing the app. I think such an approach would be most beneficial for all parties involved. (Not sure if any other apps take that approach, but I'm willing to give it a try.)
Thanks for the tip. Let me ask you a few more questions.
Linux/Unix Splunk host connecting to MS SQL: what Unix DB interface did you use? I have spent quite some time getting FreeTDS to connect to MS SQL. I initially wanted to use Perl to poll the data, but I couldn't get Perl to work with the FreeTDS ODBC driver, so I ended up writing a Python script to connect and poll the data. Python seems to provide fairly universal DB interfaces once the right drivers are installed on the server. The whole process was very tedious and difficult in the sense that it required quite a few tweaks for each different DB and table. Didn't you run into this sort of issue?
Also, does the db-polling Perl script from the Splunk download site work? Did you have a chance to try it out? It would be great if there were a generic database polling script that could support different types of DBs and tables.
Instead of scripting, I thought about using a DB export tool to dump to files periodically. What do you think about this idea?
(1) I'm pulling data from both MSSQL and Oracle databases at the moment using Python.
(2) I looked at that script but never actually used it. I got the gist of how it works and then built my own script in Python (which I prefer). It's really quite simple to whip up a script that pulls a few rows, reformats them into a text format, and writes them out to stdout.
(3) You may have difficulty with this. How do you plan on keeping track of which rows were already exported? If your tool has an incremental mode, it might work. Also, is the export format human-readable, or too verbose?
Yes, it is helpful - and I'm pulling it into the documentation. (http://www.splunk.com/base/Documentation/4.1/AppManagement/DataSources#Example_of_tailing_database_i...). Do you have a version of your script you'd be willing to post?
esachs, are you asking if I'm willing to post a sample script? The db-loading Python module I came up with isn't as small as I remembered. It's been a while, so I took another look, and it contains 3,000 lines of code. It has consistent state-file handling logic, a command-line interface for handling setup, INI config files for storing connection parameters, and SQLAlchemy for database abstraction... so it's not something I can just post here. That said, all of this does keep the individual db-to-splunk scripts very short and cookie-cutter. Is this still of interest?
Here is a link to a Perl-based scripted input that pulls data from DB tables via DBI. It contains switching logic to poll the database(s) you mention:
It says it's for Splunk v3.4, but it's still just a scripted input, so it should still work.