We need to export a large amount of data (IIS logs) from Splunk and load it into Oracle.
That's about 30 million records per day, or roughly 200k records every 15 minutes.
We tried using the Python SDK to create CSV flat files and then loading them into Oracle with Informatica, but that takes too long and we can't meet our requirements.
Now we are considering the Splunk DB Connect app instead.
Could anyone advise whether this is the right approach?
Any help is greatly appreciated.
Thanks a lot, Leo.
Splunk is not a database load tool, especially at the volume of records you are talking about. Even if you could insert that many rows, performance would likely be terrible, as Splunk is not built for that kind of bulk loading. Oracle provides a number of native methods for loading data from flat files that would work much, much better, such as external tables (if the flat files are accessible to the database server) or SQL*Loader (if loading over a network). These tools are designed to do exactly what you are describing; Splunk is not.
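For reference, a direct-path SQL*Loader run is driven by a small control file. A minimal sketch, assuming a comma-separated export feeding a placeholder iis_logs table; all file, table, and column names here are illustrative, not taken from this thread:

```
-- iis_load.ctl: hypothetical control file; adjust columns to your table
OPTIONS (DIRECT=TRUE, ROWS=100000)
LOAD DATA
INFILE 'iis_export.csv'
APPEND
INTO TABLE iis_logs
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
  log_time  DATE "YYYY-MM-DD HH24:MI:SS",
  c_ip      CHAR,
  cs_method CHAR,
  cs_uri    CHAR,
  sc_status INTEGER EXTERNAL
)
```

You would then invoke something like `sqlldr userid=user/pass@db control=iis_load.ctl log=iis_load.log`. DIRECT=TRUE requests direct-path writes, which bypass most per-row SQL processing and are what makes SQL*Loader fast at this scale.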
Hi guys and thank you for all of your replies!
We were given a task to use Splunk as the only source, and of course we would use SQL*Loader or other tools to load the data from flat files directly into Oracle. But the question now is how to get or forward those files, meaning the Splunk source files. Are those files even reachable inside Splunk? Could we use a Splunk forwarder to automatically forward the source files from the Splunk server to, say, the Informatica server at some stage?
Again thanks a lot for your help!
I can see how it would be convenient to use the data already centralized in Splunk to load directly into the database; it would be equally problematic to have to build a whole new centralization architecture just for the database load. Performance of DBX in this role would depend a lot on the hardware and OS of both the indexer(s) and the database, so it's hard to say for sure what the best solution for @leo_y would be. I would still recommend using SQL*Loader in some way if there's an efficient way to feed it data from within Splunk.
I was actually thinking that SQL*Loader could be used to import the log files into Oracle directly, without using Splunk to load the data at all. I thought I saw something in the DBX docs about a 50,000-row limit on load transactions; given the volume he's talking about, that seemed like a likely show-stopper. Splunk would have to grind constantly, pushing new rows in to have any hope of keeping up, and the number of commits would bog down the entire database. SQL*Loader can bulk-load hundreds of thousands of rows in a single direct-path write, which would blow the doors off anything Splunk could do.
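To put numbers on the commit overhead, here is a small Python sketch. The 50,000 figure is just the transaction limit mentioned above, and the batching helper is purely illustrative, not anything from DBX:

```python
from itertools import islice

def batches(rows, size=50_000):
    """Yield successive lists of up to `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# One 15-minute window of ~200k rows fits in 4 commits at 50k rows each,
# versus 200,000 commits if every row were inserted and committed alone.
commits = sum(1 for _ in batches(range(200_000)))
print(commits)  # 4
```

The point is that however the rows get into Oracle, the commit count, not the row count, is what determines how hard the database has to work per window.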
Splunk can certainly bulk load, and in fact DB Connect is designed for that, but as I prefaced in my earlier answer, I agree that Splunk isn't usually the best solution to this problem. The solutions you propose are just as good, if not better. However, the fact that @leo_y already has the data consolidated in Splunk makes a database output from Splunk more compelling: if he has no other way to get at the data, he'd need to export it from Splunk before he could use your methods, which is suboptimal.
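For completeness, that export step can be scripted against Splunk's REST search API rather than done by hand. A hedged sketch, assuming the default management port 8089; the host, credentials, and index name are placeholders, not details from this thread:

```shell
# Stream the last 15 minutes of IIS events out of Splunk as CSV,
# ready to hand to SQL*Loader or Informatica.
curl -k -u admin:changeme \
  https://splunk.example.com:8089/services/search/jobs/export \
  --data-urlencode search='search index=iis earliest=-15m@m latest=@m' \
  -d output_mode=csv \
  -o iis_export.csv
```

The export endpoint streams results as they are found rather than buffering the whole job, which matters at 200k rows per window; a cron job on this schedule would keep a steady supply of files for a bulk loader.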
In the case you describe above, you can set up a database output against the Splunk index that contains the IIS logs. You would define a search to retrieve the IIS logs, then some SQL to insert the results into your database. Unfortunately, the documentation on database outputs is a bit spotty at the moment, but there are some examples here:
Since this instance will be doing a lot of work, I would recommend setting it up on a stand-alone Splunk instance rather than on your search head or on an indexer.
Thanks for your quick reply, but I'm not sure I've understood it correctly.
The IIS logs are already in the Splunk system, and we need to "export" them from Splunk and load them into Oracle.
Can we use DB Connect in that direction?
It says "import" in the docs:
use Splunk DB Connect to import and index the data already stored in your database