Splunk uses a proprietary data store called an index, which consists of raw files on disk. It is nothing like a conventional DB. Here is a good explanation of what an index is and how indexing works:
http://docs.splunk.com/Documentation/Splunk/6.6.0/Indexer/Howindexingworks
Here is a good explanation of how data is stored in the index:
http://docs.splunk.com/Documentation/Splunk/6.6.0/Indexer/HowSplunkstoresindexes
Splunk uses MongoDB for certain internal functionality, such as the KV Store, but it is by no means where data is stored as it is ingested from Universal Forwarders and other inputs. All data ingested from external sources goes to an index, as specified in your configuration.
@tnesavich_splun - just FYI - the links no longer work
Thank you, that worked. https://docs.splunk.com/Documentation/Splunk/9.0.1/Indexer/HowSplunkstoresindexes
Splunk might be using a MongoDB database. I am not sure, and I would like confirmation too.
Why MongoDB: because I have seen a MongoDB process running when the indexer starts or restarts.
Also, there are events with source=C:\Program Files\Splunk\var\log\splunk\mongod.log in index=_internal*
I believe MongoDB is part of the KV Store, not indexing.
@stath002 is correct. It's for storing state, or whatever else you'd like, but it is absolutely not the storage system for indexed data.
Learn more: About the app key value store
I think Splunk might be using Lucene as a backend search engine, though I am not sure and am looking for confirmation.
No. Splunk uses its own proprietary storage/db.
Yes, it might use its own proprietary storage, but what about the search engine? Lucene sounds like a good possibility.
Does that mean Splunk developed its own system to do this from ZERO? And does it really not have any direct or significant relation to other DB technology?
Yes, Splunk developed their own on-disk storage format from "zero" (if you call having a C++ compiler and standard libraries "zero"). From an architecture perspective, there are large differences between an ACID-capable, general-purpose RDBMS and what is essentially a search engine's data store. Splunk does not have, and does not need, many of the features a relational DB has. Also, full-text search in most relational DBs is an ugly side-addition. The Splunk developers were able to design an on-disk data format that meets their needs exactly.
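To make the "search engine data store" idea a bit more concrete, here is a toy Python sketch of the general pattern: raw events are appended to a journal, and a separate lookup table maps each term to the events that contain it. This is only an illustration of the concept, not Splunk's actual on-disk format (which is proprietary); the names `ToyIndex`, `journal`, and `postings` are made up for this example.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Toy event store: an append-only raw journal plus a term lookup table.

    Hypothetical illustration only; this is NOT how Splunk lays out its files.
    """

    def __init__(self):
        self.journal = []                 # raw events, in arrival order
        self.postings = defaultdict(set)  # term -> positions in the journal

    def add(self, raw_event):
        """Append the raw event and record which terms it contains."""
        position = len(self.journal)
        self.journal.append(raw_event)
        for term in re.findall(r"\w+", raw_event.lower()):
            self.postings[term].add(position)
        return position

    def search(self, *terms):
        """Return the raw events that contain every given term."""
        if not terms:
            return []
        # Intersect the position sets of all terms; unknown terms match nothing.
        hits = set.intersection(
            *(self.postings.get(t.lower(), set()) for t in terms)
        )
        return [self.journal[p] for p in sorted(hits)]


idx = ToyIndex()
idx.add("2024-01-01 ERROR disk full on host=web01")
idx.add("2024-01-01 INFO login ok user=alice")
idx.add("2024-01-02 ERROR timeout on host=web02")

for event in idx.search("error", "host"):
    print(event)
```

Note what is missing compared to a relational DB: no schema, no transactions, no updates in place. Events are only appended and then looked up by the terms they contain, which is the kind of trade-off described above.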
Hi there,
Splunk does not use a relational database to store events and indexes.
The storage is all flat-file based.
Please have a look here:
http://www.splunk.com/base/Documentation/4.1.4/Admin/WhatsaSplunkindex
Hope that answers your question?
Cheers,
simuvid
Adding updated link:
http://docs.splunk.com/Documentation/Splunk/6.5.1/Indexer/Howindexingworks