The chances of the same question being posted by multiple people seem pretty slim, so given how often this happens, why post under many usernames?
Anyway, to the problem at hand. Did you try the documentation? A quick search reveals:
http://docs.splunk.com/Documentation/Splunk/5.0.1/Indexer/HowSplunkstoresindexes
This is very comprehensive and there's little point in me summarising it here; if you read it, you'll have a complete understanding of how and where Splunk stores this data.
The rawdata is needed to rebuild the metadata should the buckets ever become corrupted or unreadable by Splunk. This is also important in a clustered environment, where you can choose how many copies of the raw data are available for recovery purposes.
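If it helps to see it concretely, the on-disk location of each index is controlled by indexes.conf. On a default install, the stanza for the main index looks roughly like this (the paths shown are the stock defaults; your own indexes.conf may well differ):

    # Default location of the "main" index. $SPLUNK_DB normally resolves
    # to $SPLUNK_HOME/var/lib/splunk on a standard install.
    [main]
    # hot and warm buckets
    homePath = $SPLUNK_DB/defaultdb/db
    # cold buckets
    coldPath = $SPLUNK_DB/defaultdb/colddb
    # buckets restored ("thawed") from archive
    thawedPath = $SPLUNK_DB/defaultdb/thaweddb

Each bucket directory under those paths holds the compressed rawdata journal plus the index/metadata files that the doc above describes.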
EDIT: Oh, and a final consideration: if you are indexing events from local log files, then the original data will also remain on disk, depending on the retention/rolling policies already in place.
"The chances of the same question being posted by multiple people seems pretty unlikely and given how often this happens - why post under many usernames?"
Maybe the person thought they stood a better chance of receiving support by posting it more than once? I don't know, it's just a thought.
So, I read the link you gave and I'm sorry... I'm still overwhelmed trying to find my data and where it's located. I hope this user was helped; as for me, I'm still wondering where the data went. Thanks anyway.
So that data is just stored in text files directly on the Splunk servers?
gurinderbhatti: I suggest you post that as a new question, with some additional detail about your deployment and usage of DB Connect.
Ironically, this post relates to my question. I have an intermediate server with a heavy forwarder installed. I am using the DB Connect app on the intermediate server to get MSSQL DB logs, and I want to forward them to an indexer. What path in the inputs.conf file should I monitor on the heavy forwarder? "/var/splunkhot/splunk/var/lib/splunk"?
Data is stored in $SPLUNK_HOME/var/lib/splunk, one directory per index ($SPLUNK_HOME being where Splunk was installed). The files in the respective directories hold the data in the indexes. The data in these files is not meant to be read directly; it would be very much like trying to read MySQL's database files directly and expecting to be able to make sense of them.
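To make that a bit more tangible, the layout under $SPLUNK_HOME/var/lib/splunk on a default single-instance install looks roughly like this (the index and bucket names below are illustrative, not taken from your system):

    $SPLUNK_HOME/var/lib/splunk/
        defaultdb/                      <- the "main" index
            db/                         <- hot and warm buckets
                hot_v1_0/
                db_1355270400_1355184000_2/
                    rawdata/            <- compressed copy of the original events
                    *.tsidx             <- time-series index files Splunk searches
            colddb/                     <- cold buckets
            thaweddb/                   <- buckets restored from archive
        _internaldb/                    <- Splunk's own internal logs

None of these are plain text files you can usefully open by hand; the supported way to get at the data is to search it.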
EDIT: This is already explained in the link above. Where exactly do you get confused? What are you trying to do, and why?
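If you just want to confirm where a given index lives on your own instance, something along these lines (run on the indexer; "main" is just an example index name) will print the effective settings from indexes.conf:

    $SPLUNK_HOME/bin/splunk btool indexes list main --debug

That shows homePath, coldPath and thawedPath, along with which .conf file each setting came from, without having to hunt through the filesystem.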
Can we get this updated? I cannot find anything in the current Splunk docs that says where the files are.