Hello,
We have an installation of Splunk with a third-party Splunk app that reads W3C log files. This is the third time I've installed the same setup, each time on a different server. The first two installations are working fine, but the third has ballooned to 200 GB (the other installations take up significantly less space). Most of the space is taken up by colddb in _internaldb.
Does anyone know why this could happen? I've only uploaded ~50 MB of log files. What could cause the internal index to increase in size by that much?
Note: The only difference with the third installation is that I changed the system time of the server to September 5th, then changed it back to the current time after a few minutes. While the system time was set to September 5th, I ran a summary script to populate summary indexes.
The _internal index holds Splunk's own log data. The simplest answer is to lower the maximum data size of the _internal index; out of the box, it defaults to 500,000 MB (roughly 500 GB).
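For example, the cap can be lowered in indexes.conf. The 500000 MB figure is the shipped default; the 50000 MB value below is purely illustrative, not a recommendation:

```
# $SPLUNK_HOME/etc/system/local/indexes.conf
[_internal]
# Default is 500000 (MB); lower it to cap how large _internal can grow.
maxTotalDataSizeMB = 50000
```

A restart is needed for indexes.conf changes to take effect.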
Again, the _internal index is where Splunk logs its own operations. This includes visits to the Splunk UI, reports of its own activities, error logs if any, etc.
I appreciate that "it should look like the others," but if you're viewing the data in a different way, or there's an error specific to that host, or any of a myriad of other possible causes, the logs for that host could be bigger than on the other installs.
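To see what is actually filling _internal on the problem host, a quick breakdown by source is a good first check. This is an ordinary search run from the Splunk search UI:

```
index=_internal | stats count by sourcetype, source | sort - count
```

Comparing the top results against one of the healthy installations should show which log file is responsible for the difference.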
You can get more of a hint by checking the $SPLUNK_HOME/var/log/splunk directory (this is what goes to _internal) for large files.
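A minimal way to spot those large files from the command line, assuming a default $SPLUNK_HOME of /opt/splunk when the variable isn't set:

```shell
# Show the largest files under Splunk's own log directory (the source of _internal).
# /opt/splunk is an assumed default; adjust SPLUNK_HOME for your install.
LOGDIR="${SPLUNK_HOME:-/opt/splunk}/var/log/splunk"
du -ah "$LOGDIR" 2>/dev/null | sort -rh | head -20
```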
What is a good size to set this to?
You're right that we have to set the maximum retention size; that should be the first step. But I'm still concerned about new data being added where there should be none. If we set the retention size without addressing the source of the unnecessary data, we may end up aging out the older, useful data and replacing it with newer corrupted data.
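One way to hunt for that data before trimming retention is to compare each event's timestamp with the time it was actually indexed; events written while the clock was set back should show an unusually large gap. The field name skew_s below is just illustrative:

```
index=_internal | eval skew_s = _indextime - _time | stats min(skew_s) max(skew_s) count by sourcetype
```

Sourcetypes with extreme skew values would point at data created during the system-time change.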