A couple of months back we deployed a new Splunk server (4.2.3) on RHEL 5; our old Splunk server, version 4.0.8, is also running on RHEL 5. The old Splunk instance (still running) currently has around 18 billion events consuming about 700 GB of disk space in total, whereas the new 4.2.3 server has 20 billion events but is consuming around 2 TB. I've heard that the data is compressed by default, but I'm not sure why there is such a huge disk-space difference between the two versions. FYI, we have around 350 universal forwarders sending data to this one indexer. Do you know what can be checked? Would migrating from universal forwarders to heavy forwarders make a difference? And is there a way to compress the indexed data?
I don't know why you are seeing such a large difference, but I suspect one of the reported numbers is simply wrong, since the space used by the two versions should be comparable. Maybe you didn't actually have 18 billion events before, or, if you did, some of them were in a frozen state and unsearchable. Without knowing what your data looks like, I'd say that 18 billion events in searchable form fitting into 700 GB seems a bit light, while 20 billion in 2 TB seems more reasonable.
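As a rough sanity check on those figures, you can work out the average stored bytes per event. A searchable Splunk index holds both the compressed raw data and the tsidx index files, which together typically come to somewhere around half the size of the original raw data, so for typical log lines (a couple of hundred bytes each) ~100 bytes/event of disk is plausible while ~39 bytes/event looks low. A quick sketch using the numbers from the question:

```python
# Back-of-envelope check: average bytes on disk per indexed event,
# using the approximate figures reported in the question.
old_bytes = 700 * 10**9    # ~700 GB on the 4.0.8 indexer
old_events = 18 * 10**9    # ~18 billion events

new_bytes = 2 * 10**12     # ~2 TB on the 4.2.3 indexer
new_events = 20 * 10**9    # ~20 billion events

print(f"old: {old_bytes / old_events:.0f} bytes/event")  # ~39 bytes/event
print(f"new: {new_bytes / new_events:.0f} bytes/event")  # ~100 bytes/event
```

If your average raw event is in the usual 150–300 byte range, the old number is the one that looks suspicious, not the new one.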
The data is already about as compressed as it can reasonably be while remaining searchable. When a bucket is rolled to frozen storage (at which point it is unsearchable and invisible from within Splunk until it is thawed back out), much of the data can be deleted and considerable additional space reclaimed.