We have the Bro TA installed and it is putting all the Bro logs into a dedicated index. We are logging roughly 5 GB per day. The index size on disk is about 2.5 times larger than the raw data size according to the Fire Brigade app, and inspecting the index folder size confirms this. For comparison, we log a similar daily volume of Windows security events and their on-disk size is less than 30% of the raw data size.
Why does our Bro data inflate instead of compressing?
The answer most likely comes down to the Bro add-on using the INDEXED_EXTRACTIONS setting to ingest the events as headered data. That setting writes every field into the index, which is why the index ends up huge.
I recently discovered that Bro can log in JSON format, and I am working on porting the add-on over to KV_MODE = json instead of INDEXED_EXTRACTIONS. The difficulty is that I will have to store the new format in a different index, and potentially use a different sourcetype (bro_json_http versus bro_http), since the field extractions are driven by props.conf settings.
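For illustration, here is a rough sketch of how the two approaches might look in props.conf; the stanza names and settings are assumptions for the example, not the actual TA-bro configuration:
# Sketch of the current approach: fields are extracted at index time and
# written into the tsidx files, which is what inflates the index.
[bro_http]
INDEXED_EXTRACTIONS = tsv
TIMESTAMP_FIELDS = ts

# Sketch of the JSON approach: fields are extracted at search time only,
# so only the raw events and the usual keyword terms are indexed.
[bro_json_http]
KV_MODE = json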
Hope this helps.
I actually noticed exactly the same thing; in my case it's more like 3x.
|dbinspect index=bro |eval rawMB=(rawSize / 1024 / 1024 ) | stats values(index), sum(rawMB) AS rawTotal, sum(sizeOnDiskMB) AS diskTotalinMB
values(index)   rawTotal    diskTotalinMB
bro 36052.553794    104741.765628
I just created a test index called bro_test and loaded a 147 MB bro_conn file with 1,055,473 lines into it, and I see the same results in Splunk. Fresh data only, nothing else.
rawsize=145MB
diskSize=257MB
[user@host bro1]$ wc -l conn.11:00:00-12:00:00.log
1055473 conn.11:00:00-12:00:00.log
[user@host bro1]$ du -hs conn.11:00:00-12:00:00.log 
147M    conn.11:00:00-12:00:00.log
|dbinspect index=bro_test |eval rawMB=(rawSize / 1024 / 1024 ) | stats values(index), sum(rawMB) AS rawTotal, sum(sizeOnDiskMB) AS diskTotalinMB, values(eventCount)
values(index)   rawTotal    diskTotalinMB   values(eventCount)
bro_test    145.171837  257.875000  1055473
Sample of log file:
1421668703.224169   Ci2Oik4vP9Wc8n6XDk  10.10.101.238   54350   23.63.99.88 80  tcp http    30.904114   678 171648  SF  T   1   ShADadfF    76  4642    131 178560  (empty) -   US  so-eth3
1421668703.074892   CBIaMfbZYpIOO77x1   10.10.101.238   54347   23.61.254.251   80  tcp http    31.053642   340 84757   SF  T   0   ShADadFf    43  2588    69  88353   (empty) -   US  so-eth3
1421668693.147515   Cx3H5H1v12kDTYuJYa  10.10.101.238   54340   23.61.254.58    80  tcp http    40.981791   1017    308532  SF  T   0   ShADadFf    135 8049    231 320552  (empty) -   US  so-eth3
1421668703.002721   CZqQ5646gAQBpjK3Ma  10.10.101.238   54346   23.61.254.200   80  tcp http    31.126685   338 106877  SF  T   0   ShADadFf    47  2794    82  111149  (empty) -   US  so-eth3
1421668693.313653   C7ZQp24SOGM1hoW2Yl  10.10.101.238   54343   23.61.254.16    443 tcp ssl 40.815798   1606    18566   SF  T   0   hSADadFfR   27  2998    22  19718   (empty) -   US  so-eth3
I suspect that this isn't actually the case. Your raw data is almost definitely smaller. However, you may be subject to data models, search accelerations, etc. that increase the size of the index on disk because of the tsidx files used by accelerations.
Do you have any searches accelerated?
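If you want to see where the space actually goes, something along these lines breaks the buckets down into tsidx versus compressed raw data (the index path below is an assumption; adjust it for your environment):
du -ch /opt/splunk/var/lib/splunk/bro/db/db_*/*.tsidx | tail -1    # total size of the tsidx (index) files
du -ch /opt/splunk/var/lib/splunk/bro/db/db_*/rawdata | tail -1    # total size of the compressed raw data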
You are right: the raw data is compressed, I think at about 4:1. I tried to check this across all the compressed raw data files, but Splunk's gzip files run into the known gzip bug where the uncompressed size field overflows (it is only 32 bits), so the compression ratio came out negative for a lot of files.
I still don't know why the indexes are so large. I am pretty sure it is not accelerated searches, though. I will keep looking, but I suspect it is part of TA-bro that I don't want to break.
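One way around the gzip size-field overflow is to decompress and count bytes instead of trusting gzip -l; a sketch, assuming the rawdata journals sit under the default index path:
for f in /opt/splunk/var/lib/splunk/bro/db/db_*/rawdata/journal.gz; do
    comp=$(du -b "$f" | cut -f1)      # compressed size in bytes
    raw=$(zcat "$f" | wc -c)          # uncompressed size in bytes (avoids the 32-bit overflow)
    awk -v r="$raw" -v c="$comp" -v f="$f" 'BEGIN { printf "%s ratio %.2f\n", f, r/c }'
done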
What else do you have installed on the system? What other apps or add-ons?
