I have two indexes that contain different sets of events.
Index 1
Event Count – 23,952
Current Size – 19 MB
Index 2
Event Count – 431,026
Current Size – 20 MB
The size is the same, but the number of events is drastically different. This would make sense except that the events in both indexes are generally the same length. Any explanation for the difference in size here?
Index 1 - Event Example
{"time":"Fri Apr 03 17:57:08 CDT 2015","web_request_response_time":"0.45356011390686035","application":"node_count":"1","DataType":"PurepathData","state":"OK","cpu":"0.448837012052536","System Profile":"c_prodissue","breakdown":"CPU: 0.449 ms, Sync: -, Wait: -, Suspension: -","agent":"_JavaApp06_sin@sin:1547","root_path_thread_name":"http-apr-169.97.17.67-11000-exec-2","time":"Fri Apr 03 17:57:08 CDT 2015","response_time":"0.45356011390686035","execsum":"0.45356011390686035","name":"/SUI/monitoring","exec":"0.45361328125"}
{"time":"Fri Apr 03 17:57:03 CDT 2015","web_request_response_time":"0.5128860473632812","application":"applic","node_count":"1","DataType":"PurepathData","state":"OK","cpu":"0.5083289742469788","System Profile":"_uat_prodissue","breakdown":"CPU: 0.508 ms, Sync: -, Wait: -, Suspension: -","agent":"UAT_JavaApp05_sin@sin:28893","root_path_thread_name":"http-apr-169.97.17.62-11000-exec-17","time":"Fri Apr 03 17:57:03 CDT 2015","response_time":"0.5128860473632812","execsum":"0.5128860473632812","name":"/UI/monitoring","exec":"0.512939453125"}
Index 2 - Event Example
System_Profile=Monitoring #document dynatrace version=6.1.0.8054 systemprofile capture=true modifiedby=E745984 repositoryaccess=true incidentrules incidentrule flags=1 id=Host Disk Unhealthy incidentdashboardname=Incident Zero Conf Dashboard timeframe=10 actions actionref bundleversion=0.0.0 execution=begin key=com.dynatrace.diagnostics.plugins.EmailNotification refaction=com.dynatrace.diagnostics.plugins.EmailNotification rolekey=com.dynatrace.diagnostics.plugins.EmailNotificationAction roletype=1 severity=informational smartalert=false type=Email Notification property key=from typeid=string value=
System_Profile=Monitoring #document dynatrace version=6.1.0.8054 systemprofile capture=true modifiedby=E745984 repositoryaccess=true incidentrules incidentrule flags=1 id=Host Network Unhealthy incidentdashboardname=Incident Zero Conf Dashboard timeframe=10 actions actionref bundleversion=0.0.0 execution=begin key=com.dynatrace.diagnostics.plugins.EmailNotification refaction=com.dynatrace.diagnostics.plugins.EmailNotification rolekey=com.dynatrace.diagnostics.plugins.EmailNotificationAction roletype=1 severity=informational smartalert=false type=Email Notification property key=bcc typeid=string value=
I'm going to guess that your data in index 1 has INDEXED_EXTRACTIONS=json activated in props.conf. More space used in that case is expected behaviour; that space is traded for speed when using those fields, especially in tstats situations.
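If you want to verify, it would be enabled by a stanza along these lines in props.conf (the sourcetype name here is just a placeholder for whatever your script's data comes in as):

[my_json_sourcetype]
# index-time extraction of all JSON fields - costs disk, buys tstats speed
INDEXED_EXTRACTIONS = json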
To further investigate, run these two searches:
| dbinspect index=index1 | eval rawSizeMB = rawSize / 1048576 | table id eventCount rawSizeMB sizeOnDiskMB
| dbinspect index=index2 | eval rawSizeMB = rawSize / 1048576 | table id eventCount rawSizeMB sizeOnDiskMB
That'll give you the event count, the raw size ingested into each bucket for that index, and how much space each bucket occupies on disk. If you have a few huge rogue events, you should see one bucket behaving differently from the others; if my JSON guess is correct, all buckets for an index should look fairly similar.
As for the events themselves, it seems the data in index 1 has more unique tokens - for example, those high-precision numbers. Lots of unique tokens will increase the size of the dictionaries, and hence Splunk's index structures. The index 2 sample events seem to have lots of repeating tokens in the field values, and not many unique ones.
By default, Splunk will force an event break after 10,000 characters. You can modify that in props.conf using the TRUNCATE setting. In the same spirit, an event will be broken after 256 lines by default; see MAX_EVENTS in props.conf.
These default limits are there to mitigate either wrong configurations or systems throwing unexpected log data.
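For illustration, raising both limits for a hypothetical sourcetype would look like this in props.conf (the values are made up, pick whatever fits your data):

[my_sourcetype]
# maximum event length in bytes before Splunk forces a break (default 10000, 0 = unlimited)
TRUNCATE = 20000
# maximum number of lines per multi-line event (default 256)
MAX_EVENTS = 512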
The configuration reference is here: http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/Propsconf (search for INDEXED_EXTRACTIONS).
There's some human-readable documentation here: docs.splunk.com/Documentation/Splunk/6.2.3/Data/Extractfieldsfromfileheadersatindextime
Regular searches should run at similar speeds. What benefits the most is stuff like this:
| tstats avg(cpu) avg(web_request_response_time) where index=index1 by _time span=auto prestats=t | timechart avg(cpu) avg(web_request_response_time)
That should be massively faster than trying to pry the cpu and web_request_response_time fields from the JSON at search time.
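For comparison, the search-time equivalent would be something like the line below - same result, but the fields have to be extracted from the raw JSON while the search runs:

index=index1 | timechart avg(cpu) avg(web_request_response_time)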
Your assumption is correct. So you're saying that the data in index1 can be searched faster?
This data is coming from a custom-made script. If the trade-off for the larger size is quicker results, then I will leave the formatting as is. Otherwise, if there were no pros to having the events formatted this way, I would change it to something simpler.
Thanks for the heads up. Are there any reference docs available related to this?
Is it possible there are one or two rogue gigantic events in Index 1? I've never used it personally, but I've read of people using "eval esize" to check this kind of thing.
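Something like this might surface them - len(_raw) is my guess at how esize would be computed, so treat it as a sketch:

index=index1 | eval esize=len(_raw) | sort - esize | head 10 | table _time esize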
I believe there is a character limit for events. So even if there were a handful of rogue events, that still couldn't account for the tenfold size increase.
Ah, I didn't know that actually.
There's some more info in this post here:
http://answers.splunk.com/answers/4162/size-limit-for-an-event.html
Yeah, I immediately looked into that as soon as you mentioned it. That post exactly, actually. Thanks!
How are you calculating "size"?
That is coming from the Indexes view in the Splunk Settings ("Current size in MB").
There are more field extractions occurring in the heavier events, so that could possibly be the case.