How can I determine which forwarder is impacting the indexer the most?
I have an index taking up 53 gigs of space with an event count of 296 million.
There are multiple forwarders feeding into this index.
The forwarders with the most events have directories that are less than 2 gigs in size.
I am manually going from server to server trying to determine what is using all the space.
Hi, the _internal index should have this information. You can try something like:
`index=_internal metrics "group=tcpin_connections"`
"sourceHost" would be the forwarder... you extract per your requirement. Eg
index=_internal earliest=-15m metrics "group=tcpin_connections"|stats sum(tcp_Kprocessed) by sourceHost
or tcp_eps (check out the docs for additional options)
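If it helps, here is a slightly fuller sketch along the same lines, totaling kilobytes processed and averaging events per second per forwarder (field names as emitted by the tcpin_connections metrics group; adjust the time range to taste):

```
index=_internal source=*metrics.log* group=tcpin_connections earliest=-24h
| stats sum(tcp_Kprocessed) as total_KB avg(tcp_eps) as avg_eps by sourceHost
| sort - total_KB
```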
You may also get what you need from something like this:
`| metadata type=hosts index=<your_index>` (fill in the index in question)
It will return a totalCount column per host. If your forwarders are the original source of your log events, the event count should accurately reflect what's coming from each forwarder.
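A minimal sketch of what that can look like in practice, sorted by event count and with the epoch timestamps made readable (`<your_index>` is a placeholder):

```
| metadata type=hosts index=<your_index>
| sort - totalCount
| fieldformat firstTime = strftime(firstTime, "%F %T")
| fieldformat lastTime = strftime(lastTime, "%F %T")
| table host totalCount firstTime lastTime
```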
If you are on Splunk 6.2 or higher, you can use the DMC (Distributed Management Console). Here is the raw search you may want (substitute your own indexer for the `host=` value):
`index=_internal host=lyn-del-spl-101 source="*metrics.log" sourcetype=splunkd group=per_host_thruput | timechart per_second(kb) as per_second sum(kb) as kb by series useother=false limit=15`
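If the question is specifically "who is consuming my daily indexed volume?", the license usage log can answer it directly. A sketch, assuming the search runs somewhere `license_usage.log` is available (typically the license master); `<your_index>` is a placeholder:

```
index=_internal source=*license_usage.log* type=Usage idx=<your_index>
| stats sum(b) as bytes by h
| eval GB = round(bytes / 1024 / 1024 / 1024, 2)
| sort - GB
```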
To clarify, I'm interested in the data volume rather than the event counts. Can a file path containing 2 gigs of logs produce 10 gigs of disk usage in an index?
In this case 12 hosts are involved. Each apparently has files that total about 2 gigs. So at best I would expect that the index size on disk would be about 16 gigs on a daily basis.
Putting it another way, SoS (Splunk on Splunk) reports that this index is consuming on average 30 gigs of data per day. Where are the other 14 gigs coming from?
It is technically possible, but it seems a bit out of the ordinary. Raw data usually compresses very nicely on disk; we frequently see compression rates beyond 75%.
The remaining disk space is used for the indexes and metadata that go along with the raw data. If you, for example, configured INDEXED_EXTRACTIONS = json/xml/etc. and you have very high cardinality in your source data, the size of the index files can quickly exceed the raw data size on disk.
In other words: we need a bit more detail on how you have your inputs configured for these 12 hosts.
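For illustration, this is the kind of configuration that can cause it (the sourcetype name here is hypothetical):

```
# props.conf -- hypothetical sourcetype
# INDEXED_EXTRACTIONS writes every extracted field into the .tsidx files
# at index time; with high-cardinality JSON this can make the index files
# larger than the compressed raw data
[my_json_logs]
INDEXED_EXTRACTIONS = json
```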
You can also look at the directory structure on the indexer to see if you have multiple large .tsidx files.
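You can also get the raw-vs-disk ratio without shell access via `dbinspect`; a sketch, where `<your_index>` is a placeholder (note that `rawSize` is reported in bytes while `sizeOnDiskMB` is in MB):

```
| dbinspect index=<your_index>
| stats sum(sizeOnDiskMB) as disk_MB sum(eval(rawSize / 1024 / 1024)) as raw_MB
| eval disk_to_raw_ratio = round(disk_MB / raw_MB, 2)
```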
Hope this helps.
Turns out the individual who set up the monitoring on these twelve servers didn't exclude rolled log files, so the rotated copies were being picked up as well.
I'm going to clean up the monitoring, and if the issue persists I will seek assistance another day.
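For anyone hitting the same thing, a minimal inputs.conf sketch that skips rotated files (the path, index, and rolling pattern here are assumptions; adjust the regex to match your actual rolled-file names):

```
# inputs.conf on the forwarder (hypothetical path and pattern)
[monitor:///var/log/app]
# blacklist is matched against the full file path; this skips
# compressed archives and numbered rolled copies like app.log.1
blacklist = \.(gz|bz2|\d+)$
index = your_index
```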
Thanks for the answers and support.