Hi, I'd like to get a list of all indexes showing data in the following format for a given time span, such as the last 7 days:
_time       indexName   IndexedVolumeSizeInMBofTheDay   NumOfEventsOfTheDay
2015-11-20  myIndex-A   1234    1000
2015-11-20  myIndex-B   567     300
2015-11-20  myIndex-X   543     250
2015-11-21  myIndex-A   9876    2000
2015-11-21  myIndex-B   3542    341
2015-11-21  myIndex-X   18332   6723
I found the following search on this site, but the output is limited in columns (a maximum of about 13, it seems) and it doesn't show all indexes. We have over 140 indexes! Is there a way to make this search list the output in the above format (or something similar) and show all indexes?
index=_internal source=*metrics.log group=per_index_thruput series=* | eval MB = round(kb/1024,2) | timechart sum(MB) as MB by series
Thanks for your help.
The metrics log probably doesn't have the information you need, as it samples the data - it is not complete.
This is not exactly what you asked for, but it is correct and complete. It examines the buckets in each index and calculates the number of events, the size on disk and the raw data size. It will run quickly. If your buckets roll more often than once per day, then this may match a day's worth of data fairly accurately...
| dbinspect index=*
| search index!=_*
| fields bucketId endEpoch eventCount sizeOnDiskMB startEpoch index rawSize
| where endEpoch > relative_time(now(), "-1d@d")
| stats min(startEpoch) as startEpoch max(endEpoch) as endEpoch sum(eventCount) as EventCount sum(sizeOnDiskMB) as "Size On Disk (MB)" sum(rawSize) as rSize by index
| eval "Raw Data Size (MB)"=round(rSize/1024/1024,2)
| eval "Size On Disk (MB)"=round('Size On Disk (MB)',2)
| eval "Time Range (hrs)" = round((endEpoch - startEpoch)/3600,2)
| eval "End Time"=strftime(endEpoch,"%x %X")
| eval "Start Time"=strftime(startEpoch,"%x %X")
| table index "Start Time" "End Time" "Time Range (hrs)" EventCount "Raw Data Size (MB)" "Size On Disk (MB)"
Note that the where clause (where endEpoch > relative_time(now(), "-1d@d")) is where the actual time range is chosen. The selection says "choose buckets whose latest event is within the last day." If you used startEpoch instead of endEpoch, Splunk would select only index buckets that had been started within the last day.
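For illustration, a minimal sketch of that startEpoch variant (same fields as the full search above; this selects only buckets started within the last day):

```spl
| dbinspect index=*
| search index!=_*
| where startEpoch > relative_time(now(), "-1d@d")
| stats sum(eventCount) as EventCount sum(sizeOnDiskMB) as "Size On Disk (MB)" by index
```

Because a bucket can span far more than a day, either variant reports whole-bucket totals, not per-day totals.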
HOORAY! UPDATE to the UPDATE!! dbinspect now works in a distributed environment! Yay!
[OLD UPDATE] dbinspect does not work properly in a distributed environment IN OLDER VERSIONS OF SPLUNK - it needs to be run on each indexer. However, there is an answer that addresses this.
Thanks for your explanations lguinn! That helped. The data needs to be within the date/time range specified. Other data points like number of events and Size on Disk are optional for my case. It doesn’t need to match the license usage either.
Actually, the volume size for indexes from metrics.log would be sufficient for what I need. I'm able to get a report on all indexes by adding limit=0; without this parameter, the report is limited to only 10 indexes.
index=_internal source=*metrics.log group=per_index_thruput series=* | eval MB = round(kb/1024,2) | timechart sum(MB) as MB by series limit=0
Thanks again for your help!
Thanks, lguinn. When I used where startEpoch > relative_time(now(), "-1d@d"), it returned data indexed today as well as yesterday's data, and it only returned a small set of the roughly 50 indexes that have data. How do I define an exact From and To date/time boundary? And is there a way to list all indexes, regardless of whether any data was indexed in the given date/time range?
We have clustered indexers. Does the dbinspect command, when run on a clustered search head, run against all indexers in the cluster, or does the command need to run on each indexer?
I also noticed that some SOS and DMC panels on indexes use _internal *metrics.log. Why would those tools use metrics.log to pull index-related data if the data is not complete, as you mentioned?
Yes, the time range for dbinspect cannot be exact. The time range is used to identify any buckets that have data within it, but the reporting is based on the entire bucket, which can certainly contain data outside the range. If you use the dbinspect command, there is no way around this.
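If you want buckets that overlap a specific day, rather than just "ended recently," one approach (a sketch only, and still subject to the whole-bucket caveat above) is to bound both epochs so that only buckets overlapping yesterday are selected:

```spl
| dbinspect index=*
| search index!=_*
| where startEpoch < relative_time(now(), "@d") AND endEpoch > relative_time(now(), "-1d@d")
| stats sum(eventCount) as EventCount sum(sizeOnDiskMB) as "Size On Disk (MB)" by index
```

The counts are still per-bucket totals, so they can include events outside the chosen day.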
Many apps (including the DMC) and admins (including me), use the metrics log to get a handle on "what's going on." Looking at the most active data feeds or indexes or whatever is usually all the information that is needed. However, if you have low-volume objects, they will probably not appear in the metrics log. So don't expect this data to be complete - for example, you can't match it to the license usage.
If you are looking for license usage, there is a log for that: license_usage.log
However, it will not tell you everything that you've asked for, such as disk space consumed or number of events per day.
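For reference, a typical daily license-usage search looks like the following. The type=Usage events in license_usage.log record bytes indexed in the field b, split by index in the field idx, and the log is written on the license master; treat this as a sketch and adjust it to your environment:

```spl
index=_internal source=*license_usage.log type=Usage
| eval MB=round(b/1024/1024,2)
| timechart limit=0 span=1d sum(MB) as MB by idx
```

As with the earlier timechart example, limit=0 prevents the split-by from being capped at the default of 10 series.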