Summary
When monitoring a Windows directory, Splunk reports roughly 30 times more indexed volume than the actual size of the files on disk.
Details
I have a directory indexed by Splunk.
PS C:\logs> cat 'C:\Program Files\SplunkUniversalForwarder\etc\apps\search\local\inputs.conf'
[monitor://C:\logs]
disabled = false
The directory contains only logfiles like the following, going back a month:
PS C:\logs> get-childItem . | select Name,FileSize,length
Name FileSize Length
---- -------- ------
x-log-2018-07-01.H00.txt 297 B 297
[snip]
x-log-2018-07-30.H03.txt {22.54 MB , 23,081.98 KB } 23635943
x-log-2018-07-30.H04.txt {45.31 MB , 46,398.05 KB } 47511605
x-log-2018-07-30.H05.txt {47.31 MB , 48,448.86 KB } 49611636
x-log-2018-07-30.H06.txt {31.69 MB , 32,454.59 KB } 33233498
x-log-2018-07-30.H07.txt {21.16 MB , 21,670.09 KB } 22190177
x-log-2018-07-30.H08.txt {12.32 MB , 12,620.59 KB } 12923489
x-log-2018-07-30.H09.txt {9.03 MB , 9,245.70 KB } 9467595
x-log-2018-07-30.H10.txt {15.87 MB , 16,254.70 KB } 16644816
x-log-2018-07-30.H11.txt {48.31 MB , 49,470.80 KB } 50658101
x-log-2018-07-30.H12.txt {5.46 MB , 5,595.05 KB } 5729335
x-log-2018-07-30.H13.txt {37.36 MB , 38,260.37 KB } 39178621
x-log-2018-07-30.H14.txt {34.75 MB , 35,584.42 KB } 36438450
x-log-2018-07-30.H15.txt {13.91 MB , 14,244.40 KB } 14586261
x-log-2018-07-30.H16.txt {12.41 MB , 12,703.72 KB } 13008605
x-log-2018-07-30.H17.txt {8.41 MB , 8,611.08 KB } 8817743
x-log-2018-07-30.H18.txt {6.43 MB , 6,588.22 KB } 6746340
x-log-2018-07-30.H19.txt {24.83 MB , 25,424.69 KB } 26034884
x-log-2018-07-30.H20.txt {24.60 MB , 25,194.88 KB } 25799554
x-log-2018-07-30.H21.txt {48.48 MB , 49,643.52 KB } 50834964
A new file is created every hour, and logs are appended to it. So we have 24 files per day.
This understandably resulted in a very high initial index volume when we added the directory. OK, fine. But when I view the Data Volume Calculator - Max Sources, I find that Splunk sees each of these logfiles as ~700 MiB in size, when they are only ~20 MiB each. This results in ~15 GiB/day license usage - from just this host! Multiplied across our whole fleet of Windows machines with this logging pattern, we would need to purchase a license far more expensive than necessary for the actual indexed data volume.
That is a daily license usage equivalent to the entire month's worth of logs in the directory!
PS C:\logs> "{0:N2} MB" -f ((Get-ChildItem -Recurse | Measure-Object -Property Length -Sum -ErrorAction Stop).Sum / 1MB)
14,203.00 MB
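As a rough sanity check on the figures above, here is a minimal Python sketch of the arithmetic. The per-file sizes are the approximate values quoted in this post (~700 MiB reported, ~20 MiB on disk), not exact measurements, and the variable names are only illustrative.

```python
# Approximate figures quoted in this post; none of these are exact measurements.
reported_mib_per_file = 700    # size Splunk appears to report per logfile
actual_mib_per_file = 20       # rough on-disk size per logfile
files_per_day = 24             # one new file per hour
directory_total_mib = 14203.0  # Get-ChildItem sum above (PowerShell's 1MB = 1,048,576 bytes)

# How much larger the reported volume is than the on-disk volume.
inflation = reported_mib_per_file / actual_mib_per_file

# Daily volume as reported by Splunk versus what the files actually hold.
reported_gib_per_day = reported_mib_per_file * files_per_day / 1024
actual_gib_per_day = actual_mib_per_file * files_per_day / 1024

# Total size of the whole month of logs in the directory.
directory_total_gib = directory_total_mib / 1024

print(f"inflation factor:      ~{inflation:.0f}x")
print(f"reported per day:      ~{reported_gib_per_day:.2f} GiB")
print(f"expected per day:      ~{actual_gib_per_day:.2f} GiB")
print(f"whole directory total: ~{directory_total_gib:.2f} GiB")
```

With these rough inputs, the reported daily volume lands in the same range as the size of the entire directory, which is what makes the license usage look like the whole month is being counted every day.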
Question
Is there anything obvious I am missing here? Is there any debugging I can do to investigate this? I am currently locked out of search, but I can view the Splunk _internal index.
Edit: Minor update: I've regained full search capabilities through contact with Splunk Inc as we sort this out.