We have indexers running in a clustered environment, with a 35-day retention policy for all app logs. We have started missing data: we now see only about 10 days of old data, and the loss is ongoing. Could you please suggest how to investigate this issue?
Do you have enough disk space to accommodate 35 days' worth of data? Do you have volume settings that allow Splunk to consume that disk space? Check your Monitoring Console to be sure.
index=_internal component=BucketMover idx=YourIndexName
Look for when the data is being rolled with the search above. See if there are any errors such as "storage full" or "out of disk", or "permission denied", etc.
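If nothing obvious stands out, one way to narrow that search to warnings and errors only (a sketch; substitute your actual index name for YourIndexName):

index=_internal component=BucketMover (log_level=WARN OR log_level=ERROR) idx=YourIndexName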
I am not seeing any of the errors mentioned. We suspect the data loss is due to restricted volume sizes on the indexer db; the configuration is below. How can we identify whether newer data is being overwritten instead of only the oldest data being aged out? Can you please help?
Figure out which pipeline is full using the Monitoring Console.
Look at the indexing performance searches; they should show you the pipelines.
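If you prefer SPL to the Monitoring Console dashboards, a sketch using the standard queue metrics that splunkd writes to metrics.log:

index=_internal source=*metrics.log* group=queue
| eval pct_full=round(current_size_kb/max_size_kb*100, 1)
| timechart span=10m max(pct_full) by name

Queues that sit near 100% full (parsingqueue, indexqueue, etc.) point to the blocked stage of the pipeline.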
Could be bad parsing, or it might be time to add indexers. How many indexers do you have now and what IOPS storage do you have? To do 2.4TB/day with reference hardware, you'd need about 10 indexers just to handle the input.
@shivanandbm
Please see my comment above.
Fix your indexes.conf according to your needs.
Thank you. I am looking for logs that would tell me my data is getting overwritten. Could you please tell me which logs would show that?
Splunk will crash before it overwrites.
It’s called a bucket collision, and it’s very bad.
Try:
index=_internal sourcetype=splunkd source=*splunkd.log "BucketMover - will attempt to freeze" NOT "because frozenTimePeriodInSecs="
| rex field=bkt "(rb_|db_)(?P<newestDataInBucket>\d+)_(?P<oldestDataInBucket>\d+)"
| eval newestDataInBucket=strftime(newestDataInBucket, "%+"), oldestDataInBucket = strftime(oldestDataInBucket, "%+")
| table message, oldestDataInBucket, newestDataInBucket
That is the "IndexerLevel - Buckets are been frozen due to index sizing" search from the GitHub repo / the Alerts for Splunk Admins app.
I am not getting any output for this query.
Good. That query reports buckets being frozen because of size limits, so getting no results here is a good sign.
I am looking for logs that would tell me my data is getting overwritten. Could you please tell me which log shows that? I am sure that the latest data is being overwritten by old data.
I don't see you specifying how much app data you ingest on a daily basis.
I have a report that shows about 2437 GB of data being ingested every week.
I'll start by checking the size of your indexes (and your indexers' disks). Splunk applies whichever limit is hit first: the retention time or the size policy. So if, say, you have 100 GB of disk available on an indexer and you are indexing 10 GB per day, you will only retain about 10 days of data (simplified here, ignoring compression).
So even if you set your index time retention to 300 days, the index cannot hold enough data to honor it.
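That simplified calculation can be sketched with a makeresults search. The 100 GB / 10 GB-per-day figures are the illustrative numbers from above, not your real values, and compression is ignored:

| makeresults
| eval volume_size_gb=100, daily_ingest_gb=10
| eval approx_retention_days=floor(volume_size_gb/daily_ingest_gb)
| table volume_size_gb daily_ingest_gb approx_retention_days

Plug in your own volume sizes and per-indexer daily ingest to estimate the retention your current volumes can actually support.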
Thank you for the reply. Each index is assigned 500 GB, and we have 43 indexers in total. The retention policy is 35 days. We also have a 630 GB limit on the total size of data model acceleration (DMA). Below are the volume settings in indexes.conf.
[volume:hot]
path = /SplunkIndexes/HotWarmIndex
maxVolumeDataSizeMB = 130000
[volume:cold]
path = /SplunkIndexes/ColdIndex
maxVolumeDataSizeMB = 500000
Below is the total disk space consumed on the indexers.
/dev/mapper/vgsplunkssd-lvsplunkssd
4.8T 128G 4.5T 3% /SplunkIndexes/HotWarmIndex
/dev/mapper/vgsplunksata-lvsplunksata
14T 489G 13T 4% /SplunkIndexes/ColdIndex
Can you please confirm whether we have assigned too little volume and are seeing data loss as a result? Also, please suggest how we can avoid the loss; we need 35 days of data per the requirement. Would increasing the volume sizes fix the issue? Please help us with this.
Yes, you really are defining tiny volumes across your 43 indexers:
~130 GB for the hot/warm volume
~500 GB for the cold volume
That explains why you barely see any data used in your df output.
Read here all the way through and modify your indexes.conf accordingly:
https://docs.splunk.com/Documentation/Splunk/7.1.2/Indexer/Configureindexstoragesize
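For illustration only, the volume stanzas could be sized to use most of the disk shown in your df output, with per-index retention pinned to 35 days. The index name and exact sizes here are placeholders, not recommendations; leave headroom for the OS, DMA summaries, and cluster replication:

[volume:hot]
path = /SplunkIndexes/HotWarmIndex
# e.g. allow ~4 TB of the 4.8 TB SSD filesystem
maxVolumeDataSizeMB = 4000000

[volume:cold]
path = /SplunkIndexes/ColdIndex
# e.g. allow ~12 TB of the 14 TB SATA filesystem
maxVolumeDataSizeMB = 12000000

[your_index]
homePath = volume:hot/your_index/db
coldPath = volume:cold/your_index/colddb
thawedPath = $SPLUNK_DB/your_index/thaweddb
# 35 days * 86400 seconds = 3024000
frozenTimePeriodInSecs = 3024000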
Also try running this search to see the reasons Splunk logs for rolling buckets, and verify your configuration as you go:
index=_internal sourcetype=splunkd component=BucketMover
Hope it helps.
Thanks once again. When I ran the query below, I got the following output.
index=_internal sourcetype=splunkd component=BucketMover | timechart span=1d count by component
Can you please tell me what it indicates?
_time BucketMover
2018-08-30 257
2018-08-31 2039
2018-09-01 1725
2018-09-02 1631
2018-09-03 1989
2018-09-04 1858
2018-09-05 1968
2018-09-06 1850
2018-09-07 1754
2018-09-08 1639
2018-09-09 226
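A raw count by component doesn't distinguish normal bucket rolls from problems. Splitting the same events by log level (log_level is a standard field on splunkd internal events) shows whether those 1,600-2,000 daily BucketMover events are routine INFO messages or warnings/errors:

index=_internal sourcetype=splunkd component=BucketMover
| timechart span=1d count by log_level

If they are mostly INFO-level freeze messages, buckets are being frozen to stay under your size limits, and with no frozen archive configured, frozen means deleted.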