Developing for Splunk Enterprise

Data loss in Splunk

shivanandbm
New Member

We have indexers running in a clustered environment, with a 35-day retention policy for all app logs. We have started missing data: we now see only about 10 days of old data, and the data loss is continuing. Could you please suggest how to investigate this issue?

0 Karma

woodcock
Esteemed Legend

Do you have enough disk space to accommodate 35 days worth of data? Do you have volume settings that allow you to consume that disk space for Splunk's use? Check out your Monitoring Console to be sure.
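A quick way to sketch that check from the OS side (the `.` below is a placeholder; the actual mount points in this environment appear later in the thread):

```shell
# Check free space on the filesystems backing the index volumes.
# "." is a placeholder; substitute your own index volume mount points,
# e.g. df -h /SplunkIndexes/HotWarmIndex /SplunkIndexes/ColdIndex
df -h .
```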

0 Karma

jkat54
SplunkTrust
SplunkTrust

index=_internal component=BucketMover idx=YourIndexName

Look for when the data is being rolled using the search above, and check for any errors such as "storage full", "out of disk", "permission denied", etc.

0 Karma

shivanandbm
New Member

I am not seeing any of the mentioned errors. We suspect the data loss is due to the restricted volumes on the indexer databases; the configuration is below. How can we identify whether the latest data is being overwritten in place of the oldest data? Can you please help?

0 Karma

jkat54
SplunkTrust
SplunkTrust

Figure out which pipeline is full using the Monitoring Console.

Look at the indexing performance searches; they should show you the pipelines.

It could be bad parsing, or it might be time to add indexers. How many indexers do you have now, and what IOPS does your storage provide? To do 2.4 TB/day with reference hardware, you'd need about 10 indexers just to handle the input.
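As a rough sanity check of that figure (the ~250 GB/day per reference indexer capacity below is an assumption, not stated in this thread), the indexer count can be estimated like this:

```python
import math

# Assumed figures: the 2.4 TB/day ingest from the comment above, and an
# assumed capacity of roughly 250 GB/day per reference-hardware indexer.
daily_ingest_gb = 2400          # 2.4 TB/day
per_indexer_gb_per_day = 250    # assumption for reference hardware

indexers_needed = math.ceil(daily_ingest_gb / per_indexer_gb_per_day)
print(indexers_needed)  # -> 10, i.e. about 10 indexers just for the input
```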

0 Karma

adonio
Ultra Champion

@shivanandbm
Please see my comment above.
Fix your indexes.conf according to your needs.

0 Karma

shivanandbm
New Member

Thank you. I am searching for logs that tell me my data is getting overwritten. Could you please tell me which logs would show this?

0 Karma

jkat54
SplunkTrust
SplunkTrust

Splunk will crash before it overwrites.
It’s called bucket collision and it’s very bad.

0 Karma

gjanders
SplunkTrust
SplunkTrust

Try:

index=_internal sourcetype=splunkd source=*splunkd.log "BucketMover - will attempt to freeze" NOT "because frozenTimePeriodInSecs=" 
| rex field=bkt "(rb_|db_)(?P<newestDataInBucket>\d+)_(?P<oldestDataInBucket>\d+)"
| eval newestDataInBucket=strftime(newestDataInBucket, "%+"), oldestDataInBucket = strftime(oldestDataInBucket, "%+") 
| table message, oldestDataInBucket, newestDataInBucket

That is the "IndexerLevel - Buckets are been frozen due to index sizing" search from GitHub, or from the Alerts for Splunk Admins app.

0 Karma

shivanandbm
New Member

I am not getting any output for this query.

0 Karma

gjanders
SplunkTrust
SplunkTrust

Good: that query reports buckets being frozen because of size limits, so no results is a good thing.

0 Karma

shivanandbm
New Member

I am searching for logs that tell me my data is getting overwritten. Could you please tell me which log shows this? I am sure that the latest data is being overwritten by old data.

0 Karma

Rob2520
Communicator

I don't see you specifying how much app data you ingest on a daily basis.

0 Karma

shivanandbm
New Member

I have a report in which I see about 2437 GB worth of data ingested every week.

0 Karma

adonio
Ultra Champion

I'll start by checking the size of your indexes (and even your indexers' disks). Splunk applies either the time retention or the size policy, whichever comes first. So if, say, you have 100 GB of disk available on the indexers and you are indexing 10 GB per day, you will only get about 10 days of retention (simplified here, not accounting for compression).
So even if you set your index time retention to 300 days, the disk cannot hold enough data to keep it.

0 Karma

shivanandbm
New Member

Thank you for the reply. Each index is assigned 500 GB, and we have 43 indexers in total. The retention policy is 35 days. That said, we have a 630 GB limit on the total size of data model acceleration (DMA). Below are the volume settings in indexes.conf.

VOLUME SETTINGS

[volume:hot]
path = /SplunkIndexes/HotWarmIndex
maxVolumeDataSizeMB = 130000
[volume:cold]
path = /SplunkIndexes/ColdIndex
maxVolumeDataSizeMB = 500000

Below is the total disk space consumed on the indexers.

/dev/mapper/vgsplunkssd-lvsplunkssd
4.8T 128G 4.5T 3% /SplunkIndexes/HotWarmIndex
/dev/mapper/vgsplunksata-lvsplunksata
14T 489G 13T 4% /SplunkIndexes/ColdIndex
Can you please advise whether we have assigned too little volume and are seeing data loss as a result? Also, please suggest how we can avoid the data loss; we should have 35 days of data per the requirement. Does increasing the volumes fix the issue? Please help us with this case.

0 Karma

adonio
Ultra Champion

Yes,
you really are defining tiny volumes across your 43 indexers:
~130 GB for the hot/warm volume
~500 GB for the cold volume
That explains why you barely see any space used in your df output.
Read through this all the way and modify your indexes.conf accordingly:
https://docs.splunk.com/Documentation/Splunk/7.1.2/Indexer/Configureindexstoragesize
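For example (the values below are illustrative only, not a recommendation; size them to your actual ingest and the 35-day requirement per the documentation above), the volume caps and time retention in indexes.conf might look like:

```ini
[volume:hot]
path = /SplunkIndexes/HotWarmIndex
# illustrative: let the hot/warm volume use more of the 4.8 TB filesystem
maxVolumeDataSizeMB = 3000000

[volume:cold]
path = /SplunkIndexes/ColdIndex
# illustrative: let the cold volume use more of the 14 TB filesystem
maxVolumeDataSizeMB = 10000000

# per index: time-based retention of 35 days = 35 * 86400 seconds
# frozenTimePeriodInSecs = 3024000
```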
Also try running this search to see what reason Splunk gives for rolling buckets, and to verify your configuration down the road:
index=_internal sourcetype=splunkd component=BucketMover

hope it helps

0 Karma

shivanandbm
New Member

Thanks once again. When I ran the query below, I got the following output.
index=_internal sourcetype=splunkd component=BucketMover | timechart span=1d count by component

Can you please tell me what it indicates?

_time BucketMover

2018-08-30 257
2018-08-31 2039
2018-09-01 1725
2018-09-02 1631
2018-09-03 1989
2018-09-04 1858
2018-09-05 1968
2018-09-06 1850
2018-09-07 1754
2018-09-08 1639
2018-09-09 226

0 Karma