Long delay / missing data from sourcetypes?

Jarohnimo · ‎08-08-2019

Hello all,

I have 4 search heads, 2 indexers, and 1 deployment server in one of my environments (Windows).

I'm now noticing a long delay before some of my data shows up in searches. This is a BIG issue for me because in operations you need to catch things in near real time.

Some items I'm not able to search on until the next day. For example, with my IIS logs, if I search the last 15 minutes, maybe 4 of the 8 web servers show as producing logs. If I run the same search an hour later I'll get 7/8 servers, and an hour after that maybe 2/8 servers (so it's sporadic and varies). If I search for IIS data older than 6 hours, all is well.

For my IIS indexer
12 CPU, 24GB memory
Indexing rate: around 250 KB/s (status = normal)
Indexing rate every 5 minutes is around 394 KB's

props.conf on indexer
[iis]
TZ = GMT

Index size= 700GB
Max size of Hot/Warm/Cold Bucket set to: auto
Homepath 263/ unlimited
cold 436/ unlimited

The highest host IIS Log Event Count: 343,166,069
Event count by sourcetype (iis): 1,743,109,978

maxDataSize = auto
maxHotBuckets = 3
maxWarmDBCount = 300

The Splunk data pipeline is at 0% across the board and shows no delays.

I also noticed, under Index Detail: Instance, that my cold bucket size is much larger than my hot/warm bucket size.

splunkcol · ‎10-01-2020

How have you solved it?

richgalloway · ‎08-08-2019

Have you verified all of the IIS servers have the correct time and time zone?
When you compare _time to _indextime, what do you see?

| tstats latest(_time) AS _time latest(_indextime) AS _indextime where index=iis by host 
| eval delta=_indextime - _time 
| where delta != 0 
| eval indexTime=_indextime 
| fields delta indexTime _time host
| sort - delta 
| eval indexTime=strftime(indexTime, "%F %T") 
| eval Time=strftime(_time, "%F %T")
| table delta indexTime Time host

---
If this reply helps you, Karma would be appreciated.

Jarohnimo · ‎08-08-2019

Yes, the timestamps on all the IIS servers look fine. They are in UTC and, as stated in the OP, I've added a props.conf entry for that sourcetype that normalizes the data. If I search for events with future timestamps nothing is returned, so I don't think it's a timestamp issue.

One thing I meant to mention, which I discovered as I was leaving work: another log source is also delayed. These two are the biggest log sources I'm pulling.

However, smaller logs and sources still come through.

I'm starting to think I'm hitting my limit in limits.conf.

richgalloway · ‎08-09-2019

Putting TZ = GMT in props.conf does not normalize data. It's merely information to help indexers parse timestamps. If the timestamp is not in UTC, TZ = GMT will result in events being out of sequence.

Are the logs being sent by a forwarder? If so, consider increasing the maxKBps setting in the forwarder's limits.conf file.
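To see why a forwarder cap could produce next-day delays, here is a rough sketch in Python. It assumes the universal forwarder default of maxKBps = 256 (the [thruput] stanza in limits.conf) and binary units; the 30 GB/day per-host volume and the function names are made up for illustration and are not figures from this thread.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
KB_PER_GB = 1024 * 1024  # assumption: binary units


def hours_to_forward(daily_gb: float, max_kbps: float) -> float:
    """Hours a throttled forwarder needs to ship one day's worth of logs."""
    if max_kbps <= 0:
        # maxKBps = 0 means "no limit" in limits.conf
        return 0.0
    return daily_gb * KB_PER_GB / max_kbps / SECONDS_PER_HOUR


def max_daily_gb(max_kbps: float) -> float:
    """Most data a forwarder can ship per day when running at the cap."""
    return max_kbps * SECONDS_PER_DAY / KB_PER_GB


# Hypothetical busy web server writing 30 GB of IIS logs per day
print(round(hours_to_forward(30, 256), 1))  # hours needed for one day's logs
print(round(max_daily_gb(256), 1))          # daily ceiling at 256 KB/s
```

If the first number comes out above 24, the forwarder can never catch up during a busy day, which would match data only becoming searchable the next day while smaller sources arrive on time.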

Depending on what else the indexer is doing, 250GB/day is near the limit of what can be expected from a single indexer. If you can't increase the storage I/O rate then consider adding an indexer.
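For a sense of scale, the daily figure can be converted to an average rate. This is a minimal sketch, assuming binary units (1 GB = 1024 x 1024 KB) and an even split of data across indexers; the function names are illustrative only. The inputs are the 250 GB/day, 2 indexers, and 250 KB/s indexing rate mentioned in this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60
KB_PER_GB = 1024 * 1024  # assumption: binary units


def avg_kbps(daily_gb: float, indexers: int = 1) -> float:
    """Average KB/s each indexer must sustain, assuming an even split."""
    return daily_gb * KB_PER_GB / SECONDS_PER_DAY / indexers


def daily_gb_at(kbps: float) -> float:
    """Daily volume in GB implied by a steady rate in KB/s."""
    return kbps * SECONDS_PER_DAY / KB_PER_GB


print(round(avg_kbps(250), 1))              # one indexer taking everything
print(round(avg_kbps(250, indexers=2), 1))  # two indexers, even split
print(round(daily_gb_at(250), 1))           # what a steady 250 KB/s adds up to
```

By this arithmetic, 250 GB/day averages about 3,034 KB/s in total, or about 1,517 KB/s per indexer with an even split, whereas a steady 250 KB/s works out to roughly 20.6 GB/day.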


Jarohnimo · ‎08-09-2019

I meant normalize the data with respect to the timestamp; I should have been clearer. Generally I do my field extractions at search time, on the search heads only.

You may have missed that I currently have 2 indexers, so each indexer is getting half this amount, and I don't think it's the indexers. There are no issues with the data pipeline. I'm thinking limits.conf is probably where I need to concentrate.

Today I evaluated my actual logs and saw someone doing something crazy with web API calls that has more than quadrupled the log size. So I'll have them stop what they are doing first and look into limits.conf at the same time.

Thanks for your help

Jarohnimo · ‎08-08-2019

Maybe I'm wrong about hitting any limits, as I'm only ingesting 250 GB daily and I know of plenty who pull TBs of data a day. Perhaps they've adjusted their limits.conf to allow the data to flow, or perhaps they are pulling from 1,000 devices to reach that 1 TB, so no individual node hits the default limit in limits.conf, whereas I'm pulling 250 GB from only 84 devices?

I definitely need to fix this problem asap!
