
How can we find out where the delay in indexing is?

ddrillic
Ultra Champion

We have the following search -

base search
| eval diff= _indextime - _time 
| eval capturetime=strftime(_time,"%Y-%m-%d %H:%M:%S") 
| eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") 
| table capturetime indextime  diff

We see the following -

[screenshot: capturetime / indextime / diff table, with diff values of over five hours]

So, we see a delay of over five hours in indexing. Is there a way to find out where these events "got stuck"? In this case, the events come from Hadoop servers and the forwarder processes around half a million files. We would like to know whether the delay happens at the forwarder level or on the indexer side.
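
One way to narrow it down is to split the lag by originating host and by the indexer that received the event - a sketch reusing the same base search (host and splunk_server are standard fields; the aliases are just for readability):

base search
| eval lag = _indextime - _time
| stats avg(lag) AS avg_lag max(lag) AS max_lag count BY host splunk_server
| sort - max_lag

If the lag is high for every host on one indexer, that points at the indexer; if it is high for certain hosts across all indexers, that points at those forwarders.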


ddrillic
Ultra Champion

Hi @rdagan,

We had a production change on Wednesday night. On the following day, Thursday, we saw this delay in indexing -

base query 

followed by -

[screenshot: Thursday's results showing the multi-hour indexing delay]

On Friday there was no delay (the right column) -

[screenshot: Friday's results - no delay in the right column]

And we have seen this behavior before with other production changes involving these large Hadoop file systems. So, I think it takes the forwarder hours to scan this large number of files and index the right information; a day or two later, all is fine. I just checked now and it's perfect. So, the delay's time frame lines up with the forwarder bounce.

The thing is - what can we improve on the forwarder to lower this delay after the bounce?
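
A few forwarder-side settings we plan to review for this - the monitor path below is only an illustration, not our actual stanza:

# limits.conf on the UF - 0 removes the thruput cap
[thruput]
maxKBps = 0

# server.conf on the UF - a second ingestion pipeline, if CPU allows
[general]
parallelIngestionPipelines = 2

# inputs.conf on the UF - skip files not modified in the last week (illustrative path)
[monitor:///data/hadoop/logs]
ignoreOlderThan = 7d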

On the forwarder we see -

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1033069
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 64000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And thank you @MuS and @somesoni2 for validating that nothing is fundamentally wrong with either the forwarder's configuration or the index queues...


rdagan_splunk
Splunk Employee

Do you see any helpful information in these Monitoring Console dashboards?
Indexing Pipeline: http://docs.splunk.com/Documentation/Splunk/6.6.0/DMC/IndexingInstance
Forwarders: http://docs.splunk.com/Documentation/Splunk/6.6.0/DMC/ForwardersDeployment


MuS
SplunkTrust

Just to clarify: did you check that there is no maxKBps = <some number other than 0> set in limits.conf on the UF?
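
For reference, the UF's default app ships with maxKBps = 256; removing the cap means something like this in a local limits.conf on the UF (the file location is just an example):

# e.g. etc/system/local/limits.conf on the UF
[thruput]
maxKBps = 0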


ddrillic
Ultra Champion

ok, I see -

$ find . -name "limits.conf"       | xargs grep -i maxKBps
./etc/apps/universal_config_forwarder/local/limits.conf:maxKBps = 0
./etc/apps/SplunkUniversalForwarder/default/limits.conf:maxKBps = 256
./etc/system/default/limits.conf:maxKBps = 0

MuS
SplunkTrust

Use this command to show which config is actually applied:

 splunk btool limits list thruput

Run that on the forwarder. But by the looks of it, you have no limit active ... Did you check the DMC / MC for any blocked queues?
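
For blocked queues, a search along these lines should show them (a sketch; metrics.log logs blocked=true on a queue line when that queue blocks):

index=_internal sourcetype=splunkd source=*metrics.log group=queue blocked=true
| timechart count by name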

ddrillic
Ultra Champion

Great. It shows -

$ ./splunk btool limits list thruput
[thruput]
maxKBps = 0

somesoni2
SplunkTrust

I would run a btool command to check which setting is applied (system/default has the lowest priority).

bin/splunk btool limits list --debug | grep maxKBps

ddrillic
Ultra Champion

right - that's what I did...


somesoni2
SplunkTrust

I was late/early on that. Check the various queue sizes to see if there are any high spikes.

index=_internal sourcetype=splunkd source=*metrics.log group=queue 
| timechart avg(current_size) by name

You can add host=yourUFName to see queue sizes on the UF, and host=Indexer (add more OR conditions to cover all indexers) to see queue sizes on the indexers. You may need to adjust the queue sizes based on the results from there. https://answers.splunk.com/answers/38218/universal-forwarder-parsingqueue-kb-size.html
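
For example, to look only at the UF (yourUFName is a placeholder), something like:

index=_internal sourcetype=splunkd source=*metrics.log group=queue host=yourUFName
| timechart avg(current_size) by name

And if a queue does turn out to be too small, its size is raised in server.conf on that instance, e.g. (the value is only an example):

[queue=parsingQueue]
maxSize = 6MB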

ddrillic
Ultra Champion

Great. I see the following -

[screenshot: timechart of queue sizes showing spikes in the aggqueue]


somesoni2
SplunkTrust

The aggQueue is where date parsing and line merging happen. This suggests there may be an inefficient event-parsing configuration. What is the sourcetype definition (props.conf on the indexers) you have for the sourcetypes involved?



ddrillic
Ultra Champion

Interesting - this sourcetype doesn't show up in props.conf...


somesoni2
SplunkTrust

It means there is no configuration set up and Splunk has to figure everything out on its own, hence the spikes. I would suggest defining efficient line breaking and event parsing for this data and deploying it to the indexers (this requires an indexer restart). You should see lower latency and smaller queue sizes after that. If you can share some sample raw events, we can suggest a configuration - see the sketch below.
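
As a starting point - assuming single-line events with a leading timestamp like 2017-06-16 08:15:00; the sourcetype name and formats are placeholders - something like this in props.conf on the indexers:

[your_hadoop_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
TRUNCATE = 10000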

ddrillic
Ultra Champion

perfect - I'll work on it.


ddrillic
Ultra Champion

and then -

$ ./splunk btool --debug limits list | grep maxKBp
/opt/splunk/splunkforwarder/etc/apps/universal_config_forwarder/local/limits.conf maxKBps = 0