Deployment Architecture

HF all queues 100%

lightech1
Path Finder

Hello Everyone.

Before my question, a little context first:

SOME INPUTS / UF ---> Heavy Forwarder ---> VPN TUNNEL (INTERNET) ---> INDEXER


The HF has all queues at 100% (it does not do any parsing, it only delivers the logs to the indexer). One strange thing is that I need to install all the add-ons on it to see the events parsed on the Search Head. This is weird, right?
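To give an idea of the setup, the forwarding side of the HF's outputs.conf is roughly like this (a sketch; the indexer address is a placeholder, not our real one):

# outputs.conf on the HF (sketch; indexer address is a placeholder)
[tcpout]
defaultGroup = default-autolb-group
indexAndForward = false

[tcpout:default-autolb-group]
server = indexer.example.com:9997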

So the goal is to discover the cause of that behavior:

1) First theory:

We found errors in the firewall logs (fragmented packet timeouts) on the indexer side, which could cause performance issues where the HF needs to retransmit the logs again, so the queues go to 100%.

I need to know if there is a best practice to mitigate that, maybe a configuration on the VPN or something else... but in my opinion this issue comes from the different MTU values on the devices in the cloud.

2) Second theory:

We have some hardware limitations, so it could be that too.

Additional questions:
Why do I need to install some add-ons on the HF to get things parsed?
On a heavy forwarder that does not do any parsing/indexing, shouldn't the parsing/indexing queue fill ratios be at 0%?

Thanks very much!


woodcock
Esteemed Legend

When queues fill in the middle, it is either because there is a large volume of data or because the pipe is throttled. So did you set this in limits.conf?

[thruput]
maxKBps = 0

If you did, then the problem is that the indexers are not processing incoming data. If that is the case, the cause will be clearly indicated in the logs and also by running a Health Check. One common cause is that you did not leave 5,000 MB free on your hot storage; the indexers require this in the hot volume, or indexing will stop. Run a Health Check on the MC (Monitoring Console) and see what it says.
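If free space does turn out to be the problem, the threshold is controlled in server.conf on the indexers (a sketch; 5000 is the default value, in MB):

# server.conf on the indexers (sketch; 5000 MB is the default minimum)
[diskUsage]
minFreeSpace = 5000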


lightech1
Path Finder

Hi Woodcock,

Thanks for your response. I haven't set that parameter.

The indexer has 90 percent of its disk space free.


woodcock
Esteemed Legend

Your HF absolutely must have thruput expanded from default.


MuS
SplunkTrust

Nope, [thruput] is unlimited by default on full Splunk instances; only the UF has a limit. From the docs: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf

[thruput]
maxKBps = <integer>
* The maximum speed, in kilobytes per second, that incoming data is 
  processed through the thruput processor in the ingestion pipeline.
* To control the CPU load while indexing, use this setting to throttle
  the number of events this indexer processes to the rate (in
  kilobytes per second) that you specify.
* NOTE:
  * There is no guarantee that the thruput processor 
    will always process less than the number of kilobytes per
    second that you specify with this setting. The status of 
    earlier processing queues in the pipeline can cause
    temporary bursts of network activity that exceed what
    is configured in the setting. 
  * The setting does not limit the amount of data that is 
    written to the network from the tcpoutput processor, such 
    as what happens when a universal forwarder sends data to 
    an indexer.  
  * The thruput processor applies the 'maxKBps' setting for each
    ingestion pipeline. If you configure multiple ingestion
    pipelines, the processor multiplies the 'maxKBps' value
    by the number of ingestion pipelines that you have
    configured.
  * For more information about multiple ingestion pipelines, see 
    the 'parallelIngestionPipelines' setting in the 
    server.conf.spec file.
* Default (Splunk Enterprise): 0 (unlimited)
* Default (Splunk Universal Forwarder): 256
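For completeness, the pipeline count mentioned at the end of that excerpt is set in server.conf (a sketch; the default is 1, and 2 is only an example value):

# server.conf (sketch; default is 1 ingestion pipeline)
[general]
parallelIngestionPipelines = 2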

lightech1
Path Finder

Sorry, a misunderstanding.

I meant that I did not change that parameter on the HF or on the indexer. On the indexers I do not see any errors in the Health Check and all the queues are at 0%, so I think the problem happens before the data arrives at the indexer (the network).

Thanks!


esix_splunk
Splunk Employee

Addressing two questions here...

1) Do you need to put the TAs on the HFs? It depends on what the data source is and what you are doing on the heavy forwarder. If you're not filtering / routing / changing metadata, and are just passing the data streams through the HF, then you don't need the TAs on the HFs.

A lot of the TAs are search-time knowledge objects, meaning there is no need to have them installed anywhere but the search head(s). Refer to the documentation for each TA to see where it should be deployed.
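To illustrate the difference (a generic props.conf sketch with a hypothetical sourcetype, not from any particular TA): index-time settings like these only take effect on the first full Splunk instance that handles the data, which in your deployment is the HF, while search-time extractions only matter on the search head:

# props.conf (generic sketch; "my:custom:sourcetype" is hypothetical)
[my:custom:sourcetype]
# index-time settings: applied where the data is parsed (HF or indexer)
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
# search-time setting: applied on the search head at search time
EXTRACT-user = user=(?<user>\S+)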

2) Backed-up queues...
So how stable is your VPN tunnel, and how much bandwidth do you have? I am going to assume that the indexers have been checked and that the queues are fine there (check this and confirm).
After that, you need to understand that if Splunk can't send via TCP out (the indexing queue), it will hold the data and keep trying. Once that queue fills, it back-pressures against the typing, then the aggregation, and then the parsing queues. So if you really are not doing any filtering on your heavies, I would say your network connectivity is one of the primary areas to look at.

After this, the hardware should of course be checked. Make sure CPU and memory resources are not starved on the forwarders.

And regarding MTU, unless your traffic stays within the same data center, you're going to be bound by a 1500-byte MTU. Jumbo frames over the internet really aren't a thing in most markets.
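If the VPN does turn out to be unreliable, a few outputs.conf settings on the HF are worth reviewing (a sketch with example values, not specific recommendations; the group name is taken from your logs):

# outputs.conf on the HF (sketch; example values only)
[tcpout:default-autolb-group]
# useACK requires the receiving indexers to support indexer acknowledgment
useACK = true
maxQueueSize = 100MB
autoLBFrequency = 30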


lightech1
Path Finder

Hello esix,

Thank you for your reply. I'll answer point by point:

1)
Yes, you are right. The thing is that the HF shouldn't do any parsing at all, but if I don't install the add-ons on it, the data doesn't get parsed (even with the add-ons installed on the indexers and the search head), and once we installed the add-ons on the heavy forwarders it was OK. So this is very weird; maybe I need to check the configuration on the HF, but like I said to nittala, the index-and-forward parameter is set to false...

2)
I have tested the bandwidth and it's OK, no saturation problems, etc.
The indexer queues are fine; I have checked.

Regarding this phrase:

After that, you need to understand that if Splunk can't send via TCP out (the indexing queue), it will hold the data and keep trying. Once that queue fills, it back-pressures against the typing, then the aggregation, and then the parsing queues. So if you really are not doing any filtering on your heavies, I would say your network connectivity is one of the primary areas to look at.

I agree with you, because I have these messages on the heavy forwarder:

07-17-2018 11:33:58.136 -0300 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group default-autolb-group has been blocked for 100 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
07-17-2018 11:35:38.963 -0300 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group default-autolb-group has been blocked for 200 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
07-17-2018 11:38:12.260 -0300 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group default-autolb-group has been blocked for 100 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.


sudosplunk
Motivator

Hello lightech1,

Heavy forwarders have capabilities to parse and route data before indexing. Unlike other forwarder types, a heavy forwarder parses data before forwarding it and can route data based on criteria such as source or type of event. It can also index data locally while forwarding the data to another indexer. Please refer to the Splunk docs for more information. HTH!
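For example, routing by source on a heavy forwarder is done with a props/transforms pair like this (a sketch; the source path and target group name are hypothetical):

# props.conf on the HF (sketch; hypothetical source path)
[source::/var/log/myapp/*.log]
TRANSFORMS-routing = route_myapp

# transforms.conf on the HF
# my_target_group must also be defined as [tcpout:my_target_group] in outputs.conf
[route_myapp]
REGEX = .
DEST_KEY = _TCP_ROUTING
FORMAT = my_target_group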


lightech1
Path Finder

Hello Nittala,

Thank you for your reply.

You are right, I know; I use the HF only to deliver the data (not doing any parsing). But when we configured the architecture, the logs were not parsed unless I installed the add-ons on the HF, even though the index-and-forward parameter is set to false...


ddrillic
Ultra Champion

How large are these queues?
