Hello Everyone.
Before my question, a little context first:
Some inputs / UF ---> Heavy forwarder ---> VPN tunnel (internet) ---> Indexer
The HF has all of its queues at 100% (it does not do any parsing, it only delivers the logs to the indexer). The strange thing is that I still need to install all the add-ons to see the events parsed in the search head. This is weird, right?
So the goal is to discover the cause of that behavior:
1) First theory:
We found errors in the firewall logs (timed-out fragmented packets) on the indexer side, which could cause performance issues that force the HF to retransmit the logs, so the queues go to 100%.
I need to know if there is a best practice to mitigate that, maybe a configuration on the VPN or something else, but in my opinion the issue comes from the different MTU values of the devices in the cloud.
2) Second theory:
We have some hardware limitations, so it could be that too.
Additional questions:
Why do I need to install some add-ons on the HF to get things parsed?
On a heavy forwarder that does not do any parsing/indexing, should the parsing/indexing queue fill ratios be at 0%?
Thanks very much!
When queues fill up in the middle, it is either because there is a large volume of data or because the pipe is throttled. Did you set this in limits.conf?
[thruput]
maxKBps = 0
If you did, then the problem is that the indexers are not processing incoming data. If that is the case, the cause will be clearly indicated in the logs and also by running a health check. One common cause is not leaving 5000 MB free on your hot storage; the indexers require that much free space in the hot volume or indexing will stop. Run a health check on the MC and see what it says.
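For reference, that hot-volume threshold is controlled by minFreeSpace in server.conf on the indexers; a minimal sketch (5000 MB is the default value):
# server.conf on each indexer: indexing pauses when free space on the
# volume that holds hot/warm buckets drops below this value (in MB)
[diskUsage]
minFreeSpace = 5000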
Hi Woodcock,
Thanks for your response. I haven't set that parameter.
The indexers have 90 percent free space.
Your HF absolutely must have thruput expanded from the default.
Nope, [thruput] is unlimited by default on full Splunk instances; only the UF has a limit. From the docs: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf
[thruput]
maxKBps = <integer>
* The maximum speed, in kilobytes per second, that incoming data is
processed through the thruput processor in the ingestion pipeline.
* To control the CPU load while indexing, use this setting to throttle
the number of events this indexer processes to the rate (in
kilobytes per second) that you specify.
* NOTE:
* There is no guarantee that the thruput processor
will always process less than the number of kilobytes per
second that you specify with this setting. The status of
earlier processing queues in the pipeline can cause
temporary bursts of network activity that exceed what
is configured in the setting.
* The setting does not limit the amount of data that is
written to the network from the tcpoutput processor, such
as what happens when a universal forwarder sends data to
an indexer.
* The thruput processor applies the 'maxKBps' setting for each
ingestion pipeline. If you configure multiple ingestion
pipelines, the processor multiplies the 'maxKBps' value
by the number of ingestion pipelines that you have
configured.
* For more information about multiple ingestion pipelines, see
the 'parallelIngestionPipelines' setting in the
server.conf.spec file.
* Default (Splunk Enterprise): 0 (unlimited)
* Default (Splunk Universal Forwarder): 256
Sorry, that was a misunderstanding.
I meant that I did not change that parameter on the HF or on the indexers. On the indexers I do not see any errors in the health check and all the queues are at 0%, so I think the problem happens before the data reaches the indexers (the network).
Thanks!
Addressing two questions here...
1) Do you need to put the TAs on the HFs? It depends on what the data source is and what you are doing on the heavy forwarder. If you're not filtering, routing, or changing metadata, and are just passing the data streams through the HF, then you don't need the TAs on the HFs.
A lot of TAs are search-time knowledge objects, meaning there is no need to have them installed anywhere but the search head(s). Refer to the documentation for each TA to see where it should be deployed (a rough props.conf illustration is at the end of this answer).
2) Backed up queues...
So how stable is your VPN tunnel, and how much bandwidth do you have? I am going to assume that the indexers have been checked and that the queues are fine there (check this and confirm).
After that, you need to understand that if Splunk can't send data out over TCP (the output/indexing queue), it will hold the data and keep trying. Once that queue fills, it puts back-pressure on the typing queue, then the aggregation queue, and then the parsing queue. So if you really are not doing any filtering on your heavies, I would say your network connectivity is one of the primary areas to look at (see the metrics.log search at the end of this answer).
After this, the hardware should of course be checked. Make sure the CPU and memory resources on the forwarders are not starved.
And regarding MTU: unless your network traffic stays within the same data center, you're going to be bound by a 1500-byte MTU. Jumbo frames over the internet really aren't a thing in most markets.
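To illustrate point 1, here is a rough props.conf sketch; the sourcetype and field names are made up. Index-time settings (line breaking, timestamping, TRANSFORMS-*, SEDCMD-*) must live on the first full Splunk instance that parses the data, which is the HF in your architecture, while search-time settings (EXTRACT-*, REPORT-*, EVAL-*, LOOKUP-*) only need to be on the search head(s):
# props.conf (illustrative; "my:sourcetype" is hypothetical)
[my:sourcetype]
# Index-time: applied by the HF (or by the indexers, if no HF parses the data)
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^\[
MAX_TIMESTAMP_LOOKAHEAD = 25
# Search-time: only used by the search head(s)
EXTRACT-src_ip = (?<src_ip>\d{1,3}(?:\.\d{1,3}){3})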
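And for point 2, one way to see which queue backs up first is to chart the queue fill ratio from metrics.log on the HF (replace the host placeholder with your forwarder's hostname):
index=_internal source=*metrics.log* group=queue host=<your_hf>
| eval fill_pct = round(current_size_kb / max_size_kb * 100, 2)
| timechart span=5m perc95(fill_pct) by name
The queue furthest downstream that sits near 100% is usually the real bottleneck; everything upstream of it just fills up behind it.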
Hello esix,
Thank you for your reply. I'll answer point by point:
1)
Yes, you are right. The thing is that the HF doesn't do any parsing at all, but if I don't install the add-ons on it, the data does not get parsed, even though the add-ons are installed on the indexers and the search head. Once we installed the add-ons on the heavy forwarders, everything was OK. This is very weird; maybe I need to check the configuration on the HF, but as I said to nittala, the index-and-forward parameter is set to false (an outputs.conf sketch of what I mean is at the end of this reply)...
2)
I have tested the bandwidth and it's fine; there is no saturation problem, etc.
The indexer queues are fine. I have checked.
Regarding this statement:
"After that, you need to understand that if Splunk can't send data out over TCP (the output/indexing queue), it will hold the data and keep trying. Once that queue fills, it puts back-pressure on the typing queue, then the aggregation queue, and then the parsing queue. So if you really are not doing any filtering on your heavies, I would say your network connectivity is one of the primary areas to look at."
I agree with you, because I see these messages on the heavy forwarder:
07-17-2018 11:33:58.136 -0300 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group default-autolb-group has been blocked for 100 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
07-17-2018 11:35:38.963 -0300 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group default-autolb-group has been blocked for 200 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
07-17-2018 11:38:12.260 -0300 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group default-autolb-group has been blocked for 100 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
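A quick way to quantify these blocking events over time (the host placeholder is hypothetical):
index=_internal source=*splunkd.log* host=<your_hf> component=TcpOutputProc "blocked"
| timechart span=10m count
If the count correlates with VPN tunnel drops or latency spikes, that points back at the network path rather than the indexers.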
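Going back to point 1, this is roughly what I mean by the HF being forward-only; the server names are placeholders:
# outputs.conf on the HF (illustrative)
[tcpout]
defaultGroup = default-autolb-group
indexAndForward = false

[tcpout:default-autolb-group]
server = indexer1.example.com:9997, indexer2.example.com:9997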
Hello lightech1,
Heavy forwarders have the capability to parse and route data before indexing. Unlike other forwarder types, a heavy forwarder parses data before forwarding it and can route data based on criteria such as the source or type of event. It can also index data locally while forwarding the data to another indexer. Please refer to the Splunk docs for more information. HTH!
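For example, routing on a heavy forwarder is done with a props.conf/transforms.conf pair along these lines; the stanza, sourcetype, and output-group names here are hypothetical, and the output group must exist as a [tcpout:...] stanza in outputs.conf:
# transforms.conf on the HF: send events containing "ERROR" to a separate output group
[route_errors_to_security]
REGEX = ERROR
DEST_KEY = _TCP_ROUTING
FORMAT = security_group

# props.conf on the HF: apply the routing rule to a sourcetype
[my:sourcetype]
TRANSFORMS-routing = route_errors_to_security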
Hello Nittala,
Thank you for your reply.
You are right, I know; that is why I use the HF only to deliver the data (not to do any parsing). But when we set up the architecture, the logs were not parsed unless I installed the add-ons on the HF, even though the index-and-forward parameter is set to false.
How large are these queues?
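If it helps, the configured queue sizes can be read from metrics.log on the forwarder (the host placeholder is hypothetical):
index=_internal source=*metrics.log* group=queue host=<your_hf>
| stats latest(max_size_kb) as max_size_kb by name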