Losing events from light forwarder

ron45

Hi there

We use a setup of Splunk light forwarders and a Splunk index server. The light forwarders use TCP to send data to the index server. We have now noticed that some lines of the log files are missing on the indexer side. The Splunk log files (on both the light forwarders and the index server) don't show any errors.

An example of lost data is:
Original log file:

06.01.2011 05:10:46 INFO Worldcheck import: 6,000 (22/sec)
06.01.2011 05:11:24 INFO Worldcheck import: 7,000 (26/sec)
06.01.2011 05:11:57 INFO Worldcheck import: 8,000 (29/sec)
06.01.2011 05:12:33 INFO Worldcheck import: 9,000 (27/sec)
06.01.2011 05:13:00 INFO Worldcheck import: 10,000 (37/sec)
06.01.2011 05:13:09 INFO Worldcheck import: 10,394 (43/sec)

On the Splunk index server, for the same time range, we have:

06.01.2011 05:10:46 INFO Worldcheck import: 6,000 (22/sec)
06.01.2011 05:11:57 INFO Worldcheck import: 8,000 (29/sec)
06.01.2011 05:13:00 INFO Worldcheck import: 10,000 (37/sec)
06.01.2011 05:13:09 INFO Worldcheck import: 10,394 (43/sec)

As you can see, the original log file has 6 lines, but only 4 of them were received on the index server: 2 of 6 lines lost, about 33%!
This leaves a bad taste, because with some 30 million events a day I have no way to prove that every event got recorded.
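One way to quantify loss on a sample like this is to diff the original lines against the events that reached the indexer. The sketch below is a minimal, hypothetical helper (`find_missing` and `loss_ratio` are illustrative names, not part of Splunk); it assumes both sides have already been exported as lists of raw lines and matches them exactly.

```python
from collections import Counter


def find_missing(original: list[str], indexed: list[str]) -> list[str]:
    """Return the original lines that have no matching line on the indexer side.

    A Counter (multiset) is used so that repeated identical lines are
    matched one-for-one instead of all being satisfied by a single copy.
    """
    remaining = Counter(indexed)
    missing = []
    for line in original:
        if remaining[line] > 0:
            remaining[line] -= 1  # consume one matching indexed copy
        else:
            missing.append(line)
    return missing


def loss_ratio(original: list[str], indexed: list[str]) -> float:
    """Fraction of original lines that never showed up on the indexer."""
    if not original:
        return 0.0
    return len(find_missing(original, indexed)) / len(original)


# Sample data copied from the post above.
original = [
    "06.01.2011 05:10:46 INFO Worldcheck import: 6,000 (22/sec)",
    "06.01.2011 05:11:24 INFO Worldcheck import: 7,000 (26/sec)",
    "06.01.2011 05:11:57 INFO Worldcheck import: 8,000 (29/sec)",
    "06.01.2011 05:12:33 INFO Worldcheck import: 9,000 (27/sec)",
    "06.01.2011 05:13:00 INFO Worldcheck import: 10,000 (37/sec)",
    "06.01.2011 05:13:09 INFO Worldcheck import: 10,394 (43/sec)",
]
indexed = [
    "06.01.2011 05:10:46 INFO Worldcheck import: 6,000 (22/sec)",
    "06.01.2011 05:11:57 INFO Worldcheck import: 8,000 (29/sec)",
    "06.01.2011 05:13:00 INFO Worldcheck import: 10,000 (37/sec)",
    "06.01.2011 05:13:09 INFO Worldcheck import: 10,394 (43/sec)",
]

for line in find_missing(original, indexed):
    print("missing:", line)
print(f"loss: {loss_ratio(original, indexed):.1%}")
```

On this sample it reports the 05:11:24 and 05:12:33 lines as missing, a loss of 33.3%. For 30 million events a day the same idea would have to run on exported batches rather than in-memory lists.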

Does anyone have an idea how to make sure that data from the light forwarder is reliably written to the Splunk index server?

outputs.conf on the light forwarder looks like this:

[tcpout]
disabled = false
defaultGroup = group1_58499

[tcpout:group1_58499]
disabled = false
server = splunk:58499

[tcpout:RouteSplunkLogs]
disabled = true
server = splunk:58499
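To double-check which output group is actually in use, the stanzas can be read with Python's standard `configparser`. This is a sketch that assumes this simple file parses as plain INI; Splunk's .conf format is INI-like, but `configparser` is not a full Splunk config parser, and `active_tcpout_groups` / `default_group` are hypothetical helper names.

```python
import configparser

# The outputs.conf content from the post above.
OUTPUTS_CONF = """\
[tcpout]
disabled = false
defaultGroup = group1_58499

[tcpout:group1_58499]
disabled = false
server = splunk:58499

[tcpout:RouteSplunkLogs]
disabled = true
server = splunk:58499
"""


def active_tcpout_groups(text: str) -> dict[str, str]:
    """Map each enabled [tcpout:<group>] stanza to its server setting."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    groups = {}
    for section in parser.sections():
        if not section.startswith("tcpout:"):
            continue
        # In this sketch a missing 'disabled' key is treated as enabled.
        if parser.getboolean(section, "disabled", fallback=False):
            continue
        groups[section.split(":", 1)[1]] = parser.get(section, "server")
    return groups


def default_group(text: str) -> str:
    """Return the defaultGroup named in the [tcpout] stanza."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # configparser lower-cases option names, so the lookup is case-insensitive.
    return parser.get("tcpout", "defaultGroup")


active = active_tcpout_groups(OUTPUTS_CONF)
print("active groups:", active)
print("default group enabled:", default_group(OUTPUTS_CONF) in active)
```

For this file only group1_58499 is enabled, and it is also the defaultGroup; RouteSplunkLogs is disabled, so all events should go to splunk:58499.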

Kind regards,

Aaron

Accepted solution:

Paolo_Prigione

Hi, might this be caused by the Splunk indexer lagging behind under a huge volume of logs (and maybe a slow server...), so that the light forwarder (LWF) fills up its internal queue and then drops some events?
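The mechanism proposed here can be illustrated with a toy simulation of a bounded queue that drops events once it is full. This is not Splunk's queue implementation: whether a forwarder drops or blocks when its queue is full depends on its version and configuration, and the tick-based model, the burst pattern and the `simulate` name are assumptions for illustration only.

```python
from collections import deque


def simulate(arrivals: list[int], drain_per_tick: int, capacity: int) -> tuple[int, int]:
    """Toy model of a bounded forwarder queue that drops events when full.

    arrivals[t] is how many events the application writes in tick t;
    drain_per_tick is how many events the (hypothetical) link to the
    indexer accepts per tick; capacity is the queue's maximum size.
    Returns (delivered, dropped).
    """
    queue = deque()
    delivered = dropped = 0
    for tick, count in enumerate(arrivals):
        for _ in range(count):
            if len(queue) < capacity:
                queue.append(tick)
            else:
                dropped += 1  # queue full: this event is lost in this model
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()
            delivered += 1
    # Anything still queued at the end is assumed to drain eventually.
    delivered += len(queue)
    return delivered, dropped


burst = [1, 1, 10, 10, 1, 1]  # hypothetical burst in the middle
print(simulate(burst, drain_per_tick=3, capacity=5))    # small queue
print(simulate(burst, drain_per_tick=3, capacity=100))  # large queue
```

With a 5-event queue, 12 of the 24 events in the burst are dropped; with a 100-event queue none are. Under this hypothesis, loss would cluster around bursts, which is what the thread later investigates.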

Jeremiah

It would be helpful to know how many forwarders you have, what your volume is per forwarder, and what your indexer specs are. Is this happening for all your forwarders across all of your logs, or just for particular ones?

ron45

Hi, we have 28 light forwarders sending to one index machine, all of them running on large AIX systems. All logs together produce only around 3 GB per day (ca. 30 million events). As I wrote in the other comment, I've now increased the TRUNCATE value in props.conf. If this doesn't help, I'll try increasing the bandwidth from 256 kbit/s to 512 kbit/s. In my opinion the bandwidth shouldn't matter, because only approximately 40 MB of logs come from this particular light forwarder. Up to now I haven't encountered this problem on other systems...
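A quick back-of-the-envelope check of the bandwidth argument, assuming the 40 MB figure is per day (the post doesn't say) and that 1 kbit = 1000 bits. It only covers average load; short bursts could still exceed the link's per-second capacity.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_capacity_bytes(link_kbit_per_s: float) -> float:
    """Bytes per day a link can carry at full rate (1 kbit = 1000 bits)."""
    return link_kbit_per_s * 1000 / 8 * SECONDS_PER_DAY


def average_utilisation(bytes_per_day: float, link_kbit_per_s: float) -> float:
    """Average share of the link used by a steady daily volume."""
    return bytes_per_day / daily_capacity_bytes(link_kbit_per_s)


FORWARDER_BYTES_PER_DAY = 40 * 10**6  # ~40 MB, assumed to be per day
TOTAL_BYTES_PER_DAY = 3 * 10**9       # ~3 GB/day across all 28 forwarders
TOTAL_EVENTS_PER_DAY = 30 * 10**6     # ~30 million events/day in total

for link in (256, 512):
    cap_gb = daily_capacity_bytes(link) / 10**9
    util = average_utilisation(FORWARDER_BYTES_PER_DAY, link)
    print(f"{link} kbit/s: {cap_gb:.2f} GB/day capacity, {util:.1%} average use")

print("average event size (bytes):", TOTAL_BYTES_PER_DAY / TOTAL_EVENTS_PER_DAY)
print("average events/s, all forwarders:", round(TOTAL_EVENTS_PER_DAY / SECONDS_PER_DAY))
```

A 256 kbit/s link carries about 2.76 GB/day, so 40 MB/day is roughly 1.4% of it on average, which supports the view that average bandwidth is not the bottleneck. The overall figures also work out to about 100 bytes per event and about 347 events per second across all forwarders.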

ron45

Hi, I solved the issue. In this case the solution was crcSalt=.
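`crcSalt` is an inputs.conf setting. By default the file monitor recognizes a file by a checksum over its first bytes (the `initCrcLength` setting, 256 bytes by default), so a different file whose beginning is identical to one already seen, for example because of an identical header, can be treated as already indexed. `crcSalt` adds a string to that checksum, and the documented special value `<SOURCE>` adds the file's full path. The value used in the post above appears to have been lost, so `<SOURCE>` is only an assumption here. The sketch below uses `zlib.crc32` purely to illustrate the idea; Splunk's actual checksum and bookkeeping differ, and the file paths, header and contents are hypothetical.

```python
import zlib

INIT_CRC_LENGTH = 256  # inputs.conf initCrcLength default, in bytes


def fingerprint(content: bytes, salt: str = "") -> int:
    """Checksum of the file's first INIT_CRC_LENGTH bytes plus an optional salt.

    zlib.crc32 stands in for Splunk's own checksum, for illustration only.
    """
    return zlib.crc32(content[:INIT_CRC_LENGTH] + salt.encode("utf-8"))


def files_to_index(files: list[tuple[str, bytes]], salt_with_path: bool) -> list[str]:
    """Return the paths a CRC-based monitor would treat as new files."""
    seen: set[int] = set()
    accepted = []
    for path, content in files:
        fp = fingerprint(content, path if salt_with_path else "")
        if fp in seen:
            continue  # looks like a file that was already read
        seen.add(fp)
        accepted.append(path)
    return accepted


# Hypothetical files that share an identical header longer than 256 bytes.
header = b"=== Worldcheck import log ===\n" * 10
files = [
    ("/var/log/app/import_a.log", header + b"06.01.2011 05:10:46 INFO first run\n"),
    ("/var/log/app/import_b.log", header + b"07.01.2011 05:10:46 INFO second run\n"),
]

print(files_to_index(files, salt_with_path=False))  # second file is skipped
print(files_to_index(files, salt_with_path=True))   # path salt tells them apart
```

Without a salt both files produce the same checksum, so only the first is picked up; salting with the path makes them distinct.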

ron45

Hi, yesterday I checked the queues on the index server side; they are all OK (between 0 and 37% maximum usage). After checking the application log entries on the light forwarder, I found some huge entries in the log file. Each entry (event) is around 2062 bytes, and sometimes 5 of these events occur with the same timestamp. These events occur around the time when the problem happens. So I have now increased the TRUNCATE value to 20000 in props.conf on both the light forwarder and the indexer; hopefully that helps.
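TRUNCATE in props.conf caps the length of a single event in bytes (default 10000; 0 disables truncation). If those five same-timestamp lines had been merged into one event, an assumption the post does not confirm, 5 × 2062 = 10,310 bytes would exceed the default and the tail of the fifth line would be cut off. The sketch below is a simplified model of that cut, not Splunk's implementation; `truncate_event` and the filler line content are hypothetical.

```python
EVENT_SIZE = 2062         # bytes per entry, as reported in the post
DEFAULT_TRUNCATE = 10000  # props.conf TRUNCATE default, in bytes


def truncate_event(raw: bytes, limit: int) -> bytes:
    """Simplified TRUNCATE: keep at most `limit` bytes; 0 means no limit."""
    return raw if limit == 0 else raw[:limit]


# Build one hypothetical 2062-byte line (timestamp prefix, filler, newline).
prefix = b"06.01.2011 05:10:46 INFO "
line = prefix + b"x" * (EVENT_SIZE - len(prefix) - 1) + b"\n"

# Assumption: five same-timestamp lines merged into a single event.
merged = line * 5
print("merged event size:", len(merged))
for limit in (DEFAULT_TRUNCATE, 20000):
    kept = truncate_event(merged, limit)
    complete = kept.count(b"\n")  # each complete line ends with a newline
    print(f"TRUNCATE={limit}: {len(kept)} bytes kept, {complete} of 5 lines complete")
```

With the default limit only 4 of the 5 lines survive intact; with TRUNCATE = 20000 the whole 10,310-byte event is kept.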
