Solved: losing events from light forwarder

ron45 · 01-06-2011

Hi there

We use a setup of Splunk light forwarders and a Splunk index server. The light forwarders use TCP to send data to the index server. We've now noticed that some lines of the log files are missing on the index side. The Splunk log files (on the light forwarder and the index server) don't show any errors.

An example of lost data is:
Original Logfile:

06.01.2011 05:10:46 INFO Worldcheck import: 6,000 (22/sec)
06.01.2011 05:11:24 INFO Worldcheck import: 7,000 (26/sec)
06.01.2011 05:11:57 INFO Worldcheck import: 8,000 (29/sec)
06.01.2011 05:12:33 INFO Worldcheck import: 9,000 (27/sec)
06.01.2011 05:13:00 INFO Worldcheck import: 10,000 (37/sec)
06.01.2011 05:13:09 INFO Worldcheck import: 10,394 (43/sec)

On the Splunk index server we have, for the same time period:

06.01.2011 05:10:46 INFO Worldcheck import: 6,000 (22/sec)
06.01.2011 05:11:57 INFO Worldcheck import: 8,000 (29/sec)
06.01.2011 05:13:00 INFO Worldcheck import: 10,000 (37/sec)
06.01.2011 05:13:09 INFO Worldcheck import: 10,394 (43/sec)

As you can see, the original log file has 6 lines of output, but only 4 of them were received on the Splunk index side. That is a loss of about a third (2 of 6 lines)!
This leaves a bad taste, because with some 30 million events a day I'm not able to verify that every event got recorded.

Does anyone have an idea how to make sure that data from the light forwarder is reliably written to the Splunk index server?

outputs.conf on the light forwarder looks like this:

[tcpout]
disabled = false
defaultGroup = group1_58499

[tcpout:group1_58499]
disabled = false
server = splunk:58499

[tcpout:RouteSplunkLogs]
disabled = true
server = splunk:58499

Kind regards,

Aaron

Paolo_Prigione · 01-06-2011

Hi, could this be caused by the Splunk indexer lagging behind under a huge volume of logs (and maybe a slow server...), so that the light forwarder fills up its internal queue and then drops some events?

Jeremiah · 01-07-2011

It would be helpful to know how many forwarders you have, what your volumes are per forwarder, and what your indexer specs are. Is this happening for all your forwarders across all of your logs, or just for particular ones?

ron45 · 01-07-2011

Hi, we have 28 light forwarders sending to one index machine. All of them run on huge AIX systems. All logs together produce just around 3 GB per day (ca. 30 million events). As I wrote in the other comment, I've now increased the TRUNCATE value in props.conf. If this doesn't help, I'll try increasing the bandwidth from 256 kbit to 512 kbit. In my opinion the bandwidth shouldn't matter, because only approximately 40 MB of logs come from this particular light forwarder. Up to now I haven't encountered this problem on other systems...

ron45 · 01-11-2011

Hi, I solved the issue. In this case the solution was crcSalt=.
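
For readers unfamiliar with the setting: crcSalt belongs to monitor inputs in inputs.conf on the forwarder. Splunk recognizes a monitored file by a checksum (CRC) of its first bytes, and crcSalt adds a string to that checksum so that files with identical beginnings are not mistaken for one another. The value used in the post above is not shown, so the fragment below is only a sketch under assumptions: the monitor path is hypothetical, and the literal `<SOURCE>` (which mixes the file's full path into the checksum) is one documented option, not necessarily what was used here.

```ini
# inputs.conf on the light forwarder (sketch; path is hypothetical)
[monitor:///var/log/app/import.log]
# Assumed value: the literal string <SOURCE> salts the CRC with the full file path
crcSalt = <SOURCE>
```

Note that with `<SOURCE>`, a renamed or rotated file gets a new checksum and may be indexed again, so it is usually applied only to the inputs that need it.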

ron45 · 01-07-2011

Hi, yesterday I checked the queues on the index server side; they are all OK (between 0 and 37% max usage). After checking the application log entries on the light forwarder, I found some huge entries in the log file. Each entry (event) is around 2062 bytes, and sometimes 5 of these events occur with the same timestamp. These events appear around the time when the problem occurs. So I have now increased the TRUNCATE value to 20000 in props.conf on the light forwarder and the indexer; hopefully that helps.
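
As a rough illustration of the change described above, a props.conf stanza might look like the following. The stanza name `worldcheck_import` is a hypothetical sourcetype, not taken from the thread; TRUNCATE is the maximum line length in bytes (the default is 10000), and 20000 is the value mentioned in the post.

```ini
# props.conf (sketch; stanza name is a hypothetical sourcetype)
[worldcheck_import]
# Maximum length of a line in bytes; longer lines are cut off at this size
TRUNCATE = 20000
```

As the later reply in this thread shows, this change was not what resolved the problem in this case.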
