How does the forwarder handle large amounts of data?

ddrillic
Ultra Champion

An internal client is asking:

How often does the Splunk forwarder read data from the log files? Does it ever sleep? If a log file filled up in a matter of seconds, could that data be lost?

somesoni2
Revered Legend

The last answer (2c) is based on my experience.

1) Reading is near real-time. The Splunk forwarder keeps a list of files and directories to monitor and checks them for new events continuously. It doesn't sleep.
2)a. If a log file fills up fast but keeps the same name and location, no data is lost. Splunk tracks how far it has read into each file and continues from that point. During a spike in data you will see some latency, though, depending on maxKBps on the forwarder, network bandwidth, and load on the indexer or intermediate forwarders.
b. If a log file fills up fast and gets renamed (rolled over) but stays in the same place (a one-level rename), Splunk keeps reading from the renamed file. The same limitations apply as in 2a.
c. If a log file fills up fast and gets rolled over more than once before Splunk has read the whole file, there will be data loss. For example: myapp.log rolls over to myapp.log.1, and a blank myapp.log is created for new events. When myapp.log fills up again, myapp.log.1 is renamed to myapp.log.2, the current myapp.log is renamed to myapp.log.1, a new myapp.log is created, and so on.
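
To make cases 2a through 2c concrete, here is a toy Python simulation. It is not how Splunk is implemented; it only models the behaviour described above under stated assumptions: the reader remembers its position, reads at most `read_per_tick` events per tick (a stand-in for a throughput cap such as maxKBps), and can still reach the current file and the one-level rolled file but nothing older. The function name `simulate` and all of its parameters are hypothetical.

```python
def simulate(write_per_tick, read_per_tick, file_capacity, ticks):
    """Toy model of a rate-limited reader tailing a rotating log file.

    Events are numbered 0, 1, 2, ... in the order they are written.
    ASSUMPTION (illustration only): the reader can reach events in the
    current file (myapp.log) and in the one-level rolled file (myapp.log.1).
    Anything rolled further (myapp.log.2 and beyond) is treated as lost
    if it was not read in time.
    """
    written = 0      # total events written so far
    prev_start = 0   # first event still reachable (start of myapp.log.1)
    cur_start = 0    # first event in the current myapp.log
    pos = 0          # next event the reader will read
    read = 0
    lost = 0

    for _ in range(ticks):
        # Writer: append events, rolling the file over when it is full.
        for _ in range(write_per_tick):
            if written - cur_start == file_capacity:
                # myapp.log.1 -> myapp.log.2 (out of reach in this model),
                # myapp.log   -> myapp.log.1, new empty myapp.log.
                prev_start, cur_start = cur_start, written
            written += 1

        # Reader: events rolled out of reach before being read are lost.
        if pos < prev_start:
            lost += prev_start - pos
            pos = prev_start

        # Reader: consume up to the per-tick cap from the saved position.
        n = min(read_per_tick, written - pos)
        pos += n
        read += n

    return {"written": written, "read": read, "lost": lost,
            "backlog": written - pos}


if __name__ == "__main__":
    # 2a: spike into one big file that never rolls: latency, no loss.
    print(simulate(write_per_tick=50, read_per_tick=10,
                   file_capacity=10_000, ticks=5))
    # 2c: same kind of spike, but small files roll over every tick.
    print(simulate(write_per_tick=100, read_per_tick=10,
                   file_capacity=100, ticks=5))
```

In the first call the reader only builds up a backlog, which corresponds to the latency in 2a. In the second call the file rolls over faster than the reader can keep up, so events move past the one-level rolled file before they are read and are counted as lost, which is the situation in 2c.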

ddrillic
Ultra Champion

Gorgeous - please convert to an answer.
