Which is the better solution: deploying a Universal Forwarder to each server in the environment, or collecting logs in a single place first and then sending them to the indexers?
What are the pros and cons of each approach? And if there are 10,000 servers on the network, how can I check which forwarders are not forwarding data to an indexer? How do I find a single server out of 10,000 that has stopped forwarding data?
Generally, if you can swing it, put a forwarder on each server; this balances the load better and limits the impact of any single server failing, since no central collection point can take every log source down with it.
A forwarder can stop forwarding for many reasons other than a clean shutdown, including a crash (which leaves no shutdown event) or a network problem. Try this:
| metadata index=* type=hosts | eval latencySeconds=(recentTime-lastTime) | eval quietSeconds=(now()-recentTime) | fieldformat firstTime=strftime(firstTime, "%m/%d/%Y %H:%M:%S") | fieldformat lastTime=strftime(lastTime, "%m/%d/%Y %H:%M:%S") | eval indexTime=strftime(recentTime, "%m/%d/%Y %H:%M:%S")
The field quietSeconds tells you how long it has been since that forwarder sent any data to any indexer.
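With 10,000 hosts you probably don't want to scan the full list by eye. As a rough sketch, the same search can be narrowed to only the hosts that have gone quiet and sorted so the worst offenders come first. The 900-second threshold and the lastIndexed field name are assumptions chosen for illustration; pick a threshold that matches how often your quietest servers normally log.

```
| metadata index=* type=hosts
| eval quietSeconds=(now()-recentTime)
| where quietSeconds > 900
| eval lastIndexed=strftime(recentTime, "%m/%d/%Y %H:%M:%S")
| sort - quietSeconds
| table host quietSeconds lastIndexed
```

Keep in mind that metadata only lists hosts that have sent data at some point, so a server whose forwarder never delivered anything will not show up here; you would need to compare the results against your own server inventory to catch those.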