We need to monitor 300 devices for up/down state, and the customer wants a tight SLA: a down device should be reported within 3 to 4 minutes.
I have the following scripted input working:
ping.sh:
date; echo ip=<ip1> ; ping -c 4 <ip1> ;
date; echo ip=<ip2> ; ping -c 4 <ip2> ;
date; echo ip=<ip3> ; ping -c 4 <ip3> ;
date; echo ip=<ip4> ; ping -c 4 <ip4> ;
... for 300 lines
Is this the right approach?
As written, the script probably takes over 10 minutes to run. Should I run all 300 lines in the background? Is it reasonable to spawn 300 commands in parallel?
I would wrap a script around a purpose-built tool like nmap, which manages concurrency internally. For example:
$ nmap -v -sn -iL path/to/input.list -oG -
# Nmap 6.40 scan initiated Sat Apr 3 12:05:45 2021 as: nmap -v -sn -iL /tmp/targets.txt -oG -
# Ports scanned: TCP(0;) UDP(0;) SCTP(0;) PROTOCOLS(0;)
Host: 192.168.1.1 (foo.example.com) Status: Up
Host: 192.168.1.2 () Status: Down
Host: 192.168.1.3 () Status: Down
Host: 192.168.1.4 () Status: Down
Host: 192.168.1.5 () Status: Down
Host: 192.168.1.6 () Status: Down
Host: 192.168.1.7 () Status: Down
Host: 192.168.1.8 () Status: Down
Host: 192.168.1.9 () Status: Down
Host: 192.168.1.10 () Status: Down
# Nmap done at Sat Apr 3 12:05:47 2021 -- 10 IP addresses (1 host up) scanned in 1.41 seconds
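The output is easily parsed. For example, a scripted input that indexes each line as its own event and discards the comment lines: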
# inputs.conf
[script://./bin/foo_scan.sh]
index = main
interval = 123
sourcetype = foo_scan
# props.conf
[foo_scan]
SHOULD_LINEMERGE = false
DATETIME_CONFIG = CURRENT
TRANSFORMS-ignore_foo_scan_comments = ignore_foo_scan_comments
# transforms.conf
[ignore_foo_scan_comments]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue
Alternatively, you can remove the transform, index the comment lines, and retain the additional metadata they provide. The overall scan time is useful.
Change the input interval from the placeholder value above to something sane, like the expected (average) runtime of the scan, or use a cron schedule instead.
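If you keep the transform but still want the scan time, one option is to have the wrapper turn nmap's closing comment into an ordinary event line. This is only a sketch: the function name scan_summary and the output field names are made up for illustration, and the pattern assumes the "Nmap done" line looks like the one in the sample output above.

```shell
#!/bin/sh
# scan_summary (hypothetical helper): reads nmap grepable output on stdin
# and prints one key=value line built from the closing "# Nmap done" comment.
# Lines that do not match the pattern produce no output.
scan_summary() {
    sed -n 's/^# Nmap done at .* -- \([0-9][0-9]*\) IP address[es]* (\([0-9][0-9]*\) hosts* up) scanned in \([0-9.][0-9.]*\) seconds$/targets=\1 up=\2 scan_seconds=\3/p'
}
```

Because the emitted line does not start with #, the nullQueue transform above would leave it alone. Watching scan_seconds over a few runs should give you a realistic number for the interval.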
You can modify nmap arguments or filter script output as needed to tune behavior. Perhaps you only want to output devices that are down, for example.
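As a rough sketch of the "down devices only" idea, the wrapper could pipe the grepable output through a small awk filter. The function name down_hosts and the ip=/status= output format are my own choices for illustration, not anything nmap or Splunk requires.

```shell
#!/bin/sh
# down_hosts (hypothetical helper): reads nmap grepable (-oG) output on stdin
# and prints one key=value line per host whose status is Down.
# Comment lines and hosts that are Up are dropped.
down_hosts() {
    awk '$1 == "Host:" && $NF == "Down" { print "ip=" $2 " status=down" }'
}
```

Inside the wrapper it would be used along the lines of: nmap -v -sn -iL /tmp/targets.txt -oG - | down_hosts. Keep the -v, since the Down lines in the sample output come from the verbose scan. With this filter in place, no comment lines reach Splunk, so the transform would have nothing left to do.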
If you prefer, you can write the output to files and create a second input for the files themselves. Your wrapper script should include log rotation functionality.
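For the file-based variant, rotation can be as simple as a size check before each scan. The sketch below is one way to do it under my own assumptions: the function name rotate_log, its arguments, and the numbered suffix scheme (.1, .2, ...) are all hypothetical, so adapt them to whatever your environment already uses.

```shell
#!/bin/sh
# rotate_log FILE MAX_BYTES KEEP (hypothetical helper)
# If FILE is at least MAX_BYTES in size, shift FILE.1 -> FILE.2 and so on,
# keeping at most KEEP old copies, then move FILE to FILE.1.
# Does nothing if FILE is missing or still below the size limit.
rotate_log() {
    log=$1
    max_bytes=$2
    keep=$3
    [ -f "$log" ] || return 0
    # Arithmetic expansion strips any padding some wc implementations add.
    size=$(($(wc -c < "$log")))
    [ "$size" -ge "$max_bytes" ] || return 0
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        # The oldest copy is overwritten when the chain is full.
        if [ -f "$log.$prev" ]; then
            mv -f "$log.$prev" "$log.$i"
        fi
        i=$prev
    done
    mv -f "$log" "$log.1"
}
```

The wrapper would call something like rotate_log /path/to/scan.log 10485760 5 before appending the next scan's output to the file, and the second input would be a monitor stanza pointed at that file.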