Background: I have syslog collector systems which run universal forwarders (UFs) to feed received syslog data into Splunk. We're in the process of migrating from an old Splunk infrastructure to a new Splunk infrastructure, redoing all our indexes and sourcetypes as we go, and I want a gradual transition period rather than a hard flag day.
Objective: have the UF simultaneously forward the same log files to two different Splunk environments (via two separate tcpout groups) using different index/sourcetype values for each.
The winning inputs.conf that seems to fit my needs is:
[monitor:///services/net-logs/logs/*/(udp|tcp)_(1514|1515)/]
index = networktest
sourcetype = syslog
host_segment = 6
blacklist = \.gz$
crcSalt = <SOURCE>
_TCP_ROUTING = splunk_legacy
[monitor:///services/net-logs/logs2/*/(udp|tcp)_(1514|1515)/]
index = some_new_index
sourcetype = some_new_sourcetype
host_segment = 6
blacklist = \.gz$
crcSalt = <SOURCE>
_TCP_ROUTING = aws_ufwd_tier_splunk_cert
with a symlink from /services/net-logs/logs2 -> /services/net-logs/logs.
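To make the layout concrete, here is a sketch of the symlink setup, using a scratch directory under /tmp as a stand-in for the real /services/net-logs tree:

```shell
# Recreate the symlink layout on a scratch copy of the tree.
# /tmp/net-logs-demo stands in for the real /services/net-logs.
mkdir -p /tmp/net-logs-demo/logs
ln -sfn /tmp/net-logs-demo/logs /tmp/net-logs-demo/logs2

# Both names now refer to the same directory contents:
ls -ld /tmp/net-logs-demo/logs /tmp/net-logs-demo/logs2
```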
Before settling on that, I tried several other path-pair variations (everything else being equal), none of which worked nearly as well.
With this pair
[monitor:///services/net-logs/logs/*/(udp|tcp)_(1514|1515)/]
[monitor:///./services/net-logs/logs/*/(udp|tcp)_(1514|1515)/]
the new monitor stanza (with dot) processed new data promptly as expected; however, the original (without dot) would take quite a long time -- sometimes approaching 5 minutes -- to notice when a new file appeared or when new lines appeared at the end of an existing file. I observed this in several different ways:
splunk list inputstatus (note 411 vs 137) shortly after sending new messages
splunk list inputstatus | grep -A6 tcp_1514/wirelessprv-10-192-167-133.near.illinois.edu/10.192.167.133/user-2019-06-27-16
/./services/net-logs/logs/whipsaw-dev-aws1.techservices.illinois.edu/tcp_1514/wirelessprv-10-192-167-133.near.illinois.edu/10.192.167.133/user-2019-06-27-16
file position = 411
file size = 411
parent = /./services/net-logs/logs/*/(udp|tcp)_(1514|1515)/
percent = 100.00
type = finished reading
--
/services/net-logs/logs/whipsaw-dev-aws1.techservices.illinois.edu/tcp_1514/wirelessprv-10-192-167-133.near.illinois.edu/10.192.167.133/user-2019-06-27-16
file position = 137
file size = 137
parent = /services/net-logs/logs/*/(udp|tcp)_(1514|1515)/
percent = 100.00
type = finished reading
DEBUG logging (note 3+ minute gap between the first notification detected on each path)
06-27-2019 17:46:06.118 +0000 DEBUG TailingProcessor - File state notification for path='/./services/net-logs/logs/whipsaw-dev-aws1.techservices.illinois.edu/tcp_1514/wirelessprv-10-192-167-133.near.illinois.edu/10.192.167.133/user-2019-06-27-17' (first time).
06-27-2019 17:49:21.235 +0000 DEBUG TailingProcessor - File state notification for path='/services/net-logs/logs/whipsaw-dev-aws1.techservices.illinois.edu/tcp_1514/wirelessprv-10-192-167-133.near.illinois.edu/10.192.167.133/user-2019-06-27-17' (first time).
querying legacy Splunk with | eval latency_secs=(_indextime-_time)
In all cases the delayed monitor did eventually notice the change and forward the data.
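For reference, a sketch of the latency search run on the legacy Splunk side, built around the eval shown above (the stats line is my addition for summarizing; index and sourcetype names match the config earlier in the post):

```
index=networktest sourcetype=syslog
| eval latency_secs=(_indextime-_time)
| stats min(latency_secs) avg(latency_secs) max(latency_secs)
```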
This pair
[monitor:///services/net-logs/logs/*/(udp|tcp)_(1514|1515)/]
[monitor:///services/net-logs/./logs/*/(udp|tcp)_(1514|1515)/]
likewise resulted in high data latency for the original path without dot.
This pair
[monitor:///services/net-logs/logs/*/(udp|tcp)_(1514|1515)/]
[monitor:///services/net-logs2/logs/*/(udp|tcp)_(1514|1515)/]
(with net-logs2 symlinked to net-logs) resulted in high data latency for net-logs2.
Hypothesis: the path that experiences the latency is whichever one comes later in an alpha-sort.
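The alpha-sort hypothesis can be checked mechanically against all three failing pairs (paths abbreviated to the directory portion; the "delayed" list records which member lagged in each test):

```python
# For each monitor-path pair, the lexicographically later string
# was the path that experienced the latency in my tests.
pairs = [
    ("/services/net-logs/logs/", "/./services/net-logs/logs/"),
    ("/services/net-logs/logs/", "/services/net-logs/./logs/"),
    ("/services/net-logs/logs/", "/services/net-logs2/logs/"),
]
delayed = [
    "/services/net-logs/logs/",   # no-dot path lagged
    "/services/net-logs/logs/",   # no-dot path lagged
    "/services/net-logs2/logs/",  # logs2 path lagged
]
for (a, b), slow in zip(pairs, delayed):
    later = max(a, b)          # lexicographically later of the two
    print(later == slow)       # True for all three pairs
```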
Which brings me, finally, to my questions:
Why in those other three cases does one monitor of the pair notice changes immediately while the other one takes several minutes? Note that it's definitely not due to load; all this was done on a test server which was receiving no outside log data except for the handful of test messages I fed it.
Why does my "winning" configuration with the symlink immediately before the wildcard not experience the same latency problem as all the others? In those tests, splunk list inputstatus consistently shows the same file position for both paths, new log files/lines produce simultaneous File state notification DEBUG messages for both paths, and the actual latency in Splunk is a satisfying 0-2s.
Can I expect my logs2 solution to perform consistently under heavier production load, or are my positive test results just a fluke?
I suspect the difference has something to do with:
07-01-2019 23:29:10.811 +0000 INFO TailingProcessor - Adding watch on path: /services/net-logs/logs.
07-01-2019 23:29:10.811 +0000 INFO TailingProcessor - Adding watch on path: /services/net-logs/logs2.
07-01-2019 23:29:10.879 +0000 ERROR FilesystemChangeWatcher - Error setting up inotify on "/services/net-logs/logs2": Not a directory
Wild speculation: in the other cases, splunkd would set up inotify to watch the same logs directory twice (via two different pathnames, but I don't think that matters to inotify), and consequently splunkd either didn't receive all the notifications it was expecting or didn't understand how to interpret them as matching the distinct paths. Whereas with the logs2 symlink, it knew it couldn't use inotify, so it did... something else... instead?
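One fact supporting that speculation: inotify identifies watch targets by inode, not by pathname, so two spellings of the same directory resolve to a single watch target. A small sketch (again using a scratch directory, not the real log tree) showing that the dotted pathname and the symlink both resolve to the same device/inode pair as the plain path:

```python
import os
import tempfile

# inotify keys watches on (device, inode), not on the path string,
# so differently-spelled paths to one directory are one watch target.
base = tempfile.mkdtemp()
logs = os.path.join(base, "logs")
os.mkdir(logs)
os.symlink(logs, os.path.join(base, "logs2"))

def dev_ino(path):
    st = os.stat(path)  # follows symlinks, as inotify_add_watch does by default
    return (st.st_dev, st.st_ino)

dotted = os.path.join(base, ".", "logs")      # analogous to /./services/.../logs
print(dev_ino(logs) == dev_ino(dotted))               # True: same inode
print(dev_ino(logs) == dev_ino(base + "/logs2"))      # True: symlink resolves too
```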
I haven't found a shred of official Splunk documentation that explains how the monitor detects when files appear and change. It seems clear that there must be at least two such mechanisms, but what are they?