Getting Data In

Multiple file monitors on a Windows machine: some work only when others are disabled?

mburgess97
Path Finder

I'm having difficulty ingesting log data from flat files into Splunk. I'm monitoring six different directories, each containing 100-1000 log files, some of which are historical and will require less ingestion in the future. However, I'm seeing inconsistent results and not all logs are being ingested properly.

Here's an example of the issue: When all six monitors are enabled, I don't see any data from [file-monitor5] or [file-monitor6]. If I disable 1-3, I start seeing logs from [file-monitor5], but not [file-monitor6]. I have to disable 1-5 to get logs from [file-monitor6]. The configuration for each monitor is shown below:

[file-monitor1]
[file-monitor2]
[file-monitor3]
[file-monitor4]
[file-monitor5]
[file-monitor6]

I'm wondering if Splunk doesn't monitor all inputs at the same time or if it ingests monitored files based on timestamp, getting the earliest file in each folder. Here's my current config for the monitors:

[file-monitor1://C:\example]
whitelist=.log$|.LOG$
sourcetype=ex-type
queue=parsingQueue
index=test
disabled=false

Can anyone provide insight into what might be causing the inconsistent results and what I can do to improve the ingestion process?

0 Karma

tscroggins
Influencer

@mburgess97

Hi,

Assuming the monitor stanzas take the correct form:

[monitor://C:\example]
whitelist=\.log$|\.LOG$
sourcetype=ex-type
# let Splunk manage queues unless you have a specific reason to override
# default behavior
#queue=parsingQueue
index=test
disabled=false

this is often a product of how Splunk merges monitor stanzas internally.

Given the following inputs:

[monitor://C:\example]
whitelist = (?i)\.log$

[monitor://C:\example]
whitelist = (?i)\.txt$

[monitor://C:\example]
whitelist = (?i)\.json$

view their combination using `splunk cmd btool inputs list' (for .conf file validation) or `splunk show config inputs' (for live configuration):

[monitor://C:\example]
_rcvbuf=1572864
evt_resolve_ad_obj=0
host=$decideOnStartup
index=default
whitelist=(?i)\.json$

and their expansion using `splunk list inputstatus':

"C:\Program Files\SplunkUniversalForwarder\bin\splunk.exe" list inputstatus monitor
C:\example
type = directory

C:\example\bar.txt
parent = C:\example
type = File did not match whitelist '(?i)\.json$'.

C:\example\baz.json
file position = 75
file size = 75
parent = C:\example
percent = 100.00
type = finished reading

C:\example\foo.log
parent = C:\example
type = File did not match whitelist '(?i)\.json$'.

From the command output, you can see the "json" whitelist definition took precedence over the "log" and "txt" whitelist definitions.
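
For this simple case, one way to avoid the collision is a single stanza whose whitelist covers all three extensions. This is an untested sketch, and it assumes all three file types can share the same index and sourcetype settings:

```ini
# inputs.conf: one stanza per monitored path.
# The alternation replaces the three separate whitelist definitions,
# so no definition can overwrite another when the stanzas are merged.
[monitor://C:\example]
whitelist = (?i)\.(?:log|txt|json)$
```

Running `splunk list inputstatus' afterwards should show whether foo.log, bar.txt, and baz.json are all being read.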

When using wildcards in paths, Splunk will merge monitor stanzas in unexpected ways. For example, on my test system, the following stanzas:

[monitor://C:\example\*\foo]
whitelist = (?i)\.log$

[monitor://C:\example\*\bar]
whitelist = (?i)\.txt$

[monitor://C:\example\*\baz]
whitelist = (?i)\.json$

expand to:

C:\example\*\bar
type = directory

C:\example\*\baz
type = directory

C:\example\*\foo
type = directory

C:\example\dir
parent = C:\example\*\foo
type = directory

C:\example\dir\bar
parent = C:\example\*\bar
type = directory

C:\example\dir\bar\bar.txt
parent = C:\example\*\foo
type = File did not match whitelist '^C\:\\example\\[^\\]*\\bar$'.

C:\example\dir\baz
parent = C:\example\*\baz
type = directory

C:\example\dir\baz\baz.json
parent = C:\example\*\foo
type = File did not match whitelist '^C\:\\example\\[^\\]*\\bar$'.

C:\example\dir\foo
parent = C:\example\*\foo
type = directory

C:\example\dir\foo\foo.log
parent = C:\example\*\foo
type = File did not match whitelist '^C\:\\example\\[^\\]*\\bar$'.

and nothing is read!

The solution is to combine similar input stanzas into one if complex pattern matching is required:

[monitor://C:\example\dir]
whitelist = (?i)C:\\example\\dir\\(?:foo\\[^.]*\.log|bar\\[^.]*\.txt|baz\\[^.]*\.json)$

This expands to:

C:\example\dir
type = directory

C:\example\dir\bar
parent = C:\example\dir
type = directory

C:\example\dir\bar\bar.txt
file position = 44
file size = 44
parent = C:\example\dir
percent = 100.00
type = finished reading

C:\example\dir\baz
parent = C:\example\dir
type = directory

C:\example\dir\baz\baz.json
file position = 75
file size = 75
parent = C:\example\dir
percent = 100.00
type = finished reading

C:\example\dir\foo
parent = C:\example\dir
type = directory

C:\example\dir\foo\foo.log
file position = 44
file size = 44
parent = C:\example\dir
percent = 100.00
type = finished reading

Alternatively, expand the wildcards as needed in separate input stanzas to specify index, sourcetype, etc. without props or transforms:

[monitor://C:\example\dir\foo\*.log]
sourcetype = foo
index = foo

[monitor://C:\example\dir\bar\*.txt]
sourcetype = bar
index = bar

[monitor://C:\example\dir\baz\*.json]
sourcetype = baz
index = baz

These expand to:

C:\example\dir\bar\*.txt
type = directory

C:\example\dir\bar\bar.txt
file position = 44
file size = 44
parent = C:\example\dir\bar\*.txt
percent = 100.00
type = finished reading

C:\example\dir\baz\*.json
type = directory

C:\example\dir\baz\baz.json
file position = 75
file size = 75
parent = C:\example\dir\baz\*.json
percent = 100.00
type = finished reading

C:\example\dir\foo\*.log
type = directory

C:\example\dir\foo\foo.log
file position = 44
file size = 44
parent = C:\example\dir\foo\*.log
percent = 100.00
type = finished reading

 

mburgess97
Path Finder

Thank you for the breakdown of how all of this works.

PickleRick
SplunkTrust

Since you haven't provided much detail, the answer can only be fairly generic.

There are two possible issues with monitor inputs.

1) The config issue - if you have overlapping settings, some of them may be overriding others.

2) The performance issue - you can only open so many files before you run out of file descriptors.

0 Karma

mburgess97
Path Finder

 

Here are the specifics.

As of right now, when this specific monitor is enabled, [monitor://C:\Program Files\Microsoft\Exchange Server\V15\Logging\Ews], it's the only sourcetype reporting back to the indexers.


#[monitor://C:\inetpub\logs\LogFiles\W3SVC1\*.log]
#sourcetype=MSWindows:2012:IIS
#queue=parsingQueue
#index=exchange
#disabled=false


[monitor://C:\Program Files\Microsoft\Exchange Server\V15\Logging\Ews]
whitelist=\.log$|\.LOG$
sourcetype=MSWindows:2013EWS:IIS
queue=parsingQueue
index=exchange
disabled=false
initCrcLength=8192


[monitor://C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\Logs\MessageTracking]
whitelist=\.log$|\.LOG$
time_before_close = 0
sourcetype=MSExchange:2013:MessageTracking
queue=parsingQueue
index=exchange
#initCrcLength=8192
#persistentQueueSize=15MB
disabled=false

0 Karma

PickleRick
SplunkTrust

OK. I'll assume the inputs being listed twice is a copy-paste mistake, and that you copied them from the DS but the app is properly distributed to the UF(s).

I'd say that at first glance the definitions look mostly OK (no need to set queue parameter).

Three basic tools to help with diagnosing such cases (run them all on the UF of course):

splunk btool inputs list monitor --debug
splunk list monitor
splunk list inputstatus

You'll see the effective config on your UF (and which files it comes from), what the UF is actually monitoring, and the status of each file.

0 Karma

mburgess97
Path Finder

Thank you for your response.

I'm not sure why the queue parameter was set; it was the default for the TA. I have been using the troubleshooting tools you listed, but the issue appears to have resolved itself after some tinkering.

It turns out that a couple of the monitors had a large number of older files that needed to be ingested first before the newer ones could be picked up. I'm not sure if this is by design in Splunk.

Basically, here's how the files were ingested:

[file-monitor1]

file1.log (date 2/2/23) <-- ingested first
file2.log (date 2/22/23) <-- ingested second
file3.log (date 3/15/23) <-- ingested fourth

[file-monitor2]

file1.log (date 3/14/23) <-- ingested third
file2.log (date 3/16/23) <-- ingested fifth
file3.log (date 3/18/23) <-- ingested last


I assumed that Splunk would read the monitors in tandem, but it seems that the order in which the files were ingested mattered.

Once again, thank you for your help.

0 Karma

PickleRick
SplunkTrust

Yes, the files are read one by one until the forwarder catches up with them. So if you have many big files, it might take a while. What's annoying is that the forwarder's internal logs are ingested the same way and are sometimes queued at the end.

And check your thruput limits to make sure you're not throttling yourself too much.

0 Karma

tscroggins
Influencer

A common change I make is increasing limits.conf [default] min_batch_size_bytes to a value somewhat greater than the largest value of maxFileSize in log.cfg or log-local.cfg. maxFileSize shouldn't be larger than 25 MB (base 10) in a default configuration, so a min_batch_size_bytes value of e.g. 25165824 (24 MiB, base 2) is a good starting point.

Increasing min_batch_size_bytes to a value larger than maxFileSize (allowing for variance) will prevent backlogged internal logs from being processed by BatchReader and single-threading the pipeline.

If you need to monitor/tail many long-lived files with sizes less than min_batch_size_bytes, you can increase limits.conf [inputproc] max_fd from 100 to a larger value.

As @PickleRick noted, you can modify limits.conf [thruput] maxKBps to increase maximum throughput (or set it to 0 to remove the soft limit).

If you have enough local resources, you can also increase server.conf [general] parallelIngestionPipelines from 1 to 2 or a higher value, but if this becomes necessary on an endpoint, a deeper review of your logging implementation may be warranted.

I wouldn't recommend changing parallelIngestionPipelines absent a clear requirement. Changes to min_batch_size_bytes, maxKBps, and if needed, max_fd, are usually enough to keep data flowing.
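
As a rough sketch of the settings above: the max_fd and maxKBps values here are illustrative assumptions, not recommendations, and where the file lives (for example, an app's local directory on the forwarder) is also an assumption.

```ini
# limits.conf on the forwarder

[default]
# 25165824 bytes = 24 MiB, slightly above a 25,000,000-byte maxFileSize,
# so rolled internal logs stay with the tailing processor instead of BatchReader.
min_batch_size_bytes = 25165824

[inputproc]
# Default is 100. Raise only if many long-lived small files must stay open.
# 256 is an illustrative value.
max_fd = 256

[thruput]
# Kilobytes per second. 0 removes the limit.
# 512 is an illustrative value.
maxKBps = 512
```

A forwarder restart is needed for these to take effect, and `splunk btool limits list --debug' will show which file each effective value comes from.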

0 Karma

mburgess97
Path Finder

Actually, I forgot to mention that I doubled maxKBps from the default to 512:

[thruput]
maxKBps = 512

I was also considering increasing min_batch_size_bytes, but per the documentation the default is already 20971520 bytes (20 MiB).

Also, the documentation states this is global and cannot be configured per input. I assume global means per forwarder, not per input stanza.

 

min_batch_size_bytes = <integer>
* Specifies the size, in bytes, of the file/tar after which the
  file is handled by the batch reader instead of the trailing processor.
* Global parameter, cannot be configured per input.
* NOTE: Configuring this to a very small value could lead to backing up of jobs
  at the tailing processor.
* Default: 20971520

 

0 Karma

PickleRick
SplunkTrust

Well, just because you doubled it doesn't mean that it's high enough 😉

But seriously, there is no one "right" value for maxKBps. It depends on how much data you're getting on your UF and how much data your downstream servers (indexers, HFs) can accept. I have some UFs with a 256 KBps thruput limit and some with 8 MBps (I don't like setting it to unlimited).

0 Karma