Getting Data In

Why am I getting high CPU and high memory on universal forwarder even though we have very little data coming into Splunk?

Motivator

Hi,

We are running a universal forwarder (7.1.6) and seeing high CPU and memory usage from it (one whole core of a 20-core box).

However, we are only getting a trickle of data in, so it's not as if we are ingesting millions of log files!

Is there anything I can do to see what is happening inside it?

This is a tail of the log:

You have new mail in /var/spool/mail/autoengine
dell479srv autoengine /dell479srv2/apps/splunkforwarder_MxOne_Testing_Latest/var/log/
bash$ tail -f splunk/splunkd.log
02-19-2019 15:30:02.144 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:30:02.144 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:35:03.296 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:35:03.296 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:40:02.983 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:40:02.983 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:45:03.007 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:45:03.008 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:50:03.320 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:50:03.320 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
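
One way to see which files are being re-read, and how often, is to count these WatchedFile messages per file. The following is a minimal sketch, not an official Splunk tool; the function name `count_rereads` is hypothetical, and the regex assumes the message format shown in the tail above.

```python
import re
from collections import Counter

# Matches the "Will re-read entire file='...'" message from the tail above
# and captures the file path between the single quotes.
REREAD = re.compile(r"WatchedFile - .*Will re-read entire file='([^']+)'")


def count_rereads(lines):
    """Return a Counter mapping file path -> number of full re-reads logged."""
    counts = Counter()
    for line in lines:
        match = REREAD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage against the forwarder's own log:
#
#     with open("splunk/splunkd.log", errors="replace") as handle:
#         for path, hits in count_rereads(handle).most_common(10):
#             print(hits, path)
```

A file that shows up here every few minutes is a likely candidate for being rewritten rather than appended to.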
1 Solution

Explorer

Can you please specify how this log is generated? From the log snippet provided, this looks like an issue with the way the log is updated. Splunk keeps track of how far it has read into the file, so only new entries should get indexed. In this case, though, it looks like each update rewrites the whole log file, which makes Splunk treat it as a new file and index it again because the CRC value has changed:

02-19-2019 15:30:02.144 +0100 INFO WatchedFile - File too small to check seekcrc, probably truncated. Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
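
To illustrate the difference between an appended file and a rewritten one, here is a simplified sketch of how a tailing reader might decide what to do. It is not Splunk's actual implementation: the names `checkpoint`, `classify`, and `demo` are hypothetical, and the 256-byte head checksum is an assumption used only for illustration.

```python
import os
import tempfile
import zlib

HEAD_BYTES = 256  # assumption: how much of the file start is fingerprinted


def checkpoint(path):
    """Record the read offset plus a checksum of the start of the file."""
    with open(path, "rb") as handle:
        head = handle.read(HEAD_BYTES)
    return {
        "offset": os.path.getsize(path),
        "head_len": len(head),
        "head_crc": zlib.crc32(head),
    }


def classify(path, saved):
    """Decide what a tailing reader would do on its next pass."""
    size = os.path.getsize(path)
    if size < saved["offset"]:
        # The file is shorter than where we stopped reading last time.
        return "truncated: re-read from offset 0"
    with open(path, "rb") as handle:
        head = handle.read(saved["head_len"])
    if zlib.crc32(head) != saved["head_crc"]:
        return "rewritten: re-read from offset 0"
    if size > saved["offset"]:
        return "appended: read only the new bytes"
    return "unchanged"


def demo():
    """Append to a file, then rewrite it shorter, and classify each change."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "wb") as handle:
            handle.write(b"line 1\nline 2\n")
        saved = checkpoint(path)
        with open(path, "ab") as handle:
            handle.write(b"line 3\n")
        results.append(classify(path, saved))
        saved = checkpoint(path)
        with open(path, "wb") as handle:  # rewrite in place, shorter than before
            handle.write(b"line 9\n")
        results.append(classify(path, saved))
    return results
```

Calling `demo()` shows the append being picked up incrementally, while the rewrite forces a full re-read, which is the pattern in the log line above.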

0 Karma

Motivator

Yes, this file keeps getting rewritten (in parts), so Splunk has to keep re-reading it over and over.

All our other files are appended to; this was the one that is not.

thanks
rob

0 Karma

Motivator

Hi

This was the answer. Can you post it as an answer, please, and I will accept it?
The file was deleting parts of itself, and Splunk had to keep taking it in again and again.

Rob

0 Karma

Explorer

Posted it 🙂

0 Karma

Communicator

We had a similar issue, but we were ingesting over a million files on a UF. The issue was that the UF had to monitor too many files. When we switched to the batch:// input, it worked just fine.
I assume you have similar issues because the /net folder is designed to contain NFS shared directories from remote hosts.

0 Karma

Motivator

Hi

I can't use batch mode, as other services need the files after Splunk has read them in (batch mode will delete the files, right?).
We also have a lot of files, and this could be causing the issue!

The forwarder is installed on machine dell479srv, so perhaps we don't need to use /net/; perhaps this is causing an issue.
[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring.../jmap*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist=.*\.log$
sourcetype = jmap
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives

0 Karma

Communicator

Yes, batch mode will delete the files.
In our case, the application engineer had to copy the files to a dedicated directory, where we were able to use batch mode.

If possible, I wouldn't use /net; instead, install the UF on each server you want to ingest data from.

0 Karma

Motivator

Hi

We removed the /net and resource usage dropped by 30%; we also removed some unwanted files we were monitoring.
We might have to move to a dedicated machine. This is a bit annoying, as people often ask me what the impact of Splunk is, and the "nice answer" is "very small", but in this case it's high... hmm.

Cheers for the help
Rob

0 Karma

Communicator

The Splunk UF doesn't use many resources... usually. But with a lot of files to monitor, that's not the case.
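
To get a rough feel for how many files sit under a monitored path, something like the sketch below may help. It is only an approximation: the function name `count_candidates` is hypothetical, and Python's `re` module stands in for the forwarder's own regex matching of whitelist and blacklist against the full path.

```python
import os
import re


def count_candidates(root, whitelist=None, blacklist=None):
    """Walk root and return (total_files, files_passing_the_filters)."""
    allow = re.compile(whitelist) if whitelist else None
    deny = re.compile(blacklist) if blacklist else None
    total = kept = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += 1
            full_path = os.path.join(dirpath, name)
            # A blacklist match excludes the file outright.
            if deny and deny.search(full_path):
                continue
            # When a whitelist is given, the file must match it to be kept.
            if allow and not allow.search(full_path):
                continue
            kept += 1
    return total, kept


# Hypothetical usage with one of the patterns from this thread:
#
#     total, kept = count_candidates(
#         "/net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs",
#         whitelist=r".*\.log$",
#         blacklist=r"logs_|fixing_|tps-archives",
#     )
#     print(total, "files walked,", kept, "kept after filtering")
```

Even files that end up filtered out still have to be walked, so a large `total` on an NFS mount can matter as much as `kept`.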

You're welcome

0 Karma

Communicator

What's your configuration on the UF? For example, inputs.conf and props.conf, if applicable?

0 Karma

Motivator

inputs.conf
[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs.../*.json]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = reset_profiler
crcSalt =
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs.../*.log]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = sun_jvm
crcSalt = <SOURCE>
whitelist = .*gc\.log$|.*gc.*\.log$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT.../*.tps]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = tps
crcSalt = <SOURCE>
whitelist = .*\.tps$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring/vmstat/*.log]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = vmstat-linux
crcSalt = <SOURCE>
whitelist = vmstat.*\.log$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring/nicstat/*]
disabled = false
host = MxOne_Testing_Latest 
index = mlc_live
sourcetype = nicstat
crcSalt = <SOURCE>
whitelist = nicstat.*\.log$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring/*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = mx_version
crcSalt = <SOURCE>
whitelist = mx_version_.*$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring/mlc_version/*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = mlc-version
crcSalt = <SOURCE>
whitelist = mlc_version_.*$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = murex_log4j
whitelist = .*\.log$
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives|errors.log|.*gc\.log$|.*gc.*\.log$

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT.../*.log]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist=mxtiming.*\.log$
blacklist=logs_|fixing_|tps-archives|mxtiming_crv_nr.*
crcSalt = <SOURCE>
sourcetype = MX_TIMING2

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT.../*service.log]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist = (?<NPID>\d*)-\d*-service\.log
blacklist=logs_|fixing_|tps-archives
crcSalt = <SOURCE>
sourcetype = service

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/*_CheckToolLifeCycle.txt]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = tool_lifecycle
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring.../jmap*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist=.*\.log$
sourcetype = jmap
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring.../jstack*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist=.*\.log$
sourcetype = jstack
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives

props.conf
[splunkd]
EXTRACT-fields = (?i)^(?:[^ ]* ){2}(?:[+-]\d+ )?(?P[^ ]*)\s+(?P[^ ]+) - (?P.+)

[splunk_web_service]
EXTRACT-useragent = userAgent=(?P[^ (]+)

0 Karma