Currently we have two heavy forwarders configured to forward data to the indexer. I want to find out which files are being captured from both servers using the query below. We are using Splunk HF version 6.4.0.
host =splunk01* sourcetype=splunkd index=_internal "*syslog*"
But I am getting "no results found". When I checked splunkd.log, I could see these entries:
08-11-2016 07:06:58.118 -0400 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_x.x.x.x_8089_splunk01.xxxx.com_splunk01.xxx.com_7xxxx1-XXXXX-XXX-XXX-XXXX
08-11-2016 07:06:58.128 -0400 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_x.x.x.x_8089_splunk01.xxxx.com_splunk01.xxx.com_7xxxx1-XXXXX-XXX-XXX-XXXX
08-11-2016 07:06:58.156 -0400 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_x.x.x.x_8089_splunk01.xxxx.com_splunk01.xxx.com_7xxxx1-XXXXX-XXX-XXX-XXXX
08-11-2016 07:07:45.496 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-11-2016 07:07:48.220 -0400 INFO TcpOutputProc - Closing stream for idx=X.X.X.X:9997
08-11-2016 07:07:48.220 -0400 INFO TcpOutputProc - Connected to idx=X.X.X.X:9997
08-11-2016 07:08:17.406 -0400 INFO TcpOutputProc - Closing stream for idx=X.X.X.X:9997
08-11-2016 07:08:17.406 -0400 INFO TcpOutputProc - Connected to idx=X.X.X.X:9997
08-11-2016 07:08:47.566 -0400 INFO TcpOutputProc - Closing stream for idx=X.X.X.X:9997
08-11-2016 07:08:47.566 -0400 INFO TcpOutputProc - Connected to idx=X.X.X.X:9997
08-11-2016 07:08:52.863 -0400 ERROR ExecProcessor - message from "python /opt/splunk/etc/apps/TA-nessus/bin/nessus2splunk.py" usage: nessus2splunk.py [-h] [-s SRCDIR] [-t TGTDIR]
08-11-2016 07:08:52.863 -0400 ERROR ExecProcessor - message from "python /opt/splunk/etc/apps/TA-nessus/bin/nessus2splunk.py" nessus2splunk.py: error: argument -s/--srcdir: Invalid path specified ($SPLUNK_HOME may not be set).
08-11-2016 07:09:17.565 -0400 INFO TcpOutputProc - Closing stream for idx=X.X.X.45:9997
08-11-2016 07:09:17.565 -0400 INFO TcpOutputProc - Connected to idx=X.X.X.X:9997
08-11-2016 07:09:47.859 -0400 INFO TcpOutputProc - Closing stream for idx=X.X.X.X:9997
08-11-2016 07:09:47.958 -0400 INFO TcpOutputProc - Connected to idx=X.X.X.X:9997
08-11-2016 07:10:18.029 -0400 INFO TcpOutputProc - Closing stream for idx=X.X.X.X:9997
08-11-2016 07:10:18.029 -0400 INFO TcpOutputProc - Connected to idx=X.X.X.X:9997
After restarting the Splunk service, I am able to get output from the above query, but it lasts only a few minutes; after that there is again no data for index=_internal.
Kindly guide me on how to fix this issue.
It is unlikely that you need to change any system settings. By default, Splunk keeps at most 100 files open at once. If you are monitoring many more files than that, you may need to increase this limit in Splunk.
If you've just brought the forwarder online, this may be temporary while Splunk processes all the historical files, so consider raising this value depending on your situation.
In limits.conf:
[inputproc]
max_fd = <integer>
* Maximum number of file descriptors that Splunk will keep open, to capture any
trailing data from files that are written to very slowly.
* Defaults to 100.
Yes, changing this limit will require a Splunk restart (not system restart).
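To judge whether the message is temporary or persistent, it can help to count how often it appears in splunkd.log before and after a change. Below is a minimal sketch; the function name is made up for illustration, and the log path in the usage comment assumes a default install under /opt/splunk.

```shell
#!/bin/sh
# Count TailReader "File descriptor cache is full" lines in a splunkd.log file.
count_fd_cache_full() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function from failing then.
    grep -c 'TailReader - File descriptor cache is full' "$1" || true
}

# Hypothetical usage (path assumes SPLUNK_HOME=/opt/splunk):
#   count_fd_cache_full /opt/splunk/var/log/splunk/splunkd.log
```

If the count keeps growing long after the forwarder has caught up on historical files, that suggests the number of monitored files is persistently above max_fd.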
Network connections also count as file descriptors at the Unix OS level, so your ulimits won't exactly line up with max_fd in the limits.conf file.
During previous support cases, Splunk has advised that there is no harm in raising the limit. We have the limit set to 1000 on many forwarders without an issue; our OS-level limit for the universal forwarder is generally 8192 file descriptors.
8192 is the recommended minimum for an enterprise installation, and we apply this setting to universal forwarders as well. FYI, we found that turning on SSL drastically increased the OS-level file descriptor usage by universal forwarders.
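To see how many descriptors splunkd actually holds at the OS level (files plus network connections), you can count the entries in its /proc fd directory on Linux. This is a rough sketch: the function name is invented here, and the pid file path in the usage comment is an assumption based on a default install under /opt/splunk.

```shell
#!/bin/sh
# Count the entries in a process's fd directory, e.g. /proc/1234/fd (Linux only).
count_fds_in() {
    ls -1 "$1" | wc -l | tr -d ' '
}

# Hypothetical usage, run as root or as the user that owns splunkd:
#   pid=$(head -n 1 /opt/splunk/var/run/splunk/splunkd.pid)
#   count_fds_in "/proc/$pid/fd"
```

Comparing this number with the "open files" ulimit shows how much headroom the process has, independent of max_fd.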
Thanks, wolverine, for putting effort into this issue. These are the limits.conf details on the HF server:
/opt/splunk/etc/apps/all_indexer_base/local/limits.conf
[search]
max_rawsize_perchunk = 314572800
/opt/splunk/etc/system/default/limits.conf
[inputproc]
file_tracking_db_threshold_mb = 500
learned_sourcetypes_limit = 1000
itorNoHandle.
monitornohandle_max_heap_mb = 0
Do you want us to add this stanza? And what value should be set in it?
[inputproc]
max_fd =
Thanks in advance.
I'm using max_fd = 1000
You can tune based on your environments requirements...
Thanks to everyone who guided me on this issue. After changing the limits.conf file and restarting the service, the issue was fixed.
path = /opt/splunk/etc/apps/yourapp/local/limits.conf
stanza
[inputproc]
max_fd = 1000
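To double-check that the stanza was saved as intended, a small awk-based helper can read max_fd back out of a given limits.conf. This is only a sketch with a made-up function name, and it reads a single file; Splunk merges several limits.conf layers, so running "splunk btool limits list inputproc --debug" is the more complete way to see the value Splunk will actually use.

```shell
#!/bin/sh
# Print the max_fd value from the [inputproc] stanza of one limits.conf file.
max_fd_from_conf() {
    awk '
        # A line starting with "[" opens a new stanza; remember whether it is ours.
        /^\[/ { in_stanza = ($0 == "[inputproc]") }
        # Inside [inputproc], print whatever follows "max_fd =".
        in_stanza && /^[[:space:]]*max_fd[[:space:]]*=/ {
            sub(/^[^=]*=[[:space:]]*/, "")
            print
        }
    ' "$1"
}

# Usage with the path given above:
#   max_fd_from_conf /opt/splunk/etc/apps/yourapp/local/limits.conf
```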
If you run ulimit -n (or just ulimit -a) as the user that runs Splunk (normally the username splunk), does the number of open files show an acceptable limit?
If not, log out and back in to the server, or confirm that /etc/security/limits.conf has been set correctly for the user running Splunk (some of the mentioned settings appear to be for the root user).
For a heavy forwarder, I would consider 8192 or above to be a minimum number of file descriptors on Linux; I have the indexers set much higher.
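Note that ulimit -n in a fresh login shell does not necessarily match what an already-running splunkd was started with. On Linux, the limits of a live process are listed in /proc/<pid>/limits; the sketch below (function name invented for illustration) pulls the soft "Max open files" value out of such a file.

```shell
#!/bin/sh
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# Columns are: name (three words), soft limit, hard limit, units.
nofile_soft_from_limits() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Hypothetical usage, with the pid file path assuming SPLUNK_HOME=/opt/splunk:
#   pid=$(head -n 1 /opt/splunk/var/run/splunk/splunkd.pid)
#   nofile_soft_from_limits "/proc/$pid/limits"
```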
Thanks, mmodestino, for guiding us. Below are the parameter values we have set.
System details:
Splunk version 6.4.0 (HF instance)
OS - RedHat 6.6
Memory - 6GB
CPU - 3
VMware
free -m
             total       used       free     shared    buffers     cached
Mem:         15947       8958       6988          0        732       3124
-/+ buffers/cache:       5101      10846
Swap:         3323         52       3271
I have changed the values in /etc/security/limits.conf and restarted the Splunk service, but the changes still did not take effect. Should I restart the server?
#* soft core 0
#* hard rss 10000
#@student hard nproc 20
#@faculty soft nproc 20
#@faculty hard nproc 50
#ftp hard nproc 0
#@student - maxlogins 4
root soft nofile 1024000
root hard nofile 1024000
root soft nproc 180000
root hard nproc 180000
ulimit -u
18000
ulimit -n
102400
ulimit -f
unlimited
I have also disabled THP.
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
always madvise [never]
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always madvise [never]
But I still see the same INFO messages in splunkd.log. Kindly guide me on how to fix this issue.
08-13-2016 01:27:04.217 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:04.354 -0400 INFO TcpOutputProc - Closing stream for idx=x.x.x.x:9997
08-13-2016 01:27:04.354 -0400 INFO TcpOutputProc - Connected to idx=x.x.x.x:9997
08-13-2016 01:27:09.990 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:13.983 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:17.972 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:21.381 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:24.766 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:28.250 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:33.473 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:34.355 -0400 INFO TcpOutputProc - Closing stream for idx=x.x.x.x:9997
08-13-2016 01:27:34.355 -0400 INFO TcpOutputProc - Connected to idx=x.x.x.x:9997
08-13-2016 01:27:38.629 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:42.389 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:46.801 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:51.482 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:54.734 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:27:58.412 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:28:02.596 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:28:04.353 -0400 INFO TcpOutputProc - Closing stream for idx=x.x.x.x:9997
08-13-2016 01:28:04.353 -0400 INFO TcpOutputProc - Connected to idx=x.x.x.x:9997
08-13-2016 01:28:06.651 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:28:10.857 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:28:15.696 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:28:19.354 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:28:23.987 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:28:28.085 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
08-13-2016 01:28:32.075 -0400 INFO TailReader - File descriptor cache is full (100), trimming...
thanks in advance.
Are you running Splunk as root?
Restarting the server is a good idea to ensure you have configured persistent changes. Some operating systems require init scripts to ensure the changes are applied at startup.
Also, you can tail splunkd.log to confirm the changes are seen by Splunk when it starts up.
[ PROTIP : index=_internal source=*splunkd.log ulimit ]
08-15-2016 05:00:53.832 -0400 WARN main - The hard limit of 'processes/threads' is lower than the recommended value. The hard limit is: 1064. The recommended value is: 16000.
08-15-2016 05:00:53.832 -0400 WARN main - The system fd limit (OPEN_MAX) is lower than the recommended value. The system limit (OPEN_MAX) is '10240' The recommended value is '64000'.
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: virtual address space size: unlimited
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: data segment size: unlimited
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: resident memory size: unlimited
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: stack size: 8388608 bytes [hard maximum: 67104768 bytes]
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: core file size: 0 bytes [hard maximum: unlimited]
08-15-2016 05:00:53.832 -0400 WARN ulimit - Core file generation disabled
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: data file size: unlimited
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: open files: 10240 files [hard maximum: unlimited]
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: user processes: 709 processes
08-15-2016 05:00:53.832 -0400 INFO ulimit - Limit: cpu time: unlimited
08-15-2016 05:00:53.833 -0400 INFO loader - Splunkd starting (build debde650d26e).
Hi mmodestino, thanks for putting your effort into this issue.
Yes, we are running Splunk as the root user.
We have restarted only the Splunk service, not the server. Do we need to restart the server for the changes to take effect? As this is a production environment, we need approval before executing an init script or restarting the server.
grep ulimit splunkd.log.*
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: virtual address space size: unlimited
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: data segment size: unlimited
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: resident memory size: unlimited
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: stack size: 10485760 bytes [hard maximum: unlimited]
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: core file size: 0 bytes [hard maximum: unlimited]
splunkd.log.1:08-12-2016 14:39:32.564 -0400 WARN ulimit - Core file generation disabled
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: data file size: unlimited
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: open files: 102400 files
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: user processes: 18000 processes
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Limit: cpu time: unlimited
splunkd.log.1:08-12-2016 14:39:32.564 -0400 INFO ulimit - Linux transparent hugetables support, enabled="never" defrag="never"
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: virtual address space size: unlimited
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: data segment size: unlimited
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: resident memory size: unlimited
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: stack size: 10485760 bytes [hard maximum: unlimited]
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: core file size: 0 bytes [hard maximum: unlimited]
splunkd.log.2:08-12-2016 08:44:52.564 -0400 WARN ulimit - Core file generation disabled
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: data file size: unlimited
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: open files: 102400 files
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: user processes: 18000 processes
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Limit: cpu time: unlimited
splunkd.log.2:08-12-2016 08:44:52.564 -0400 INFO ulimit - Linux transparent hugetables support, enabled="never" defrag="never"
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: virtual address space size: unlimited
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: data segment size: unlimited
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: resident memory size: unlimited
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: stack size: 10485760 bytes [hard maximum: unlimited]
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: core file size: 0 bytes [hard maximum: unlimited]
splunkd.log.3:08-11-2016 18:47:10.873 -0400 WARN ulimit - Core file generation disabled
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: data file size: unlimited
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: open files: 4096 files
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: user processes: 63681 processes
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Limit: cpu time: unlimited
splunkd.log.3:08-11-2016 18:47:10.873 -0400 INFO ulimit - Linux transparent hugetables support, enabled="always" defrag="always"
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: virtual address space size: unlimited
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: data segment size: unlimited
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: resident memory size: unlimited
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: stack size: 10485760 bytes [hard maximum: unlimited]
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: core file size: 0 bytes [hard maximum: unlimited]
splunkd.log.3:08-11-2016 23:22:48.038 -0400 WARN ulimit - Core file generation disabled
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: data file size: unlimited
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: open files: 4096 files
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: user processes: 63681 processes
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Limit: cpu time: unlimited
splunkd.log.3:08-11-2016 23:22:48.038 -0400 INFO ulimit - Linux transparent hugetables support, enabled="never" defrag="never"
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: virtual address space size: unlimited
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: data segment size: unlimited
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: resident memory size: unlimited
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: stack size: 10485760 bytes [hard maximum: unlimited]
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: core file size: 0 bytes [hard maximum: unlimited]
splunkd.log.4:08-11-2016 13:12:52.508 -0400 WARN ulimit - Core file generation disabled
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: data file size: unlimited
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: open files: 4096 files
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: user processes: 63681 processes
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Limit: cpu time: unlimited
splunkd.log.4:08-11-2016 13:12:52.508 -0400 INFO ulimit - Linux transparent hugetables support, enabled="always" defrag="always"
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: virtual address space size: unlimited
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: data segment size: unlimited
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: resident memory size: unlimited
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: stack size: 10485760 bytes [hard maximum: unlimited]
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: core file size: 0 bytes [hard maximum: unlimited]
splunkd.log.5:08-10-2016 23:36:48.825 -0400 WARN ulimit - Core file generation disabled
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: data file size: unlimited
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: open files: 4096 files
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: user processes: 63681 processes
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Limit: cpu time: unlimited
splunkd.log.5:08-10-2016 23:36:48.825 -0400 INFO ulimit - Linux transparent hugetables support, enabled="always" defrag="always"
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: virtual address space size: unlimited
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: data segment size: unlimited
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: resident memory size: unlimited
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: stack size: 10485760 bytes [hard maximum: unlimited]
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: core file size: 0 bytes [hard maximum: unlimited]
splunkd.log.5:08-11-2016 08:41:56.173 -0400 WARN ulimit - Core file generation disabled
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: data file size: unlimited
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: open files: 4096 files
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: user processes: 63681 processes
splunkd.log.5:08-11-2016 08:41:56.173 -0400 INFO ulimit - Limit: cpu time: unlimited
splunkd.log.5:08-11-2016 08:41:56.174 -0400 INFO ulimit - Linux transparent hugetables support, enabled="always" defrag="always"
Kindly let me know whether there is a fix that does not require restarting the server. Thanks in advance.
Hey! Looks like Splunk took your changes, so there is no need to restart the server. I was just advising you to ensure they persist if it does restart.
If you go to settings > inputs, did you set your number high enough? I quickly ended up needing a cronjob to reap the old files I didn't want splunk to monitor. (one time ingestion)
Thanks, mmodestino. But I did not understand the sentence below from your comment.
"If you go to settings > inputs, did you set your number high enough? I quickly ended up needing a cronjob to reap the old files I didn't want splunk to monitor. (one time ingestion)"
Are you asking about the limits.conf file details?
This is the cron job we have placed on the server to delete the large files:
00,30 * * * * /bin/find /opt/syslogs/generic -mtime +1 -type f -delete > /dev/null 2>&1
00,30 * * * * /bin/find /opt/syslogs/web_access -mtime +1 -type f -delete > /dev/null 2>&1
So kindly guide us to fix this issue. Thanks in advance.
I was alluding to you checking how many files your inputs are monitoring.
This matters if you are monitoring entire directories, for example /home/stats/servers/*
You can check under Settings > Inputs > Files and directories; Splunk will show you how many files are being monitored.
Obviously this depends on what kind of inputs you are using.
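For a quick check from the command line, you can also count the regular files under a monitored directory and compare the result with max_fd. A minimal sketch follows; the function name is invented for illustration, and the directory in the usage comment is one of the paths from the cron job quoted earlier in this thread.

```shell
#!/bin/sh
# Count regular files (recursively) under a directory that Splunk monitors.
count_monitored_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage:
#   count_monitored_files /opt/syslogs/generic
```

This counts every file on disk, so it will overstate the number if your monitor stanza filters files with a whitelist or blacklist.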
Also, re-reading your initial question, I wonder if searching _internal is not really what you want.
There is a REST endpoint that you can use to find out what the tailing processor is doing.
read about it here:
http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/
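As a rough sketch of what calling it can look like: the endpoint path below is the one commonly used for tailing processor file status, but treat it as an assumption and verify it against the linked post. The helper name, the default host, and the "admin" user are placeholders; 8089 is the management port seen in the phonehome log lines above.

```shell
#!/bin/sh
# Build the management-port URL for the tailing processor's file status.
# The endpoint path is an assumption; check it against the linked post.
tailing_status_url() {
    host=${1:-localhost}
    port=${2:-8089}
    printf 'https://%s:%s/services/admin/inputstatus/TailingProcessor:FileStatus\n' "$host" "$port"
}

# Hypothetical usage on the forwarder itself (curl prompts for the password;
# -k skips certificate verification, which only suits a self-signed setup):
#   curl -k -u admin "$(tailing_status_url localhost 8089)"
```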
Are you actually missing any data? or maybe we're just down a rabbit hole?
Hey Hemnaath!
You should look at increasing the ulimits on your server as described in the system requirements.
See 'Considerations regarding file descriptor limits (FDs) on *nix systems' under supported OSes.
http://docs.splunk.com/Documentation/Splunk/6.4.2/Installation/Systemrequirements
You may as well also ensure you have disabled THP, which is another best practice.
The exact change required will differ depending on your system, but a quick Google search should lead you to the answer.
Here are some good discussions on these items:
https://answers.splunk.com/answers/13313/how-to-tune-ulimit-on-my-server.html
https://answers.splunk.com/answers/188875/how-do-i-disable-transparent-huge-pages-thp-and-co.html