There are two heavy forwarders behind an F5 load balancer that distributes the syslog load; these two servers receive the syslog traffic (over a TCP port) and forward it to the indexer clusters.
Currently, file system usage is growing rapidly under the path
/opt/syslogs/generic/*/*.log, and we are unable to delete or rotate the syslogs because there are too many subdirectories under the generic folder, each containing millions of files. As a result, splunkd keeps failing at short intervals, and splunkd.log does not show a clear error that explains why the process fails so frequently.
1) Is this due to the space crunch in the /opt file system where Splunk is installed? I am not sure whether this is the cause that in turn makes the splunkd process fail.
2) How can I measure the amount of syslog data arriving on the two heavy forwarder servers every day or every hour? I have tried the search below, but I am not sure whether it fetches the correct details. Please correct me if it is not the right search.
index=* sourcetype=syslog source="/opt/syslogs/generic/*" | eval indextime=strftime(_indextime, "%Y-%m-%d %H:%M:%S") | eval length=len(_raw)/1024 | stats sum(length) count by source indextime index host
3) Once the volume of data arriving on the servers is known, what calculation should be used to size the disk? That way, I can ask the hardware team to grow the partition, or to provision a separate server for syslogs.
4) Or should I change any Splunk configuration files to limit the size of the syslogs arriving on the server?
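On question 3, a back-of-the-envelope sizing is simply daily volume times retention, plus headroom. A minimal sketch (all figures here are hypothetical placeholders, not measurements from this environment):

```shell
#!/bin/sh
# Hypothetical inputs -- replace with the values you actually measure.
DAILY_GB=50          # syslog volume written per day
RETENTION_DAYS=7     # how long raw files stay on the forwarder
HEADROOM_PCT=30      # margin for bursts and growth

NEEDED_GB=$(( DAILY_GB * RETENTION_DAYS ))
NEEDED_GB=$(( NEEDED_GB + NEEDED_GB * HEADROOM_PCT / 100 ))
echo "Partition should be at least ${NEEDED_GB} GB"   # 455 GB with these figures
```

If files older than a day are deleted on schedule, RETENTION_DAYS is effectively 1-2; comparing the result against `df -k /opt` shows whether the partition needs to grow or syslog should move to its own server.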
The query looks okay to me.
There's a lot in your question, and I don't have all the answers... but I can tell you that the amount of work splunkd does to harvest logs depends on both the number of directories and the amount of data in them. So your problem could simply be the number of directories under /opt/syslogs/generic/*/*.log.
Thanks mcarney. On both heavy forwarder servers we have a huge number of directories and files. On heavy forwarder server 1
I can see 4200 directories, and on heavy forwarder server 2 there are 8703 directories. I used the command below to get this information on the Linux machines.
[root@splunk01 generic]# ls -ltr | wc -l
4200
[root@splunk02 generic]# ls -ltr | wc -l
8703
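As a side note, `ls -ltr | wc -l` also counts the `total` header line and any plain files, not just directories, so the figures above may be slightly inflated. A directory-only count looks like this (demonstrated on a scratch tree; on the forwarders the path would be /opt/syslogs/generic):

```shell
# Build a throwaway tree with 3 directories and 1 plain file.
tmp=$(mktemp -d)
mkdir "$tmp/host1" "$tmp/host2" "$tmp/host3"
touch "$tmp/stray.log"    # a plain file should not be counted

# Count only directories directly under the tree.
count=$(find "$tmp" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
echo "$count directories"     # prints "3 directories"
rm -rf "$tmp"
```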
This is the inputs.conf stanza present on both heavy forwarders to read the syslog files (sourcetype syslog):
[monitor:///opt/syslogs/generic/.../*.log]
sourcetype = syslog
host_segment = 4
blacklist = xxxx*ltm*
index = unix_hosts
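One monitor-input setting that may help here (a suggestion on my part, not something from the posted config) is `ignoreOlderThan`, which tells splunkd to skip files whose modification time exceeds the given age, shrinking the set of files it has to track:

```
[monitor:///opt/syslogs/generic/.../*.log]
sourcetype = syslog
host_segment = 4
blacklist = xxxx*ltm*
index = unix_hosts
# Skip files not modified in the last 2 days (2d is an example value).
# Caution: once a file has been skipped it stays ignored even if it is
# written to again, so pick an age safely beyond your delete window.
ignoreOlderThan = 2d
```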
With the help of the UNIX admin, we set up a cron job that runs every half hour, but the file system usage still keeps growing rapidly.
Cron Job detail:
00,30 * * * * /bin/find /opt/syslogs/generic -mtime +1 -type f -delete > /dev/null 2>&1
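One gap in that job: `-type f -delete` removes old files but leaves the per-host directories behind, so the directory count splunkd has to walk never shrinks. A sketch that also prunes the now-empty directories (demonstrated on a scratch tree; on the servers TARGET would be /opt/syslogs/generic):

```shell
# Build a throwaway tree with one stale log file.
TARGET=$(mktemp -d)
mkdir -p "$TARGET/oldhost"
touch "$TARGET/oldhost/a.log"
# Backdate the file so -mtime +1 matches it (GNU touch; BSD fallback).
touch -d '2 days ago' "$TARGET/oldhost/a.log" 2>/dev/null \
  || touch -t 202001010000 "$TARGET/oldhost/a.log"

# Delete files older than 1 day, then remove directories left empty,
# as one job, so the tree stays shallow between runs.
find "$TARGET" -mtime +1 -type f -delete
find "$TARGET" -mindepth 1 -type d -empty -delete

remaining=$(find "$TARGET" -mindepth 1 | wc -l | tr -d ' ')
echo "$remaining entries left"    # prints "0 entries left"
rm -rf "$TARGET"
```

In the crontab this would just mean appending a second find for empty directories after the existing one.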
Kindly let us know if we need to change any Splunk configuration to fix this permanently.
I'm not sure. Splunk support can probably answer that for you. Or there's always the splunk channel.