Hi,
We have a Splunk app that exposes a REST endpoint for other applications to request metrics.
The main piece of Python code inside the handler method is:
# module-level imports used by the handler
import json
import time
from splunklib import results

# inside the handler method:
service = self.getService()
searchjob = service.jobs.create(searchquery)

# poll until the search job completes
while not searchjob.is_done():
    time.sleep(5)

reader = results.ResultsReader(searchjob.results(count=0))
response_data = {}
response_data["results"] = []
for result in reader:
    if isinstance(result, dict):
        response_data["results"].append(result)
    elif isinstance(result, results.Message):
        # log the message itself, not the Message class
        mylogger.info("action=runSearch, search = %s, msg = %s" % (searchquery, result))

search_dict["searchjob"] = searchjob
search_dict["searchresults"] = json.dumps(response_data)
The dependent application invokes the REST API at scheduled intervals; there are close to 150 calls spread across various time intervals.
Note: at any point in time there are at most 6 concurrent search requests.
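For illustration, the remote side (a Tornado-based Python service, more on that under Additional Info below) polls the endpoint roughly like this; the URL, credentials and interval here are placeholders, not the real values:

import logging
from tornado import ioloop
from tornado.httpclient import AsyncHTTPClient

METRICS_URL = "https://splunk-host:8000/custom/my_app/my_endpoint/metrics"  # placeholder

async def poll_metrics():
    response = await AsyncHTTPClient().fetch(
        METRICS_URL,
        auth_username="svc_user", auth_password="***",  # placeholder credentials
        validate_cert=False)  # self-signed Splunk cert assumed
    logging.info("metrics response: %d bytes", len(response.body))

ioloop.PeriodicCallback(poll_metrics, 60 * 1000).start()  # one of ~150 scheduled calls
ioloop.IOLoop.current().start()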
Normal scenarios:
Remote application and my Splunk app are both up and running - everything is fine.
If I have to restart the remote application for some reason, then after the restart both are up and running - everything is fine.
If I have to restart my Splunk process for some reason, then after the restart both applications are up and running - everything is fine.
Problematic scenario:
The problem starts when the system running the remote application is rebooted. After the reboot, the remote application starts making calls to the Splunk app, and within about 60 minutes the number of CLOSE_WAIT connections reaches 700+. Eventually the Splunk system starts throwing socket errors, and Splunk Web becomes inaccessible.
Additional Info:
The remote application is a Python application written with the Tornado framework. It runs inside a Docker container managed by Kubernetes.
ulimit -n on the Splunk system shows 1024. (I know this is lower than Splunk's recommendation, but I would like to understand why the issue occurs only after the remote system reboots.)
During normal operation, searches take on average 7s to complete. While the remote machine is rebooting, searches take on average 14s. (It may not make sense to relate a remote system reboot to search performance on the Splunk system, but that is the trend.)
The CLOSE_WAIT connections are all internal TCP connections to splunkd's management port (8089):
tcp 1 0 127.0.0.1:8089 127.0.0.1:37421 CLOSE_WAIT 0 167495826 28720/splunkd
tcp 1 0 127.0.0.1:8089 127.0.0.1:32869 CLOSE_WAIT 0 167449474 28720/splunkd
tcp 1 0 127.0.0.1:8089 127.0.0.1:37567 CLOSE_WAIT 0 167497280 28720/splunkd
tcp 1 0 127.0.0.1:8089 127.0.0.1:33086 CLOSE_WAIT 0 167451533 28720/splunkd
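A small script like the one below can be used to watch the build-up over time (a sketch assuming psutil is available on the Splunk host; it is not part of the app itself):

import time
import psutil  # assumption: psutil is installed on the Splunk host

def count_close_wait(port=8089):
    # count splunkd-side sockets stuck in CLOSE_WAIT on the management port
    return sum(1 for c in psutil.net_connections(kind="tcp")
               if c.status == psutil.CONN_CLOSE_WAIT and c.laddr and c.laddr.port == port)

while True:
    print(time.strftime("%H:%M:%S"), "CLOSE_WAIT on 8089:", count_close_wait())
    time.sleep(60)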
Any help or pointers are highly appreciated.
Thanks,
Strive