Hi,
I deployed Splunk in a distributed topology. My Search Head server now has an issue: KV Store is in a failed state (which also makes the "Enterprise Security" app fail).
I checked "/opt/splunk/var/log/splunk/splunkd.log" and found the logs below:
==========================================
10-13-2021 18:14:03.127 +0700 ERROR DataModelObject - Failed to parse baseSearch. err=Error in 'inputlookup' command: External command based lookup 'correlationsearches_lookup' is not available because KV Store initialization has failed. Contact your system administrator., object=Correlation_Search_Lookups, baseSearch=| inputlookup append=T correlationsearches_lookup | eval source=_key | eval lookup="correlationsearches_lookup" | append [| `notable_owners`] | fillnull value="notable_owners_lookup" lookup | append [| `reviewstatuses`] | fillnull value="reviewstatuses_lookup" lookup | append [| `security_domains`] | fillnull value="security_domain_lookup" lookup | append [| `urgency`] | fillnull value="urgency_lookup" lookup
10-13-2021 18:14:30.350 +0700 ERROR KVStorageProvider - An error occurred during the last operation ('replSetGetStatus', domain: '15', code: '13053'): No suitable servers found (`serverSelectionTryOnce` set): [connection closed calling ismaster on '127.0.0.1:8191']
10-13-2021 18:14:30.350 +0700 ERROR KVStoreAdminHandler - An error occurred.
===============================
Could anyone help me troubleshoot and solve this issue?
Thanks so much!
I don't believe that could be the issue. However you can verify by disabling some of those Threat Intelligence items you saw in those screenshots.
Disable an intelligence source to stop downloading information from the source. This also prevents new threat indicators from the disabled source from being added to the threat intelligence collections.
From the Enterprise Security menu bar, select Configure > Data Enrichment > Threat Intelligence Management. Find the intelligence source. Under Status, click Disable.
I did see something in the logs that sounds interesting: "LogFile::synchronousAppend failed with 8192 bytes unwritten out of 8192 bytes; b=0x5612eed8a000 No space left on device"
How does your storage look? Is anything running out? You can try to resync your KV store. Take a look at this article https://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/ResyncKVstore
I ran "df -hT" and "df -hiP" on the SH server and everything looks normal now, but I think the inodes were already full while KV Store was still running, before it failed. I'll try restarting only Splunk on the SH server, see what happens, and let you know. Thanks!
Definitely let me know what happens.
As I said, the KV Store's MongoDB exhausted the inodes and failed. After I restarted Splunk, MongoDB and KV Store work properly again, but from now on I have to watch inode usage to make sure KV Store won't fail again (I increased the disk capacity, which raised the inode count from 16M to 97M).
Anyway, thanks for your help!
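For anyone who wants to keep an eye on this, here is a minimal sketch of an inode and disk-space check using Python's standard `os.statvfs`. It reports roughly the same numbers as "df -h" and "df -i". The function names, the 90% threshold, and the fallback to "/" are my own assumptions for illustration; the KV Store path is the one that appears in the mongod.log messages in this thread, so adjust it for your install.

```python
import os


def fs_usage(path):
    """Return percent of blocks and inodes used on the filesystem holding path."""
    st = os.statvfs(path)

    def pct(total, free):
        # Some filesystems report 0 total inodes; treat that as "not applicable".
        return None if total == 0 else round(100.0 * (total - free) / total, 1)

    return {
        "space_used_pct": pct(st.f_blocks, st.f_bfree),
        "inodes_used_pct": pct(st.f_files, st.f_ffree),
    }


def over_threshold(path, threshold=90.0):
    """Return the names of the metrics that are at or above the threshold."""
    usage = fs_usage(path)
    return [name for name, value in usage.items()
            if value is not None and value >= threshold]


if __name__ == "__main__":
    # Path taken from the mongod.log lines in this thread (assumed default install).
    kv_path = "/opt/splunk/var/lib/splunk/kvstore/mongo"
    target = kv_path if os.path.exists(kv_path) else "/"
    print(target, fs_usage(target), "over threshold:", over_threshold(target))
```

You could run something like this from cron and alert when the returned list is not empty, so a full inode table is noticed before MongoDB aborts.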
Do you think the cause is related to the domain update process of the "Threat Intelligence" feature of the "Enterprise Security" app?
Hi Stefanie,
Restarting Splunk on the SH is a last resort; I want to resolve the issue properly.
I found that the issue is related to the "Threat Intelligence" feature of the "Enterprise Security" app, as shown in the attached images.
Could you check the images and investigate in more detail?
Many thanks!
In /opt/splunk/etc/auth there should be a server.pem file. Rename it to server.old and restart Splunk Enterprise. That should resolve your issue, if not, check mongod.log. 🙂
I see these messages in mongod.log:
2021-10-12T09:14:58.012Z W FTDC [ftdc] Uncaught exception in 'FileNotOpen: Failed to open interim file /opt/splunk/var/lib/splunk/kvstore/mongo/diagnostic.data/metrics.interim.temp' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.
2021-10-12T09:15:06.842Z I CONTROL [journal writer] LogFile::synchronousAppend failed with 8192 bytes unwritten out of 8192 bytes; b=0x5612eed8a000 No space left on device
2021-10-12T09:15:06.842Z F - [journal writer] Fatal Assertion 13515 at src/mongo/db/storage/mmap_v1/logfile.cpp 250
2021-10-12T09:15:06.842Z F - [journal writer]
***aborting after fassert() failure
2021-10-12T09:15:06.859Z F - [journal writer] Got signal: 6 (Aborted).
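In case it helps others reading a long mongod.log, here is a rough sketch that pulls out the fatal/error lines and any "No space left on device" message. It assumes the line layout seen above (timestamp, one-letter severity, component, bracketed context, message); the regex and the `find_problems` name are my own, not anything shipped with Splunk or MongoDB.

```python
import re

# Assumed layout, based on the lines quoted in this thread:
# <timestamp> <severity letter> <component> [<context>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<sev>[A-Z])\s+(?P<comp>\S+)\s+\[(?P<ctx>[^\]]+)\]\s*(?P<msg>.*)$"
)


def find_problems(lines):
    """Return (timestamp, severity, message) for fatal/error or disk-full lines."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # continuation lines such as "***aborting ..." do not match
        if m["sev"] in ("F", "E") or "No space left on device" in m["msg"]:
            hits.append((m["ts"], m["sev"], m["msg"]))
    return hits


if __name__ == "__main__":
    # Location assumed for a default /opt/splunk install; adjust as needed.
    path = "/opt/splunk/var/log/splunk/mongod.log"
    with open(path, encoding="utf-8", errors="replace") as fh:
        for ts, sev, msg in find_problems(fh):
            print(ts, sev, msg)
```

On the excerpt above this would surface the "No space left on device" line together with the fatal assertion that follows it, which is what points at storage rather than Threat Intelligence.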
I found a message that could be the reason KV Store failed:
Configuration file settings may be duplicated in multiple apps: stanza="[App-stanza-name] ALERT: Login firewall failure" conf_type="savedsearches" apps="app-name,app-name"