About drrushi_splunk

drrushi_splunk · ‎08-07-2014

First, check the peer's splunkd.log for any messages logged at the same time as the search head's DistributedBundleReplicationManager error. If the peer's splunkd.log contains messages such as:

ERROR DistBundleRestHandler - File users/xxx/yyy/local/props.conf in knowledge bundle is either not in white list or else excluded by black list. Bundle /opt/splunk/var/run/searchpeers/ will be removed

...then there must be a rogue distsearch.conf on the peer that does not explicitly whitelist or blacklist any bundle files. As a result, the peer rejects the bundle by default. To work around this, remove any distsearch.conf (from system/local OR etc/apps/appname/local) on the peers and restart Splunk. Version 6.1 added new functionality that allows peers to blacklist/whitelist bundle contents based on locally defined rules (via a local distsearch.conf).
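To find which distsearch.conf files exist on a peer before removing them, something like the following may help. It is only a sketch: it assumes the two locations named above sit under a standard Splunk installation directory, and the function name find_local_distsearch_confs is made up for illustration (it is not a Splunk tool).

```python
from pathlib import Path


def find_local_distsearch_confs(splunk_home):
    """List local distsearch.conf files under etc/system/local and etc/apps/*/local.

    splunk_home is the Splunk installation directory on the peer
    (an assumption; adjust it to your environment).
    """
    etc = Path(splunk_home) / "etc"
    candidates = [etc / "system" / "local" / "distsearch.conf"]
    # One candidate per app: etc/apps/<appname>/local/distsearch.conf
    candidates += sorted(etc.glob("apps/*/local/distsearch.conf"))
    return [path for path in candidates if path.is_file()]


# Example (the path is an assumption):
# for conf in find_local_distsearch_confs("/opt/splunk"):
#     print(conf)
```

The sketch only lists the files; review each one before deleting it, and restart Splunk on the peer afterwards.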

drrushi_splunk · ‎09-23-2013

The culprit for this condition in our case was a leftover pooling.ini.lock file under /etc/pooling/. When a pool member (search head) validates itself with the pool, a check is performed against pooling.ini and the requesting search head creates a lock. The lock comprises two files under /etc/pooling/:

1. pooling.ini.lock
2. pooling.ini.

On occasion, search heads fail to clean up after themselves and leave their lock files behind in that location. Another search head attempting to execute a search then cannot validate itself against pooling.ini because of the existing lock. By default it keeps trying for 10s, after which it proceeds to access pooling.ini regardless.

To confirm whether you are hitting this issue, check the following:

1) In splunkd.log or btool.log (it is unclear why this message sometimes appears in both places and sometimes in only one of them) there will be messages like the following:

ERROR SearchHeadPoolInfo - Error reading search head pool info: Failed to lock //****/pool/etc/pooling/pooling.ini with code 1, possible reason: No such file or directory

2) The job inspector output for any search job in version 5+ will include, under Execution Costs, the measurement startup.handoff = 10000. This value is roughly 10s, matching the lock timeout in the current design. Note: in pre-v5 Splunk the issue may be present, but startup.handoff is not calculated, so it may be harder to verify that you have hit the condition. Also, "total run time" in the Job Inspector does NOT include the startup.handoff time. In version 5.0.6 the behavior will change (SPL-66563): Splunk will optimistically attempt to open pooling.ini first and then fall back on a file-based lock around pooling.ini.

3) Another simple check is to execute ANY search and time it against the wall clock. If it takes around 10s before you see anything in the UI, AND the above two checks are positive, then you have hit this behavior.

If you have hit this condition, the workaround is simply to remove:

1. pooling.ini.lock
2. pooling.ini.
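The first check and the hunt for leftover lock files can be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the log match is a plain substring test on the message quoted above, and any sibling file whose name starts with "pooling.ini." is treated only as a candidate for manual review, since it may belong to a search head that holds the lock right now.

```python
from pathlib import Path


def lock_error_lines(log_path):
    """Return log lines that carry the SearchHeadPoolInfo lock failure."""
    with open(log_path, errors="replace") as log:
        return [
            line.rstrip("\n")
            for line in log
            if "SearchHeadPoolInfo" in line and "Failed to lock" in line
        ]


def leftover_lock_candidates(pooling_dir):
    """List files next to pooling.ini named 'pooling.ini.<something>'.

    pooling.ini itself is never returned. Whether a candidate is really
    stale is for the operator to decide.
    """
    return sorted(p for p in Path(pooling_dir).glob("pooling.ini.*") if p.is_file())


# Example (both paths are assumptions about your environment):
# print(lock_error_lines("/opt/splunk/var/log/splunk/splunkd.log"))
# print(leftover_lock_candidates("/mnt/pool/etc/pooling"))
```

Neither function deletes anything; removal stays a manual step once you have confirmed that no search head is actively using the lock.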

hexx · ‎07-16-2013

At this time, the scheduled search maintaining the "sos_servers_cache" asset lookup that the Topology view consumes will add any newly-found search peers but will not remove those that no longer respond. This is a limitation of the current implementation that we plan to improve on in a future release of S.o.S, where we will probably still show the non-responding peers but mark them as such ("missing" or "unresponsive"). In order to get rid of decommissioned search peers, you need to edit the $SPLUNK_HOME/etc/apps/sos/lookups/sos_servers_cache.csv lookup table and manually remove their entries. We also hope to offer a UI-driven method to do this in a future release.
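Until a UI-driven method exists, the manual edit of the lookup table could be scripted along these lines. This is a sketch, not part of S.o.S: the function name drop_peers is made up, and the name of the column that identifies a peer is passed in as a parameter because the actual header of sos_servers_cache.csv is not given here. Check the file's first line before running it.

```python
import csv


def drop_peers(csv_path, out_path, column, decommissioned):
    """Copy a CSV lookup to out_path, skipping rows whose `column` value
    is in `decommissioned`. Returns the number of rows kept.

    `column` is whichever header identifies a search peer in your file
    (an assumption; it is not specified in the post above).
    """
    with open(csv_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        if column not in fieldnames:
            raise ValueError(f"column {column!r} not found in {csv_path}")
        kept = [row for row in reader if row[column] not in decommissioned]

    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


# Example (column name and peer name are hypothetical):
# drop_peers("sos_servers_cache.csv", "sos_servers_cache.new.csv",
#            "server_name", {"old-indexer-01"})
```

Writing to a separate output file keeps the original lookup intact until the result has been checked. Keep in mind that the scheduled search only adds peers it finds, so a removed entry returns only if that peer is discovered again.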

drrushi_splunk · ‎10-02-2013

Is that the complete Search Inspector output? It seems incomplete. What version of Splunk is this?

drrushi_splunk · ‎08-12-2014

It has been observed in other cases that an antivirus scan may be holding the checkpoint file at the same time that Splunk is attempting to rename it. Please stop the antivirus and retry. Splunk 5.0.9 and 6.0 include improvements targeting this particular scenario: the rename attempt is retried at a later time.
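The general idea of retrying a rename when another process briefly holds the file can be illustrated as follows. This is a generic sketch, not Splunk's actual implementation: the function name, the default of two attempts, and the delay are all assumptions chosen for illustration.

```python
import os
import time


def rename_with_retry(src, dst, attempts=2, delay=1.0, rename=os.replace):
    """Rename src to dst, retrying if the operating system refuses.

    Returns the number of the attempt that succeeded. The last failure
    is re-raised once all attempts are used up. `rename` can be swapped
    out, which makes the retry path easy to exercise in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            rename(src, dst)
            return attempt
        except OSError:
            # For example, a scanner holding the file open on Windows.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A retry only helps when the file is held briefly; if the scanner keeps the file open for long periods, excluding the checkpoint directory from scanning is the more reliable option.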

drrushi_splunk · ‎01-31-2012

[project_number]
INDEX = True
INDEXED_VALUE = False

This should be sufficient:

[project_number]
INDEXED = True

Posts	14
Solutions	4
Karma Given	4
Karma Received	28
Member Since	‎04-08-2011

Online Status	Offline
Date Last Visited	‎06-05-2020 02:02 AM

Bundle replication fails with: response_code=204

Searches are delayed by 10s before any results are...

S.o.S: Topology view continues to list a disabled ...

Re: Bundle replication fails with: response_code=2...

Re: Searches are delayed by 10s before any results...

Re: S.o.S: Topology view continues to list a disab...

Re: Search job inspector discrepancies

Re: Windows Universal Forwarder - Missing Event Lo...

Re: Extracting field from source for indexing
