Activity Feed
- Got Karma for Re: Is there a REST API call for getting the status of a Search Head Cluster (SHC)?. 11-22-2024 04:08 AM
- Got Karma for Re: [SmartStore]Trigger that would cause a cluster to resync from remote storage. 08-15-2024 12:25 PM
- Got Karma for Re: [SmartStore]Trigger that would cause a cluster to resync from remote storage. 05-08-2024 01:36 AM
- Got Karma for Re: [SmartStore] What is Cluster bootstrap process?. 05-04-2024 06:10 AM
- Got Karma for Re: [smartstore] splunk smartstore and Data integrity. 01-04-2024 05:15 PM
- Got Karma for Large lookup caused the bundle replication to fail. What are my options?. 11-20-2023 12:13 PM
- Got Karma for Re: Large lookup caused the bundle replication to fail. What are my options?. 11-20-2023 12:13 PM
- Got Karma for Re: Smartstore:SmartStore cache is not respecting cache limits. 11-10-2023 09:22 PM
- Got Karma for Re: [smartstore] How to map S2 smartstore buckets to local splunk bucket?. 08-18-2023 09:03 AM
- Got Karma for Re: access.log indexed multiple times. 07-10-2023 02:20 AM
- Got Karma for Re: Too many Events generated for Windows Security EventCode 4662 causing high resource issues like CPU. 04-24-2023 09:28 AM
- Got Karma for Re: [SmartStore] How to Analyse the CacheSize?. 04-04-2023 02:02 AM
- Got Karma for Re: [SmartStore] How is the Replication of Summary bucket managed in Splunk Smartstore?. 01-18-2023 05:39 PM
- Got Karma for Large lookup caused the bundle replication to fail. What are my options?. 01-06-2023 01:45 PM
- Got Karma for Re: Large lookup caused the bundle replication to fail. What are my options?. 01-06-2023 01:45 PM
- Posted Re: Post upgrade of 3 Node Search Head Cluster from version 8.2.7, one SHC Kvstore status as DOWN on Knowledge Management. 11-29-2022 01:00 PM
- Posted Post upgrade of 3 Node Search Head Cluster from version 8.2.7, one SHC Kvstore status as DOWN on Knowledge Management. 11-29-2022 12:54 PM
- Got Karma for Re: data rebalance progresses is very poor or getting stuck. 08-14-2022 12:37 AM
- Posted Re: Are there any controls to limit the size of a user search? on Splunk Search. 07-25-2022 09:03 AM
- Posted Are there any controls to limit the size of a user search? on Splunk Search. 07-25-2022 08:58 AM
Topics I've Started
11-29-2022
01:00 PM
From mongod.log, recovery failed because of OplogStartMissing, which is a known issue: https://jira.mongodb.org/browse/SERVER-40954

Error:

2022-11-29T05:01:57.080Z I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last op time fetched: { ts: Timestamp(1669697961, 2), t: 79 }. source's GTE: { ts: Timestamp(1669698089, 2), t: 80 } hashes: (6527934590833943207/-6009016642415496648)
2022-11-29T05:01:57.102Z F ROLLBACK [rsBackgroundSync] RecoverToStableTimestamp failed. :: caused by :: UnrecoverableRollbackError: No stable timestamp available to recover to. You must downgrade the binary version to v3.6 to allow rollback to finish. You may upgrade to v4.0 again after the rollback completes. Initial data timestamp: Timestamp(1669697961, 2), Stable timestamp: Timestamp(0, 0)

To resolve the issue:

# splunk stop
# splunk clean kvstore --local
# splunk start

Once the KV store came back up, it was on 4.0. We then manually upgraded the KV store to 4.2 per "Upgrade KV store server to version 4.2" in the documentation: https://docs.splunk.com/Documentation/Splunk/9.0.2/Admin/MigrateKVstore
11-29-2022
12:54 PM
The environment is a 3-node Search Head Cluster that was on 8.2.7. The nodes were upgraded from version 8.2.7 to 9.0.2. Post upgrade, one SHC member's KV store status was DOWN.
Labels: kvstore
07-25-2022
09:03 AM
You would control this with the `authorize.conf` settings srchTimeWin and srchTimeEarliest, plus Workload Management (WLM) rules.
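For illustration, a sketch of the authorize.conf side of this. The role name and values are hypothetical examples, not recommendations; both settings are in seconds:

```ini
# Hypothetical role stanza in authorize.conf (name and values are examples).
[role_capped_search]
# Cap the width of any search's time range at one day.
srchTimeWin = 86400
# Do not allow searches to look further back than seven days.
srchTimeEarliest = 604800
```

A narrower time window indirectly limits how much data a search can pull from SmartStore into the cache, though it is not a byte-level quota.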
07-25-2022
08:58 AM
Are there any controls to limit the size of a user search? The use case is Splunk Cloud: limiting a search if it downloads, for example, more than 10TB from SmartStore to the cache.
Tags: splunk-search
Labels: search job inspector
05-20-2021
08:17 AM
The problem with RAID5/6, even with SSDs and especially with SmartStore, is that you add at least two more dimensions of access patterns and obviously a lot more linear write (downloads from SmartStore to the local cache) to the game. A normal indexer does random read/write for ingestion and a lot more read while searching. The upload of a rolled bucket needs another linear read of the bucket, plus a linear write if you download the bucket again. So there is even more IO, and remember that write IO stresses the RAID because the checksums have to be calculated. This was a learning from our own tests with RAID in AWS. Also, why waste space/IOPS on your cache if you already have a copy in S3 (SmartStore, for stable buckets) or on other hosts (replication factor, for buckets that haven't been uploaded)?
05-20-2021
08:15 AM
Why avoid RAID5 on SSD when using SmartStore?
Labels: installation
11-12-2020
06:39 PM
The view is based on this search:

index="pci_posture_summary" search_name="PCI - Compliance Status History - Summary Gen" | `makemv(orig_tag)` | `mvappend_field(tag,orig_tag)` | extract kv_for_pci_compliance_status_history_summary | timechart span=`pci_compliance_history_span` latest(All) as All

If you look at the SPL for the base search "PCI - Compliance Status History - Summary Gen", each requirement refers to the scorecards on "PCI Compliance Posture", and the "All" requirement is a roll-up of the numbers from the other scorecards. The logic behind "Compliance Status History" is:
- When there is a new notable (i.e., the investigation has not started), compliance_status = -10000000000.
- When there are notables that are being investigated, compliance_status = 0.
- When all investigations are closed at the time the search runs, compliance_status = 10000000000.
11-12-2020
06:23 PM
The issue is that on the "PCI Compliance Posture" dashboard, the "Compliance Status History" view is not showing data. It just displays a line.
Labels: PCI compliance
11-12-2020
05:21 PM
This isn't an issue. We ship with references to tag=filtered, but we don't explicitly filter anything out of the box. For now, you can replace the panel's search query in pci_posture.xml with the search query below, and the customer can see events in the panel:

index="pci_posture_summary" search_name="PCI - Compliance Status History - Summary Gen" | `makemv(orig_tag)` | `mvappend_field(tag,orig_tag)` | extract kv_for_pci_compliance_status_history_summary | timechart span=`pci_compliance_history_span` latest(All) as All

All we have done here is remove the filtering condition from the search query.
11-12-2020
05:20 PM
Splunk VERSION = 8.0.6, ES version = 6.1.0, Splunk_DA-ESS_PCICompliance = 4.1.0. The issue is that on the "PCI Compliance Posture" dashboard, the "Compliance Status History" view is not showing data. It just displays "Unable to find tag filtered".
Labels: PCI compliance
10-22-2020
09:57 PM
The cache manager evicts buckets when:
(i) the total disk used by warm and cold buckets exceeds max_cache_size, or
(ii) the current free space on the partition falls below eviction_padding + minFreeSpace.

max_cache_size specifies the maximum space, in megabytes, per partition, that the cache can occupy on disk. If this value is exceeded, the cache manager starts evicting buckets. max_cache_size=0 means this feature is not used and there is no maximum size; in that case eviction happens when the $SPLUNK_DB partition's free space drops below eviction_padding + minFreeSpace.

The cache manager calculates total cache usage as the sum of the sizes of all non-hot buckets. When $SPLUNK_HOME and $SPLUNK_DB are on different partitions, the cache in $SPLUNK_DB accounts for disk space on that partition only.

Enabling DEBUG on the CacheManager component shows the cache manager's stats:

07-08-2020 19:19:58.806 +0000 DEBUG CacheManager - The system has freebytes=944143511552 with minfreebytes=471859200000 cachereserve=471859208192 totalpadding=943718408192 buckets_size=0 maxSize=0
07-08-2020 19:19:58.887 +0000 DEBUG CacheManager - The system has freebytes=944141152256 with minfreebytes=471859200000 cachereserve=471859208192 totalpadding=943718408192 buckets_size=0 maxSize=0

Where:
- freebytes >> freeBytes
- minfreebytes >> minFreeBytes
- cachereserve >> evictionReservedBytes
- totalpadding >> minFreeBytes + evictionReservedBytes
- buckets_size >> total cache usage (sum of all non-hot bucket sizes)
- maxSize >> max_cache_size
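As an illustration, the two eviction triggers above can be expressed as a small decision function. This is a sketch of the documented behavior, not Splunk's actual implementation; all values are in bytes and names mirror the DEBUG fields:

```python
# Sketch of the cache manager's two eviction triggers, as described above.
# All values are in bytes. Illustrative only, not Splunk's actual code.
def should_evict(free_bytes, min_free_bytes, eviction_padding_bytes,
                 buckets_size, max_cache_size_bytes):
    # Trigger (ii): partition free space below minFreeSpace + eviction_padding.
    if free_bytes < min_free_bytes + eviction_padding_bytes:
        return True
    # Trigger (i): total non-hot bucket size above max_cache_size.
    # max_cache_size = 0 disables this check entirely.
    if max_cache_size_bytes > 0 and buckets_size > max_cache_size_bytes:
        return True
    return False
```

With the sample DEBUG values (freebytes=944143511552, totalpadding=943718408192, maxSize=0), neither trigger fires, which matches the quiet cache in that log.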
10-22-2020
09:53 PM
Could you please help me interpret the CacheManager DEBUG output that is used to decide when to instigate eviction?
Labels: smartstore
10-22-2020
05:55 PM
Here are the steps that could work for you:
1. Enable SmartStore on the indexes.
2. Ensure all buckets have been replicated to SmartStore (migration complete).
3. Place the indexer into offline mode.
4. Delete ALL buckets from the coldPath.
5. Re-point coldPath to the hot_warm or homePath volume.
6. Remove the cold-store EBS volume.
7. Restart splunkd.
8. Run the bootstrap command on the Cluster Master: /opt/splunk/bin/splunk _internal call /services/cluster/master/control/control/init_recreate_index -method POST
10-22-2020
05:53 PM
We would like to remove the EBS volumes that were used for the cold store and DM summaries. The docs are not overly clear on the recommended approach: https://docs.splunk.com/Documentation/Splunk/7.3.4/Indexer/MigratetoSmartStore
Labels: configuration
10-08-2020
01:04 PM
Here are steps that can be used to check a report acceleration summary and the corresponding bucket upload to the remote store. You can use the REST endpoint:

| rest /servicesNS/-/-/admin/summarization splunk_server=local | table summary.hash, summary.id, summary.is_inprogress, summary.size, summary.time_range, summary.complete, saved_searches.admin;search;*

The normalized summary id ("NS6f37597da0cade4c" here) matches the name that appears in the bucket paths below. Then list the remote store:

$SPLUNK_HOME/bin/splunk cmd splunkd rfs -- ls --starts-with volume:my_s3_vol | grep -i '/ra/'
4220,_internal/ra/0c/a8/26~949FE8DD-2419-4F07-A151-77B02413A437/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/guidSplunk-949FE8DD-2419-4F07-A151-77B02413A437/metadata
75,_internal/ra/0c/a8/26~949FE8DD-2419-4F07-A151-77B02413A437/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/guidSplunk-949FE8DD-2419-4F07-A151-77B02413A437/metadata_c
71680,_internal/ra/0c/a8/26~949FE8DD-2419-4F07-A151-77B02413A437/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/guidSplunk-949FE8DD-2419-4F07-A151-77B02413A437/ra_data
799,_internal/ra/0c/a8/26~949FE8DD-2419-4F07-A151-77B02413A437/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/receipt.json
4220,_internal/ra/37/73/28~949FE8DD-2419-4F07-A151-77B02413A437/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/guidSplunk-949FE8DD-2419-4F07-A151-77B02413A437/metadata
75,_internal/ra/37/73/28~949FE8DD-2419-4F07-A151-77B02413A437/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/guidSplunk-949FE8DD-2419-4F07-A151-77B02413A437/metadata_c
71680,_internal/ra/37/73/28~949FE8DD-2419-4F07-A151-77B02413A437/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/guidSplunk-949FE8DD-2419-4F07-A151-77B02413A437/ra_data
799,_internal/ra/37/73/28~949FE8DD-2419-4F07-A151-77B02413A437/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/receipt.json
6294,_internal/ra/3c/fd/27~DA6E5901-FAF9-4AC1-855C-8C5E53A87B23/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS6f37597da0cade4c/guidSplunk-949FE8DD-2419-4F07-A151-77B02413A437/metadata
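As an illustration, matching such a listing against a summary hash can be scripted. A small sketch, assuming the "size,path" line format shown in the output above:

```python
# Sketch: filter `splunkd rfs -- ls` output ("size,path" lines) down to
# the report-acceleration objects belonging to one normalized summary id.
def objects_for_summary(listing_lines, summary_hash):
    matches = []
    for line in listing_lines:
        size, sep, path = line.partition(",")
        # Keep only RA objects whose path contains the summary hash.
        if sep and "/ra/" in path and summary_hash in path:
            matches.append((int(size), path))
    return matches
```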
10-08-2020
01:00 PM
After SmartStore was enabled for the deployment, the indexers' logs are flooded with messages like:

INFO CacheManagerHandler - cache_id="ra|tto_uswest2_tomcatfrontend~39~4345D76C-80D6-4BC7-991F-EA835C2B892C|08281223-D92B-4A36-BCA0-83970376D322_tto_search_agupta13_NS2480590abee10f99" not found cache_id = ra|tto_uswest2_tomcatfrontend~39~4345D76C-80D6-4BC7-991F-EA835C2B892C|

What is the best way to find the bucket corresponding to report acceleration?
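For illustration, such a cache_id can be broken into its parts. A sketch in Python; the field layout ("ra|<index>~<bucket>~<guid>|<sid>_<app>_<user>_NS<hash>") is inferred from the log sample above, not from a documented spec:

```python
import re

# Sketch: pull the index, bucket id, peer GUID, and normalized summary id
# out of a report-acceleration cache_id as logged by CacheManagerHandler.
CACHE_ID_RE = re.compile(
    r"^ra\|(?P<index>[^~]+)~(?P<bucket>\d+)~(?P<peer_guid>[^|]+)\|"
    r".*_(?P<summary_id>NS[0-9a-f]+)$"
)

def parse_ra_cache_id(cache_id):
    m = CACHE_ID_RE.match(cache_id)
    return m.groupdict() if m else None
```

The summary_id field can then be matched against summary.hash from the REST summarization endpoint to find the accelerated report.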
Labels: stats
08-12-2020
08:18 AM
Currently there is an issue in the calculation of cache_size, so the recommendation is to set max_cache_size=0 (to ignore it). To manage the cache size, use eviction_padding instead; see https://docs.splunk.com/Documentation/Splunk/8.0.5/Admin/Serverconf:

eviction_padding = <positive integer>
* Specifies the additional space, in megabytes, beyond 'minFreeSpace' that the
  cache manager uses as the threshold to start evicting data.
* If free space on a partition falls below
  ('minFreeSpace' + 'eviction_padding'), then the cache manager tries to evict
  data from remote storage enabled indexes.
* Default: 5120 (~5GB)

Set this value to the amount of disk space you would like to keep free.
07-20-2020
10:39 AM
1 Karma
For an indexer cluster, the summary is created on the peer node that is primary for the associated bucket or buckets. The peer then uploads the summary to remote storage. When a peer needs the summary, its cache manager fetches it from remote storage. Summary replication between peers is not needed; the uploaded summary is available to all peer nodes.

Here is an example from my environment showing a report acceleration bucket on the remote store:

[root@centos65-64sup02 rbal]# $SPLUNK_HOME/bin/splunk cmd splunkd rfs -- ls --starts-with index:main | grep -v '/db/'
# for full paths run: splunkd rfs -- ls --starts-with volume:my_s3_vol/main/
size,name
1080,main/ra/38/01/16~3D41EF74-A16D-421D-9FD7-83B3849101B2/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS16c348adc086860d/guidSplunk-3D41EF74-A16D-421D-9FD7-83B3849101B2/metadata.csv
75,main/ra/38/01/16~3D41EF74-A16D-421D-9FD7-83B3849101B2/3F3F537C-7DAD-4CF8-B062-168D17BC15C7_search_admin_NS16c348adc086860d/guidSplunk-3D41EF74-A16D-421D-9FD7-83B3849101B2/metadata_checksum

NOTE: In this example, "guidSplunk-3D41EF74-A16D-421D-9FD7" is the GUID of the search head where the data is accelerated. So in my case, based on the report-accelerated search and its time range, only the relevant buckets were accelerated.

Splunk has an open bug, SPL-186425: "S2: Rebuilding an evicted DMA summary causes us to re-upload the old tsidx file with the newly rebuilt one." This means we can see upload/download activity on a bucket while buckets are being accelerated. Per this JIRA, when rebuilding an evicted DMA summary, for some reason the remote copy is localized first and, in parallel, the summary is rebuilt on disk.

For the graph posted, the upload activity was due to report acceleration:

03A227A6-442C-4EC2-96BA-EDB3AEBCB2DF_XXXX_commerce_products_emmett_NSab5a2628876cea87
03A227A6-442C-4EC2-96BA-EDB3AEBCB2DF_XXXX_partnerships_jamesw_NS9c8a6f5149bf222c

To get the name of the corresponding Splunk report, use the REST endpoint on the search head:

| rest servicesNS/-/-/admin/summarization
| table saved_searches.admin;search;test_support_ra.name, summary.hash, summary.earliest_time, summary.complete, summary.id
07-20-2020
10:12 AM
There has been a huge spike in the number of uploads, resulting in many more failed uploads from throttling than we had before. It is currently unclear to me what caused this: whether constant retries underlie the huge spike, or some new data being uploaded. The bucket size has remained fairly constant, but the number of daily uploads has gone from about 80k to 4 million. Looking at some of the S3 access logs, it seems like search objects are getting uploaded. Most of these uploads are for "ra" (report acceleration) buckets.

index=_internal host=<XXX> sourcetype=splunkd action=upload status=succeeded NOT cacheId=ra* | rex field=cacheId "bid\|(?<indexname>\w+)\~\w+\~" | timechart span=1m partial=f limit=50 per_second(kb) as kbps by indexname
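The `rex` in the search above can be sanity-checked outside Splunk. A small sketch in Python; the "bid|<indexname>~<bucketid>~<guid>|" layout is taken from the cacheId values seen in the logs:

```python
import re

# Sketch: the same extraction as the SPL rex
#   "bid\|(?<indexname>\w+)\~\w+\~"
# reproduced in Python so it can be tested against sample cacheId values.
REX = re.compile(r"bid\|(?P<indexname>\w+)~\w+~")

def index_from_cache_id(cache_id):
    m = REX.search(cache_id)
    return m.group("indexname") if m else None
```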
Tags: smartstore
06-30-2020
11:29 AM
1 Karma
The best way to manage this would be to enable S3 bucket versioning and S3 access logs, and monitor for Splunk buckets with more than one version in S3. If the purpose of data integrity control is to detect alterations to Splunk bucket data files, then S3 object versioning is a great way to detect alterations. So, for SmartStore-enabled indexes, integrity control is offloaded to the object storage: typical implementations of version control and object logging can provide functionality similar to the data integrity control feature.
06-30-2020
11:27 AM
This question has come up a few times: how does Splunk handle data integrity in a large ES implementation? The Splunk docs state: "Data integrity control feature. SmartStore-enabled indexes are not compatible with the data integrity control feature, described in Manage data integrity in the Securing Splunk Enterprise manual." As covered in https://docs.splunk.com/Documentation/Splunk/8.0.4/Indexer/AboutSmartStore
Labels: indexer clustering
06-29-2020
12:55 PM
The computation of max_cache_size is broken across versions 7.2.6 to 8.0.4. As a workaround, the best option is to enforce the cache limit with max_cache_size=0 and eviction_padding=<CONFIGURE_AS_PER_DESIRED_LIMIT>. For details on the JIRA, contact Splunk Support.
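A sketch of the workaround stanza in server.conf on the indexers. The eviction_padding value is an example only; set it to the amount of disk space (in MB, beyond minFreeSpace) you want to keep free:

```ini
# server.conf (e.g. pushed to the peers via the cluster master's _cluster app)
[cachemanager]
# 0 disables the size-based limit, which is miscalculated in 7.2.6-8.0.4.
max_cache_size = 0
# Example value: evict when free space drops below minFreeSpace + ~500GB.
eviction_padding = 512000
```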
06-29-2020
12:50 PM
The cluster indexers across sites are configured with SmartStore. Each indexer has a 6TB partition shared by $SPLUNK_HOME and $SPLUNK_DB. The cache manager is configured as below (btool-style listing; each setting is prefixed with its source file):

$SPLUNK_HOME/etc/system/default/server.conf              [diskUsage]
$SPLUNK_HOME/etc/system/default/server.conf              minFreeSpace = 5000
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   [cachemanager]
$SPLUNK_HOME/etc/system/default/server.conf              evict_on_stable = false
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   eviction_padding = 5120
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   eviction_policy = lru
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   hotlist_bloom_filter_recency_hours = 720
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   hotlist_recency_secs = 604800
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   max_cache_size = 4096000
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   max_concurrent_downloads = 8
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   max_concurrent_uploads = 8
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   remote.s3.multipart_max_connections = 4
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf   remote.s3.multipart_upload.part_size = 536870912

The indexer shows the 6TB partition 97% utilized, although usage should not have crossed 4TB based on max_cache_size = 4096000:

Filesystem      1K-blocks   Used        Available  Use% Mounted on
devtmpfs        71967028    0           71967028   0%   /dev
tmpfs           71990600    0           71990600   0%   /dev/shm
tmpfs           71990600    4219944     67770656   6%   /run
tmpfs           71990600    0           71990600   0%   /sys/fs/cgroup
/dev/nvme0n1p2  20959212    6812056     14147156   33%  /
none            71990600    0           71990600   0%   /run/shm
/dev/nvme1n1    6391527336  5864488560  204899848  97%  /opt/splunk
tmpfs           14398120    0           14398120   0%   /run/user/1003

Here are the DEBUG entries for the CacheManager:

06-10-2020 19:32:42.604 +0000 DEBUG CacheManager - The system has freebytes=210838605824 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
06-10-2020 19:32:42.607 +0000 DEBUG CacheManager - The system has freebytes=210838536192 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
06-10-2020 19:32:46.502 +0000 DEBUG CacheManager - The system has freebytes=210850021376 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
06-10-2020 19:32:46.505 +0000 DEBUG CacheManager - The system has freebytes=210850172928 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
06-10-2020 19:33:06.727 +0000 DEBUG CacheManager - The system has freebytes=210255511552 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000

Note, from the DEBUG observations:

freebytes    = 210072649728
minfreebytes = 5242880000
cachereserve = 5368709120
totalpadding = 10611589120
buckets_size = 3069785296896  <<<<<< ~3TB, as calculated by the cache manager
maxSize      = 4294967296000  <<<<<< the configured 4TB limit

The issue is that the cache has utilized almost 6TB of disk space, but the cache manager's calculation shows usage of only 3TB. Due to this miscalculation, Splunk is not evicting buckets.
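As an illustration, the DEBUG line can be parsed into numbers and compared against `df` output. A small sketch; the field names are taken from the log sample:

```python
import re

# Sketch: parse one CacheManager DEBUG line so the cache manager's view
# (buckets_size vs. maxSize) can be compared against actual disk usage.
FIELDS = ("freebytes", "minfreebytes", "cachereserve",
          "totalpadding", "buckets_size", "maxSize")

def parse_cachemanager_debug(line):
    # re.search finds the leftmost match, so "freebytes=" is matched
    # before the "freebytes=" substring inside "minfreebytes=".
    return {f: int(re.search(rf"{f}=(\d+)", line).group(1)) for f in FIELDS}
```

For the sample line, buckets_size is about 3.07TB while `df` reports about 5.86TB used, so the cache manager never sees the cache cross maxSize (about 4.29TB) and eviction never fires, which matches the symptom described.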
Labels: indexer clustering
06-26-2020
08:43 AM
- The coldPath is needed during migration, when pre-existing data is migrated to SmartStore.
- As discussed in our documentation: "Cold buckets can, in fact, exist in a SmartStore-enabled index, but only under limited circumstances. Specifically, if you migrate an index from non-SmartStore to SmartStore, any migrated cold buckets use the existing cold path as their cache location, post-migration. In all respects, cold buckets are functionally equivalent to warm buckets. The cache manager manages the migrated cold buckets in the same way that it manages warm buckets. The only difference is that the cold buckets will be fetched into the cold path location, rather than the home path location."

coldPath and homePath can point to the same volume but different directories, for example:

homePath = volume:hot/$_index_name/db
coldPath = volume:hot/$_index_name/colddb

So in your case, since you have already migrated to SmartStore, you can now point the coldPath at the same volume as the homePath.
06-26-2020
08:41 AM
We migrated almost all of our existing indexes from traditional indexes, with separate warm and cold mount paths, to SmartStore a little under a year ago. It has all worked great; however, for indexes with long-term retention, buckets that were in the coldPath at the time of the SmartStore conversion continue to be stubbed out and localized from S3 back into the coldPath, while everything since the conversion uses the warm path, as expected, since that mount is the SPLUNK_DB definition used by the SmartStore indexes. I want to re-map the SPLUNK_COLD path to use the same OS mount, but what is the supported way to do that with SmartStore? From the documentation (https://docs.splunk.com/Documentation/Splunk/7.3.3/Indexer/Moveanindex) it sounds like you would normally copy the data manually from the old path to the new one and then re-map the variable; does it work the same with SmartStore? Or is it just a matter of force-clearing the SmartStore cache on the OS mount I want to clear off, re-mapping the variable, and letting new localization of buckets simply use the re-mapped path?
Tags: smartstore
Labels: indexer