
Splunk Cloud collection replication

marathon-man
Explorer

I maintain a Splunk app, the AbuseIPDB App. This app uses a collection that holds a set of key-value pairs for things like user state and settings, and it's looked up on every command (e.g. abuseipdbcheck ip="127.0.0.1").

We had been receiving bug reports about a KeyError that seemed to have been fixed by setting replicate=true for the collection. I suppose that because the app's configuration collection was not being replicated, distributed searches failed (presumably the collection could not be found on the individual search peers, hence the KeyError).

However, I've just received another report, with the same issue, this time from a Splunk Cloud Victoria setup. The collection does have replicate=true. Can anyone give some guidance on this?


isoutamo
SplunkTrust
Hi
Obviously, without replicate=true the collection exists only on the SH side and the indexers cannot use it.
Can you tell us more about that later error report?
r. Ismo

marathon-man
Explorer

Yes, the error is:

.../apps/abuseipdb_app/bin/splunklib/client.py", line 1384: UrlEncoded('abuseipdb_control_coll')

Which is actually a Python KeyError for the UrlEncoded name of the "control_coll" holding user data (in essence, "collection not found"). The search query causing this is:

index=fortianalyzer action="accept"
| dedup srcip
| eval is_internal=`isInternalIPv4(srcip)`
| search is_internal=False
| abuseipdbcheck ip=srcip
| where abuseConfidenceScore=100
| table srcip, abuseConfidenceScore

Interestingly, directly running the abuseipdbcheck on the search head works:

| makeresults
| abuseipdbcheck ip=127.0.0.1


isoutamo
SplunkTrust

Based on your example and those in the vendor's documentation, your issue is that you use only streaming commands before abuseipdbcheck, and abuseipdbcheck is itself a streaming command. This means that all processing is done on the indexers (unlike makeresults and the vendor's examples, which run on the SH side). In your case the search tries to use the KV store lookup on the indexers, but unless you have marked the collection for replication to the indexers (the default is false), the search cannot find it and fails.

Now you have two options:

  1. Configure the collection to replicate to the indexers.
  2. Force execution on the SH side by using a non-streaming command, or use e.g. the localop command (https://docs.splunk.com/Documentation/Splunk/9.4.1/SearchReference/Localop), which forces execution on the SH.
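
As a sketch of option 2, the search from the earlier post could be written with localop placed just before the custom command, so that abuseipdbcheck and everything after it run on the search head. Only the position of localop is new here; the rest is the original query unchanged.

```
index=fortianalyzer action="accept"
| dedup srcip
| eval is_internal=`isInternalIPv4(srcip)`
| search is_internal=False
| localop
| abuseipdbcheck ip=srcip
| where abuseConfidenceScore=100
| table srcip, abuseConfidenceScore
```

The trade-off is that every event surviving the earlier filters is sent to the search head before the lookup happens.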

 


marathon-man
Explorer

I see. I'd really like option one to work here, and it did work for a user that was using Splunk on-prem:

[abuseipdb_control_coll]
enforceTypes = true
field._key = string
field.value = string
replicate = true

 

But this same configuration was apparently not working for a user on Splunk Cloud Victoria, whatever the reason (see the previous example). I was unsure whether it was a problem with eventual consistency, or what exactly.


martin_mueller
SplunkTrust

Is your command actively querying the collection? If so, replicate=true won't help you.

 

replicate=true will push the collection's content to the indexers *as CSV*. The KV Store on the indexers (if even running) won't know the collection or its content.

I'm surprised and doubtful that replicate=true ever worked for someone running Splunk Enterprise on-prem.

 

Easiest fix would be to only use the command on the SHs, e.g. after a |stats, |tstats, etc. - or if need be |localop.

marathon-man
Explorer

@martin_mueller wrote:

Is your command actively querying the collection? If so, replicate=true won't help you.

replicate=true will push the collection's content to the indexers *as CSV*. The KV Store on the indexers (if even running) won't know the collection or its content.

Ah, that explains a lot!

@martin_mueller wrote:

I'm surprised and doubtful that replicate=true ever worked for someone running Splunk Enterprise on-prem.

They claimed it did, but I never tested it myself.

@martin_mueller wrote:

Easiest fix would be to only use the command on the SHs, e.g. after a |stats, |tstats, etc. - or if need be |localop.

I guess setting local = true for the command (actually all the commands I guess) would do the trick here, as mentioned in another reply?

martin_mueller
SplunkTrust

@marathon-man wrote:


I guess setting local = true for the command (actually all the commands I guess) would do the trick here, as mentioned in another reply?


Yes, but: If someone runs, say, index=foo | mylocalcommand then Splunk will pull all events to the SH, including all _raw and fields - that's a lot of work. Therefore I'd include the recommendations (e.g. "only run after stats") in the docs together with setting local=true.
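
To illustrate the "only run after stats" recommendation, here is a sketch based on the query posted earlier in this thread. It assumes the goal is one lookup per distinct srcip; the stats clause is an assumption that replaces the original dedup. Because stats is a transforming command, the rest of the search runs on the SH, and only one small row per address is sent there.

```
index=fortianalyzer action="accept"
| stats count by srcip
| eval is_internal=`isInternalIPv4(srcip)`
| search is_internal=False
| abuseipdbcheck ip=srcip
| where abuseConfidenceScore=100
| table srcip, abuseConfidenceScore
```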


PickleRick
SplunkTrust

I suppose the command should be defined as a centralized streaming command instead of a distributable one, using the local setting in commands.conf; see https://docs.splunk.com/Documentation/Splunk/latest/Admin/Commandsconf
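
A minimal sketch of what that could look like in the app's commands.conf. The stanza name is taken from the command used in this thread; the filename value is a hypothetical placeholder, since the actual script name is not shown here.

```
[abuseipdbcheck]
# Hypothetical script name; use the app's real one.
filename = abuseipdbcheck.py
# Run this command on the search head only.
local = true
```

If the command is registered with chunked = true, it is worth checking the commands.conf spec to confirm that local is honored for chunked commands before relying on it.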

 

marathon-man
Explorer

@PickleRick wrote:

I suppose the command should be defined as a centralized streaming command instead of a distributable one, using the local setting in commands.conf; see https://docs.splunk.com/Documentation/Splunk/latest/Admin/Commandsconf


Thanks, I'll be giving this a try. I guess I can turn replicate off for the collections while I'm at it.


iamsahilshaiks
Splunk Employee
 
Thanks,
Shaik Sahil

Splunk Core Certified Consultant

PickleRick
SplunkTrust

@iamsahilshaiks KV Store contents are not replicated between tiers. In fact, the KV Store does not normally run on the indexer tier at all, and even if for some reason it does (say you run a modular input that uses the KV Store on an indexer instead of on a separate HF), it is not replicated anywhere. If a collection is replicated to the indexer tier, its contents are exported as CSV to the knowledge bundle (so there are possible size/performance issues with it).

isoutamo
SplunkTrust
There are a couple of parameters in limits.conf and distsearch.conf that define how this is done and what the limitations are for those replicated CSV files.

isoutamo
SplunkTrust

Here https://splunkbase.splunk.com/app/6368 is one cool Splunk app that can be used on both SCP and on-prem. Basically it's btool with some additions; I strongly recommend it.
