Splunk Cloud collection replication

marathon-man

I maintain an app on Splunk, the AbuseIPDB App. This app uses a collection that holds a set of key-value pairs for things like user state and settings, and the collection is looked up every time one of the app's commands runs (e.g. abuseipdbcheck ip="127.0.0.1").

We had been receiving bug reports about a KeyError that seemed to have been fixed by setting replicate=true for the collection. My assumption was that distributed searches failed because the app's configuration collection was not being replicated: presumably the collection could not be found on the individual search peers, hence the KeyError.

However, I've just received another report, with the same issue, this time from a Splunk Cloud Victoria setup. The collection does have replicate=true. Can anyone give some guidance on this?

isoutamo

Hi
It's obvious that without replicate=true the collection exists only on the SH side and the indexers cannot use it.
Can you tell us more about that later error report?
r. Ismo

marathon-man

Yes, the error is:

.../apps/abuseipdb_app/bin/splunklib/client.py", line 1384: UrlEncoded('abuseipdb_control_coll')

This is a Python KeyError on the UrlEncoded name of the "control_coll" collection that holds user data (in essence, "collection not found"). The search query causing this is:

index=fortianalyzer action="accept"
| dedup srcip
| eval is_internal=`isInternalIPv4(srcip)`
| search is_internal=False
| abuseipdbcheck ip=srcip
| where abuseConfidenceScore=100
| table srcip, abuseConfidenceScore

Interestingly, directly running the abuseipdbcheck on the search head works:

|makeresults
|abuseipdbcheck ip=127.0.0.1
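To see why the same lookup can succeed on the search head and fail on a peer, here is a minimal sketch. It assumes, for illustration only, that each instance's KV store can be modeled as a plain dict keyed by collection name (standing in for splunklib's service.kvstore, which raises a KeyError for a missing name as described above). KVSTORE_ON_SH, KVSTORE_ON_INDEXER, CollectionNotAvailable, get_control_collection and the "settings" record are hypothetical, not part of the app or of splunklib.

```python
# Toy model: each instance's KV store as a dict of collection name -> records.
# The dict stands in for splunklib's service.kvstore; all names are hypothetical.
KVSTORE_ON_SH = {
    "abuseipdb_control_coll": [{"_key": "settings", "value": "{}"}],
}
KVSTORE_ON_INDEXER = {}  # a peer's KV store (if running) never has the collection


class CollectionNotAvailable(RuntimeError):
    """Raised when the control collection cannot be found on this instance."""


def get_control_collection(kvstore, name="abuseipdb_control_coll"):
    """Look up a collection, replacing a bare KeyError with an actionable message."""
    try:
        return kvstore[name]
    except KeyError:
        raise CollectionNotAvailable(
            f"KV store collection {name!r} not found on this instance; "
            "the command was probably executed on a search peer, so run it "
            "on the search head (e.g. after | stats or | localop)."
        ) from None


if __name__ == "__main__":
    print(get_control_collection(KVSTORE_ON_SH))  # found on the search head
    try:
        get_control_collection(KVSTORE_ON_INDEXER)
    except CollectionNotAvailable as exc:
        print(exc)
```

Wrapping the lookup this way does not fix the placement problem; it only turns the opaque KeyError into a message that points users at the likely cause.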

isoutamo

Based on your example and those in the vendor's documentation, your issue is that only streaming commands run before abuseipdbcheck, and abuseipdbcheck is itself a streaming command. This means all processing is done on the indexers (unlike makeresults and the vendor's examples, which run on the SH side). In your case the search tries to use the KV store lookup on the indexers, but unless you have marked the collection for replication to the indexers (the default is false), it cannot be found and the command fails.

Now you have two options:

1. define the collection so that it replicates to the indexers
2. force execution onto the SH side, either with a non-streaming command or with e.g. the localop command (https://docs.splunk.com/Documentation/Splunk/9.4.1/SearchReference/Localop), which forces execution on the SH
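The reasoning about where a command ends up running can be sketched as a toy model. Assumptions: the caller supplies which commands count as distributable streaming; localop and any command not in that set move the rest of the pipeline to the search head; and the model ignores that commands such as stats do part of their work on the indexers. placement is a hypothetical helper, not a Splunk API.

```python
from typing import Iterable, List, Set, Tuple

# Commands that, in this toy model, always move the rest of the pipeline to the SH.
FORCES_SEARCH_HEAD = {"localop"}


def placement(pipeline: Iterable[str], distributable: Set[str]) -> List[Tuple[str, str]]:
    """Return (command, tier) pairs, where tier is "indexer" or "search head".

    A command stays on the indexers only while it and every command before it
    are distributable streaming. The first command that is not (a transforming
    command, localop, ...) moves itself and everything after it to the SH.
    """
    placed = []
    on_indexers = True
    for command in pipeline:
        if command in FORCES_SEARCH_HEAD or command not in distributable:
            on_indexers = False
        placed.append((command, "indexer" if on_indexers else "search head"))
    return placed


if __name__ == "__main__":
    streaming = {"search", "eval", "abuseipdbcheck"}
    print(placement(["search", "eval", "abuseipdbcheck"], streaming))
    print(placement(["search", "eval", "localop", "abuseipdbcheck"], streaming))
```

For example, placement(["search", "eval", "abuseipdbcheck"], {"search", "eval", "abuseipdbcheck"}) places abuseipdbcheck on the indexers, and inserting "localop" (or a non-streaming command such as "stats") before it moves it to the search head.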

marathon-man

I see. I'd really like option one to work here, and it did work for a user that was using Splunk on-prem:

[abuseipdb_control_coll]
enforceTypes = true
field._key = string
field.value = string
replicate = true

But this same configuration was apparently not working for a user on Splunk Cloud Victoria, for whatever reason (see the previous example). I was unsure whether it was an eventual-consistency problem or something else.

martin_mueller

Is your command actively querying the collection? If so, replicate=true won't help you.

replicate=true will push the collection's content to the indexers *as CSV*. The KV Store on the indexers (if even running) won't know the collection or its content.
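The distinction can be illustrated with a simplified model (not Splunk's actual bundle format or code): the search head serializes the collection's records to CSV, and the peer receives only that CSV. Lookups defined on the collection can then be served from the replicated copy, but anything that asks the peer's KV store for the collection by name, as the app does through splunklib, still finds nothing. export_collection_as_csv and peer_view are hypothetical helpers; the _key/value fields mirror the collections.conf stanza quoted earlier in the thread.

```python
import csv
import io


def export_collection_as_csv(records, fields=("_key", "value")):
    """Serialize KV store records to CSV text (simplified stand-in for bundle replication)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({field: record.get(field, "") for field in fields})
    return buffer.getvalue()


def peer_view(csv_text):
    """What a peer ends up with: CSV lookup rows, and a KV store without the collection."""
    lookup_rows = list(csv.DictReader(io.StringIO(csv_text)))
    kvstore = {}  # the peer's KV store (if running at all) never receives the collection
    return lookup_rows, kvstore


if __name__ == "__main__":
    text = export_collection_as_csv([{"_key": "settings", "value": "{}"}])
    rows, kvstore = peer_view(text)
    print(rows)                                  # the data is there, as CSV rows
    print("abuseipdb_control_coll" in kvstore)   # but not as a KV store collection
```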

I'm surprised and doubtful that replicate=true ever worked for someone running Splunk Enterprise on-prem.

Easiest fix would be to only use the command on the SHs, e.g. after a |stats, |tstats, etc. - or if need be |localop.

marathon-man

@martin_mueller wrote:
Is your command actively querying the collection? If so, replicate=true won't help you.

replicate=true will push the collection's content to the indexers *as CSV*. The KV Store on the indexers (if even running) won't know the collection or its content.

Ah, that explains a lot!

@martin_mueller wrote:
I'm surprised and doubtful that replicate=true ever worked for someone running Splunk Enterprise on-prem.

They claimed it did, but I never tested it myself.

@martin_mueller wrote:
Easiest fix would be to only use the command on the SHs, e.g. after a |stats, |tstats, etc. - or if need be |localop.

I guess setting local = true for the command (actually all the commands I guess) would do the trick here, as mentioned in another reply?

martin_mueller

@marathon-man wrote:
I guess setting local = true for the command (actually all the commands I guess) would do the trick here, as mentioned in another reply?

Yes, but if someone runs, say, index=foo | mylocalcommand, Splunk will pull all events to the SH, including _raw and all fields, and that's a lot of work. Therefore I'd include the recommendations (e.g. "only run after stats") in the docs together with setting local=true.
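To make the cost concrete, here is an illustrative count with made-up events: a search-head-only command placed straight after the base search receives every event, whereas placing it after an aggregation such as stats count by srcip means it only sees one row per source IP. rows_reaching_search_head is a hypothetical helper, and it counts rows only; the real cost is higher still, since _raw and all fields travel with each event.

```python
def rows_reaching_search_head(events, aggregate_first):
    """Number of rows a search-head-only command would have to process."""
    if aggregate_first:
        # e.g. `| stats count by srcip`: one row per distinct source IP remains.
        return len({event["srcip"] for event in events})
    # No aggregation: every event, including _raw and all fields, is shipped to the SH.
    return len(events)


if __name__ == "__main__":
    # Made-up sample: 800 events from only two source IPs.
    sample = ([{"srcip": "198.51.100.1", "_raw": "..."}] * 500
              + [{"srcip": "203.0.113.7", "_raw": "..."}] * 300)
    print(rows_reaching_search_head(sample, aggregate_first=False))  # 800
    print(rows_reaching_search_head(sample, aggregate_first=True))   # 2
```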

PickleRick

I suppose the command should be defined as a centralized streaming command instead of a distributed one, using the local setting in commands.conf; see https://docs.splunk.com/Documentation/Splunk/latest/Admin/Commandsconf
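If you go the commands.conf route, a small check like the following could catch commands that are still distributed. It is a hypothetical helper (audit_command_placement is not part of Splunk or splunklib) and assumes the file is simple INI that Python's configparser can read. It also flags chunked = true stanzas that set local = true for manual review, because commands.conf.spec documents a restricted set of valid settings for chunked commands, so verify there whether local applies to yours; for splunklib protocol v2 streaming commands, placement is configured on the command class instead (see the distributed setting in the splunklib searchcommands documentation).

```python
import configparser

TRUE_VALUES = {"1", "true", "yes"}


def _is_true(stanza, setting):
    """Interpret a commands.conf boolean setting, defaulting to false."""
    return stanza.get(setting, "false").strip().lower() in TRUE_VALUES


def audit_command_placement(conf_text):
    """Split commands.conf stanzas into (distributed, needs_review) name lists.

    distributed: stanzas without local = true, so they may run on the indexers.
    needs_review: chunked = true stanzas that set local = true; check
    commands.conf.spec for which settings chunked commands honor.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep setting names exactly as written
    parser.read_string(conf_text)
    distributed, needs_review = [], []
    for name in parser.sections():
        stanza = parser[name]
        if _is_true(stanza, "chunked") and _is_true(stanza, "local"):
            needs_review.append(name)
        elif not _is_true(stanza, "local"):
            distributed.append(name)
    return distributed, needs_review


if __name__ == "__main__":
    with open("default/commands.conf", encoding="utf-8") as handle:
        print(audit_command_placement(handle.read()))
```

The path default/commands.conf is an assumption about the app layout; point it at wherever the app's commands.conf actually lives.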

marathon-man

@PickleRick wrote:
I suppose the command should be defined as a centralized streaming command instead of a distributed one, using the local setting in commands.conf; see https://docs.splunk.com/Documentation/Splunk/latest/Admin/Commandsconf

Thanks, I'll be giving this a try. I guess I can turn replicate off for the collections while I'm at it.

iamsahilshaiks

Thanks,
Shaik Sahil

Splunk Core Certified Consultant

PickleRick

@iamsahilshaiks KV Store contents are not replicated between tiers. In fact, the KV store normally does not run on the indexer tier at all, and even if for some reason it does (say you fancy running a modular input that uses the KV store on an indexer instead of on a separate HF), it is not replicated anywhere. If a collection is marked for replication to the indexer tier, its contents are exported as CSV into the knowledge bundle (so there are possible size/performance issues with it).

isoutamo

There are a couple of parameters in limits.conf and distsearch.conf that define how this is done and what the limits are for those replicated CSV files.
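As a rough pre-check before turning on replicate=true for a large collection, one could estimate how big its CSV export would be. This is a sketch under stated assumptions: csv_export_size and fits_in_bundle are hypothetical helpers, the byte count only approximates what Splunk actually writes, and limit_bytes is a placeholder; the real settings and their defaults should come from the distsearch.conf and limits.conf documentation, not from this code.

```python
import csv
import io


def csv_export_size(records, fields):
    """Approximate size in bytes of the records serialized as UTF-8 CSV."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields),
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return len(buffer.getvalue().encode("utf-8"))


def fits_in_bundle(records, fields, limit_bytes):
    """True if the estimated export stays within the caller-supplied limit."""
    return csv_export_size(records, fields) <= limit_bytes


if __name__ == "__main__":
    # Made-up collection: 10,000 records with a 200-character value each.
    records = [{"_key": f"user{i}", "value": "x" * 200} for i in range(10_000)]
    print(csv_export_size(records, ("_key", "value")))
```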

isoutamo

Here is a nice Splunk app that works on both SCP and on-prem: https://splunkbase.splunk.com/app/6368. It's basically btool with some additions; I strongly recommend it.
