Deployment Architecture

[SmartStore] on Google Cloud Storage

seegeekrun
Path Finder

Disclaimer-- I realize Google Cloud Storage isn't officially supported, but it is S3-compatible, so it's being tested.

I'm running into the following issue when running S2 (SmartStore) against GCP cloud buckets, and I'm not able to determine the root cause.

03-13-2019 09:39:07.194 -0500 ERROR S3Client - command=multipart-upload command=begin transactionId=0x2afb6da35000 rTxnId=0x2afb6da42000 status=completed success=N uri=https://storage.googleapis.com/<CLOUDBUCKET>/<INDEXNAME>/db/bb/e7/94~298A356E-F9F7-4713-B865-B9C7F7926ECB/guidSplunk-97A1AC40-F222-4E18-A4BF-86EFF89E8EA9/1552404180-1552329547-2210723404812320681.tsidx statusCode=400 statusDescription="Bad Request" payload="<?xml version='1.0' encoding='UTF-8'?><Error><Code>InvalidArgument</Code><Message>Invalid argument.</Message><Details>POST object expects Content-Type multipart/form-data</Details></Error>"

Looking at the indexes.conf spec, there is a setting available for sending specific headers.

remote.s3.header.POST.Content-Type = "multipart/form-data"

That did not help, though. In fact, it caused an immediate crash of the indexers when I applied the bundle.
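
For reference, per the indexes.conf spec the override takes the form remote.s3.header.<http-method-name>.<header-field-name> and sits on the remote volume. A rough sketch of how I applied it (stanza, bucket, and endpoint here are placeholders, not my real config):

# in indexes.conf -- sketch only
[volume:remote_store]
storageType = remote
path = s3://<CLOUDBUCKET>
remote.s3.endpoint = https://storage.googleapis.com
remote.s3.header.POST.Content-Type = multipart/form-data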

So my question is-- what are some other facets I can look at to understand the underlying problem here? It seems as though when it comes time to roll a bucket to warm, the upload to remote storage is triggered and then fails.

1 Solution

seegeekrun
Path Finder

Solved!

So, as noted in previous comments, the issue here comes down to the difference between how AWS and Google handle multipart uploads. The details of this can be found in the links referenced earlier.

The solution (or workaround depending on how one wants to view it) is to set the following:

# in indexes.conf
[volume:remote_store]
# ... remote config values ...
remote.s3.multipart_download.part_size = 0
remote.s3.multipart_upload.part_size = 2147483648  #2GB, or some value less than 5GB, the GCS limit

This ensures that the S3Client will not attempt a multipart upload for objects smaller than the stated part size. With maxDataSize set to auto, buckets roll at roughly 750MB, so none of the large objects, like tsidx files, reach that threshold and nothing is uploaded as multipart.
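
For anyone else trying this, a fuller sketch of where these settings sit (bucket, endpoint, and index name are placeholders):

# in indexes.conf -- sketch only
[volume:remote_store]
storageType = remote
path = s3://<CLOUDBUCKET>
remote.s3.endpoint = https://storage.googleapis.com
remote.s3.multipart_download.part_size = 0
remote.s3.multipart_upload.part_size = 2147483648

[<INDEXNAME>]
remotePath = volume:remote_store/$_index_name
# with maxDataSize = auto (buckets roll at roughly 750MB), nothing comes close to the
# 2GB part size, so the S3Client never attempts a multipart upload against GCS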


seegeekrun
Path Finder

I'm going to take my digging as the answer here.

In 7.2.3 and 7.2.4.2, I'm finding that this may be due to a difference in how Google implements the S3 API. So currently, GCS is not a viable option for S2.

For reference--

This includes testing various configurations (sketched below) like:
- remote.s3.header.POST.Content-Type // to resolve the multipart-upload error
- remote.s3.use_delimiter // to see if the guidSplunk delimiter was the issue
- use_batch_remote_rep_changes // to see if it was a race condition with the CM making calls to the peers
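
Roughly where those settings were tried, for anyone retracing this (values are examples; I believe use_batch_remote_rep_changes lives in server.conf under [clustering], so double-check the spec files):

# in indexes.conf, on the remote volume (example values)
remote.s3.header.POST.Content-Type = multipart/form-data
remote.s3.use_delimiter = false

# in server.conf, [clustering] stanza (to my understanding)
use_batch_remote_rep_changes = false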


nickhills
Ultra Champion

Just looking at that error again, I'm not sure it's a missing header it's complaining about so much as the actual content of the POST data.

I wonder if Splunk is not sending "multipart/form-data" yet Google is expecting it.

Not sure how S2 posts, but maybe you need to send a header to tell Google to expect something other than multipart/form-data?

Just a guess.

If my comment helps, please give it a thumbs up!

seegeekrun
Path Finder

I was thinking something similar.

Reading through the Google documentation here:
https://cloud.google.com/storage/docs/json_api/v1/how-tos/multipart-upload
It talks about "multipart/related", not "multipart/form-data".

Digging further, I'm finding that this may be due to a difference in how Google implements the S3 API.

https://www.zenko.io/blog/four-differences-google-amazon-s3-api/
and referenced here,
https://github.com/kahing/goofys/issues/259#issuecomment-355713879

To quote that goofys issue comment: "This is because GCS's S3 implementation does not support S3 multipart uploads and instead reinvented another API (ugh). Coincidentally I've started working on a fix for this so stay tuned."
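
Concretely, the mismatch appears to be roughly this (request shape only, not a capture; paths are placeholders):

# What the S3Client sends to start a multipart upload:
POST /<CLOUDBUCKET>/<INDEXNAME>/db/.../<bucketfile>.tsidx?uploads HTTP/1.1
Host: storage.googleapis.com

# AWS S3 treats that as CreateMultipartUpload and hands back an UploadId.
# GCS's XML API (at least at the time of this testing) has no equivalent operation; a POST
# to an object is its HTML-form upload method, which is why it answers with the 400
# "POST object expects Content-Type multipart/form-data" seen in the error above.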


nickhills
Ultra Champion

Having looked at those links, I agree.

Sadly, I think you have your answer about GCS.
(for now)

If my comment helps, please give it a thumbs up!

nickhills
Ultra Champion

What Splunk version are you running?

It may not be related to your issue, but you should be on 7.2.4.2.
Earlier versions did not support DMA (data model acceleration) on S2, and there is a hotfix in the latest .2 release specifically for S2.

If my comment helps, please give it a thumbs up!

seegeekrun
Path Finder

No dice. On version 7.2.4.2, the above error about the POST expecting multipart/form-data persists.

Also, the use of the remote.s3.header.x.x config still triggers an immediate crash. The crash log shows it's the cachemanager that's choking on it.


seegeekrun
Path Finder

We're currently running 7.2.3.

I know there are some dashboards added to the MC in 7.2.4, but we're not quite there yet.

Also, the indexes that we're testing do not use DMA. That was part of our test prep.


nickhills
Ultra Champion

Splunk Enterprise 7.2.4.2
This release addresses an issue that might impact data durability under certain rare cluster conditions. The issue is triggered when there is a confluence of data replication errors from index clustering as well as an upload to the Splunk object store medium (SmartStore) via secondary or tertiary replication nodes.

While the incidence of the condition is rare and the impact is negligible, we recommend that customers that are currently using SmartStore in a clustered production environment upgrade to version 7.2.4.2 and set max_replication_errors in server.conf to 20.
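
If it helps, my understanding is that setting goes in server.conf on the cluster peers, roughly:

# in server.conf -- sketch only
[clustering]
max_replication_errors = 20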

If my comment helps, please give it a thumbs up!

seegeekrun
Path Finder

This could be it, since the error is related to the multipart-upload command. It would've been nice if the release notes had been a bit more specific about what they were resolving.

I'm going to try the upgrade path in a test environment and see if that resolves the error. I'll report back my findings.


nickhills
Ultra Champion

Posted for info ^ Sadly, it does not appear to refer to your issue.

If my comment helps, please give it a thumbs up!