Cloud Storage Bucket Input Using the Splunk Add-on for Google Cloud Platform

a805555
New Member

I successfully installed and configured the "Splunk Add-on for GCP" version 3.0.2 to access XML data files stored in a bucket.

I use it for two GCP buckets (DEV and PROD).

It works well in DEV, with a dedicated bucket containing hundreds of files directly in the root.

But it does not work well with the PROD bucket (a larger one, with thousands of files in a tree). It seems to continuously re-read the same files in the first directory without indexing them, because of an unsupported file type.

I don't understand why it does not scan the entire tree, and why it does not throw an error in the process. Why is the message always "Files to be ingested: 978" when there are 1916 files in the first directory, called cdp?
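To cross-check those counts outside the add-on, here is a minimal diagnostic sketch using the google-cloud-storage Python client (the bucket name is taken from the logs below; the credentials setup and the prefix value are assumptions). It counts the objects under the cdp/ prefix and breaks the total down by extension, which may show whether 978 corresponds to some subset of the 1916 files:

from collections import Counter

from google.cloud import storage

client = storage.Client()
by_extension = Counter()
total = 0

# Iterate over every object whose name starts with "cdp/".
for blob in client.list_blobs("prd-europe-west1-archiving", prefix="cdp/"):
    total += 1
    # Take everything after the last dot of the base name as the extension.
    base = blob.name.rsplit("/", 1)[-1]
    by_extension[base.rsplit(".", 1)[-1] if "." in base else "<none>"] += 1

print(f"Total objects under cdp/: {total}")
for ext, count in by_extension.most_common():
    print(f"  {ext}: {count}")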

I also did not find a way to filter, for example by specifying a path, so that only that path is analyzed rather than the complete bucket.
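For what it's worth, the underlying Cloud Storage API does support this kind of path filtering through an object-name prefix, even if the add-on's bucket input does not seem to expose it. A minimal sketch with the google-cloud-storage client (the prefix value here is illustrative, taken from the log extract below):

from google.cloud import storage

client = storage.Client()

# Restrict the listing to one "directory" instead of the whole bucket.
for blob in client.list_blobs(
    "prd-europe-west1-archiving",
    prefix="cdp/f006006102/processing/",
):
    print(blob.name)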

 

Does anybody have any ideas?

Thanks in advance.

 

Below is an extract of the log file splunk_ta_google_cloudplatform_google_cloud_bucket_metadata__1.log:

2021-07-26 10:53:10,700 level=INFO pid=34200 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:ingest_data:107 |  | message="-----Data Ingestion begins-----"

2021-07-26 10:53:36,829 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:ingest_data:107 |  | message="-----Data Ingestion begins-----"

2021-07-26 10:53:45,848 level=WARNING pid=15708 tid=MainThread logger=googleapiclient.discovery_cache pos=__init__.py:autodetect:44 | file_cache is unavailable when using oauth2client >= 4.0.0

Traceback (most recent call last):

  File "D:\SPLUNK\etc\apps\Splunk_TA_google-cloudplatform\bin\3rdparty\googleapiclient\discovery_cache\__init__.py", line 41, in autodetect

    from . import file_cache

  File "D:\SPLUNK\etc\apps\Splunk_TA_google-cloudplatform\bin\3rdparty\googleapiclient\discovery_cache\file_cache.py", line 41, in <module>

    'file_cache is unavailable when using oauth2client >= 4.0.0')

ImportError: file_cache is unavailable when using oauth2client >= 4.0.0

2021-07-26 10:53:46,118 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:get_metadata:264 |  | message="Successfully obtained bucket metadata for prd-europe-west1-archiving"

2021-07-26 10:53:46,259 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:get_metadata:269 |  | message="Successfully obtained object information present in the bucket prd-europe-west1-archiving."

2021-07-26 10:53:47,107 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:get_list_of_files_to_be_ingested:352 |  | message="Files to be ingested: 978 files"

2021-07-26 10:53:47,224 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:ingest_file_content:396 |  | message="Cannot ingest contents of cdp/f006006102/processing/InternalTranscodifications_f006006102_161839.avro, file with this extention is not yet supported in the TA"

2021-07-26 10:53:47,361 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:ingest_file_content:396 |  | message="Cannot ingest contents of cdp/f006006102/processing/InternalTranscodifications_f006006102_161916.avro, file with this extention is not yet supported in the TA"


rsaliou
Engager

Still the same with the latest version, 3.2.0.
