Re: Cloud Storage Bucket Input Using the Splunk Ad...

a805555 · 07-26-2021

I successfully installed and configured version 3.0.2 of the "Splunk Add-on for GCP" to read XML data files stored in a bucket.

I use it for two GCP buckets (DEV and PROD).

It works well in DEV, which has a dedicated bucket with hundreds of files directly in the root.

But it does not work well with the PROD bucket (a larger one with thousands of files in a directory tree). It seems to read the same files in the first directory over and over, and it does not index them because of an unsupported file type.

I don't understand why it does not scan the entire tree, and why it does not throw an error in the process. Why is the message always "Files to be ingested: 978" when there are 1916 files in the first directory, called cdp?

I did not find a way to filter, for example by specifying a path so that only that path is analyzed and not the complete bucket.
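To illustrate the kind of path filter meant here: the Cloud Storage JSON API's objects.list call accepts a `prefix` parameter and returns results in pages linked by `nextPageToken`. The sketch below shows that pattern outside the add-on. It is not the add-on's code, and how the add-on lists objects internally is not confirmed here. The names `iter_object_names`, `make_fake_fetcher` and `fetch_page` are hypothetical; the fake fetcher is an in-memory stand-in for a real call such as `service.objects().list(bucket=..., prefix=..., pageToken=...)` from google-api-python-client.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical type: takes (prefix, page_token) and returns one page shaped
# like a Cloud Storage objects.list response ("items", "nextPageToken").
PageFetcher = Callable[[str, Optional[str]], Dict]


def iter_object_names(fetch_page: PageFetcher, prefix: str = "") -> Iterator[str]:
    """Yield every object name under `prefix`, following nextPageToken."""
    token: Optional[str] = None
    while True:
        page = fetch_page(prefix, token)
        for item in page.get("items", []):
            yield item["name"]
        token = page.get("nextPageToken")
        if not token:
            break  # no further pages


def make_fake_fetcher(names: List[str], page_size: int) -> PageFetcher:
    """Build an in-memory stand-in for the real API call (illustration only)."""
    def fetch_page(prefix: str, page_token: Optional[str]) -> Dict:
        matching = sorted(n for n in names if n.startswith(prefix))
        start = int(page_token) if page_token else 0
        page: Dict = {"items": [{"name": n} for n in matching[start:start + page_size]]}
        if start + page_size < len(matching):
            page["nextPageToken"] = str(start + page_size)
        return page
    return fetch_page


# Made-up object names, only to exercise the sketch.
names = ["cdp/a/1.avro", "cdp/a/2.xml", "cdp/b/3.xml", "other/4.xml", "root.xml"]
fetch = make_fake_fetcher(names, page_size=2)

# Keep only XML objects below the "cdp/" path, across all pages.
cdp_xml = [n for n in iter_object_names(fetch, prefix="cdp/") if n.endswith(".xml")]
print(cdp_xml)
```

A listing that stops after the first page would see only part of a large bucket, so comparing the count from a full paginated listing with the "Files to be ingested" number in the log is one way to check whether that is what happens here.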

Does anybody have any ideas?

Thanks in advance.

The following is an extract of the log file splunk_ta_google_cloudplatform_google_cloud_bucket_metadata__1.log:

2021-07-26 10:53:10,700 level=INFO pid=34200 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:ingest_data:107 | | message="-----Data Ingestion begins-----"

2021-07-26 10:53:36,829 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:ingest_data:107 | | message="-----Data Ingestion begins-----"

2021-07-26 10:53:45,848 level=WARNING pid=15708 tid=MainThread logger=googleapiclient.discovery_cache pos=__init__.py:autodetect:44 | file_cache is unavailable when using oauth2client >= 4.0.0

Traceback (most recent call last):
  File "D:\SPLUNK\etc\apps\Splunk_TA_google-cloudplatform\bin\3rdparty\googleapiclient\discovery_cache\__init__.py", line 41, in autodetect
    from . import file_cache
  File "D:\SPLUNK\etc\apps\Splunk_TA_google-cloudplatform\bin\3rdparty\googleapiclient\discovery_cache\file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0

2021-07-26 10:53:46,118 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:get_metadata:264 | | message="Successfully obtained bucket metadata for prd-europe-west1-archiving"

2021-07-26 10:53:46,259 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:get_metadata:269 | | message="Successfully obtained object information present in the bucket prd-europe-west1-archiving."

2021-07-26 10:53:47,107 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:get_list_of_files_to_be_ingested:352 | | message="Files to be ingested: 978 files"

2021-07-26 10:53:47,224 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:ingest_file_content:396 | | message="Cannot ingest contents of cdp/f006006102/processing/InternalTranscodifications_f006006102_161839.avro, file with this extention is not yet supported in the TA"

2021-07-26 10:53:47,361 level=INFO pid=15708 tid=MainThread logger=splunk_ta_gcp.modinputs.bucket_metadata pos=bucket_metadata.py:ingest_file_content:396 | | message="Cannot ingest contents of cdp/f006006102/processing/InternalTranscodifications_f006006102_161916.avro, file with this extention is not yet supported in the TA"
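To see how many of the listed files are skipped and for which file types, the "Cannot ingest contents of ..." lines can be tallied by extension. The following is a minimal sketch that assumes the message format shown in the extract above (including the add-on's own spelling "extention"); the function name `count_skipped_extensions` is hypothetical and the sample lines are shortened copies of the two entries above.

```python
import posixpath
import re
from collections import Counter
from typing import Iterable

# Matches the object path in the add-on's "Cannot ingest" message.
# The spelling "extention" is copied from the log output on purpose.
SKIPPED = re.compile(
    r'message="Cannot ingest contents of (?P<path>[^,]+), file with this extention'
)


def count_skipped_extensions(lines: Iterable[str]) -> Counter:
    """Count skipped objects per file extension (lower-cased)."""
    counts: Counter = Counter()
    for line in lines:
        match = SKIPPED.search(line)
        if match:
            ext = posixpath.splitext(match.group("path"))[1].lower()
            counts[ext] += 1
    return counts


sample = [
    '2021-07-26 10:53:47,224 level=INFO | | message="Cannot ingest contents of '
    'cdp/f006006102/processing/InternalTranscodifications_f006006102_161839.avro, '
    'file with this extention is not yet supported in the TA"',
    '2021-07-26 10:53:47,361 level=INFO | | message="Cannot ingest contents of '
    'cdp/f006006102/processing/InternalTranscodifications_f006006102_161916.avro, '
    'file with this extention is not yet supported in the TA"',
    '2021-07-26 10:53:47,107 level=INFO | | message="Files to be ingested: 978 files"',
]

skipped = count_skipped_extensions(sample)
print(skipped)
```

Running it over the full log file (for example `count_skipped_extensions(open(path, encoding="utf-8"))`) would show whether all 978 listed files are .avro or whether some XML files are among them.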

rsaliou · 04-11-2022

Still the same with the latest version, 3.2.0.

Cloud Storage Bucket Input Using the Splunk Add-on for Google Cloud Platform

troubleshooting
