We need to pull logs from Tencent COS (Cloud Object Storage) into our Splunk instances, which are hosted on AWS. The Tencent team has provided a script for us, but we are unsure where to run it and where to put the configuration. Please help. Our architecture is below:
One deployment server, which maintains all the configurations and pushes them to the respective components through serverclass.conf. We have 2 cluster managers, 6 indexers, and one heavy forwarder, which sends directly to the indexers through indexer discovery.
When I ask ChatGPT, it gives multiple answers that I don't understand. It says to keep inputs.conf in default, which I thought was wrong. It should be in local, right?
And how do we avoid duplicate logs here?
I didn't get this. Sorry
Thanks @richgalloway. I don't have much knowledge of any programming language. Is it good practice to create a modular input, or should I just create a script with checkpointing included in it? I can use GPT to do some of this, but a modular input seems challenging to me. Can I raise a case for this so that the Splunk Support team can help with this work?
Creating a scripted or modular input is difficult without some programming knowledge, but AI should be able to help. I understand Claude is good at programming. Splunk Support is for break/fix issues, not scripting. Splunk ODS can offer advice, but cannot create a script for you. Splunk Professional Services, however, can create it.
Whether you choose a scripted or modular input, adding checkpointing is a good idea to protect against interruptions in the input.
How to configure the modular input here?
See https://help.splunk.com/en/splunk-enterprise/get-started/get-data-in/10.2/get-other-kinds-of-data-in... or https://help.splunk.com/en/splunk-enterprise/developing-views-and-apps-for-splunk-web/9.4/modular-in...
Hi @Karthikeya
Run the Python script on your Heavy Forwarder. The HF is designed for data collection tasks, including running scripts to pull data from external APIs or cloud storage, before forwarding that data to the indexers.
You should manage this configuration via your Deployment Server by creating a custom app and pushing it to the HF.
Create a custom app on your Deployment Server ($SPLUNK_HOME/etc/deployment-apps/tencent_cos/) and define your input in default/inputs.conf:
# inputs.conf #
[script://./bin/tencent_cos_pull.py]
disabled = false
index = tencent_index
sourcetype = tencent:cos
interval = 300
Splunk does not automatically deduplicate data pulled via custom scripts. The Python script provided by Tencent must contain checkpointing logic.
Without seeing the supplied code, it's difficult to say. However, if you do need checkpointing, you can use the Splunk modular input SDK, which lets the input resume from the last known position so that events that have already been read are not ingested again.
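To make the checkpointing idea concrete, here is a minimal sketch of how a scripted input could remember which COS objects it has already sent. It is not Tencent's script and does not use the modular input SDK. The names load_checkpoint, save_checkpoint, pull_new, list_keys and fetch_lines are hypothetical; list_keys and fetch_lines stand in for whatever calls the Tencent-supplied script uses to list and download objects. It relies only on the fact that Splunk indexes whatever a scripted input writes to stdout.

```python
import json
import os
import sys
import tempfile


def load_checkpoint(path):
    """Return the set of object keys already ingested (empty on first run)."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return set(json.load(fh))
    except FileNotFoundError:
        return set()


def save_checkpoint(path, seen):
    """Write the checkpoint via a temp file so a crash cannot truncate it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen), fh)
    os.replace(tmp, path)  # atomic rename over the old checkpoint


def pull_new(list_keys, fetch_lines, checkpoint_path, out=sys.stdout):
    """Emit lines only from objects not yet recorded in the checkpoint.

    list_keys and fetch_lines are hypothetical callables standing in for
    the COS listing and download calls in the vendor script.
    Returns the number of lines written.
    """
    seen = load_checkpoint(checkpoint_path)
    emitted = 0
    for key in list_keys():
        if key in seen:
            continue  # already forwarded on an earlier run
        for line in fetch_lines(key):
            out.write(line.rstrip("\n") + "\n")
            emitted += 1
        out.flush()
        seen.add(key)
        save_checkpoint(checkpoint_path, seen)  # persist after each object
    return emitted
```

Because the checkpoint is saved only after an object has been fully written, an interruption in the middle of an object means that object is sent again on the next run. That gives at-least-once delivery rather than exactly-once. The checkpoint file should live somewhere that survives app redeployments from the Deployment Server, so outside the app directory that gets pushed.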
That's not a lot to work with, but you probably should install the script as a scripted input on the HF.
Never change anything in a default directory - use local.