Getting Data In

Python script deployment to pull logs

Karthikeya
Communicator

We need to pull logs from Tencent COS (Cloud Object Storage) into our Splunk instances, which are hosted on AWS. The Tencent team has provided a script for us, but we are unsure where to run it and where to put the configuration. Please help. Below is our architecture:

We have one deployment server, which maintains all the configurations and pushes them to the respective components through serverclass.conf. We also have 2 cluster managers, 6 indexers, and one heavy forwarder, which is connected to the indexers through indexer discovery.

When I checked with ChatGPT, it gave multiple answers that I don't understand. It says to keep inputs.conf in default, which I thought was wrong. It should be in local, right?

And how do we avoid duplicate logs here?

0 Karma

Karthikeya
Communicator

I didn't get this. Sorry

0 Karma

Karthikeya
Communicator

Thanks @richgalloway. I don't have much knowledge of any programming language. Is it good practice to create a modular input, or should I just create a script with checkpointing included? I can use GPT to do some of this, but a modular input seems challenging to me. Can I raise a case so that the Splunk Support team can help with this work?

0 Karma

richgalloway
SplunkTrust

Creating a scripted or modular input is difficult without some programming knowledge, but AI should be able to help. I understand Claude is good at programming. Splunk Support is for break/fix issues, not scripting. Splunk OnDemand Services (ODS) can offer advice, but cannot create a script for you. Splunk Professional Services, however, can create it.

Whether you choose a scripted or modular input, adding checkpointing is a good idea to protect against interruptions in the input.
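
As an illustration of what "protect against interruptions" can mean in practice, here is a minimal Python sketch. It is not taken from the Tencent script: the function names and the shape of the checkpoint (a small JSON dict) are assumptions. The idea is to write the checkpoint to a temporary file and then rename it, so that a crash in the middle of a write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved checkpoint dict, or an empty dict on the first run."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_checkpoint(path, state):
    """Write the checkpoint atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The script would call load_checkpoint at startup and save_checkpoint only after a batch has been fully written out, so an interrupted run repeats at most the last batch.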

---
If this reply helps you, Karma would be appreciated.
0 Karma

Karthikeya
Communicator

How do I configure the modular input here?

0 Karma

livehybrid
SplunkTrust

Hi @Karthikeya 

Run the Python script on your Heavy Forwarder. The HF is designed for data collection tasks, including running scripts to pull data from external APIs or cloud storage, before forwarding that data to the indexers.

You should manage this configuration via your Deployment Server by creating a custom app and pushing it to the HF.

Create a custom app on your Deployment Server under $SPLUNK_HOME/etc/deployment-apps/tencent_cos/, place the script in the app's bin directory, and define your input in default/inputs.conf:

 

# inputs.conf #
[script://./bin/tencent_cos_pull.py]
disabled = false
index = tencent_index
sourcetype = tencent:cos
interval = 300
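
For context on what the script has to do: a scripted input is run by Splunk at each interval, and whatever it writes to standard output is indexed. A rough sketch of the output side, assuming one JSON event per line, might look like the following. The record fields and function names are placeholders, not part of the Tencent script, and the real script would read records from COS instead of using a hard-coded list.

```python
import json
import sys


def format_event(record):
    """Serialize one log record as a single-line JSON event."""
    return json.dumps(record, sort_keys=True)


def emit(records, stream=sys.stdout):
    """Write each record on its own line; Splunk reads the script's stdout."""
    count = 0
    for record in records:
        stream.write(format_event(record) + "\n")
        count += 1
    stream.flush()
    return count


if __name__ == "__main__":
    # Placeholder record; the real script would fetch these from COS.
    emit([{"bucket": "example-bucket", "key": "logs/a.log", "message": "hello"}])
```

Writing one event per line keeps line breaking simple for the tencent:cos sourcetype, but that is a design choice rather than a requirement.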

Splunk does not automatically deduplicate data pulled via custom scripts. The Python script provided by Tencent must contain checkpointing logic.

Without seeing the supplied code, it's difficult to say whether it already does this. If you do need to add checkpointing, you can use the Splunk modular input SDK, which lets the input record the last position it read and resume from there, so that the same events are not read twice.
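
To show the idea independently of any SDK, here is a small plain-Python sketch of "only load data after the last known position". It assumes each object in the bucket listing comes with a last-modified timestamp as an ISO 8601 UTC string, so that string comparison matches time order. The function name and the tuple layout are hypothetical, and this is not the modular input SDK's API.

```python
def select_new_objects(objects, last_seen):
    """Return (fresh, new_marker).

    objects: iterable of (key, last_modified) tuples, where last_modified is
             an ISO 8601 UTC string (assumption: same format for every item).
    last_seen: the marker saved by the previous run, or None on the first run.
    """
    fresh = sorted(
        (obj for obj in objects if last_seen is None or obj[1] > last_seen),
        key=lambda obj: (obj[1], obj[0]),
    )
    # Advance the marker only if something new was found.
    new_marker = fresh[-1][1] if fresh else last_seen
    return fresh, new_marker
```

One limitation of this sketch: an object that appears later with exactly the same timestamp as the saved marker would be skipped, so a real script may also want to remember the keys already processed at that timestamp.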


0 Karma

richgalloway
SplunkTrust

That's not a lot to work with, but you probably should install the script as a scripted input on the HF.

Never change anything in a default directory - use local.
