Getting Data In

job status polling via external API

mitag
Contributor

What are the best practices in collecting job statuses in Splunk via an external API?

(I am not sure I am asking the right question, or asking the question correctly - so please bear with me.)

With a log file, Splunk only ingests what's been appended to the file since the last ingest, and not the entire file. With API polling it's a little trickier as even if the last record is unchanged, prior records (job statuses) may still refer to jobs that are in progress; their statuses needing to be ingested into Splunk... My initial impulse is write the Python polling script (as part of a "Scripted Input") as follows:

  1. Poll the API, capture states of all job statuses and write them to a file
  2. During the next poll, poll the API again, then read the "states" file, determine what's changed, and send only the updated records to Splunk
  3. Update the "states" file with new data.

Is there a simpler way?

Thanks!

P.S. Sample data that a Python script collects via an API call:

 

[{"id":"1","fileName":"257158727.mpg","scheduledAt":"Jul 31, 2020 6:51:17 AM","status":"Finished","result":"Failure","correct":"Run correction|10058","progress":"0|00000173a5242","startTime":"Jul 31, 2020 6:51:20 AM","completionTime":"Jul 31, 2020 7:07:45 AM",},
{"id":"2","fileName":"257164625.ts","scheduledAt":"Jul 31, 2020 6:11:50 AM","status":"Finished","result":"Failure","correct":"Correction in Progress||00000173a5000","progress":"86|843|00000173a5000","startTime":"Jul 31, 2020 6:11:53 AM","completionTime":"Jul 31, 2020 6:53:35 AM"},
{"id":"3","fileName":"257166304.ts","scheduledAt":"Jul 31, 2020 5:03:05 AM","status":"Finished","result":"Failure","correct":"correction completed|00000173a4c11","progress":"100|00000173a4c11","startTime":"Jul 31, 2020 5:03:07 AM","completionTime":"Jul 31, 2020 6:44:23 AM"}]

 

Note that "status" and "result" fields are rather meaningless when determining if the job has finished. Instead I must extract the first stanza in the "correct" field and make the determination based on its value: if it contains "Correction in Progress", the job is in progress; anything else - it's done.

P.P.S. The sample data is from Interra Systems' Baton Content Corrector. The data format (job or task UUID, status, timestamps, other metadata) is very common across most job and session tracking systems (transcoding farms, file transfer platforms, etc.) with the goal of detecting anomalies, issues, stuck jobs.

P.P.P.S. I am assuming the best practice is to follow the "Example script that polls a database" except modify it for my purposes; my hope is that there's yet another "best practice" on top of it as polling job statuses is conceptually different from "tailing" a database.

Labels (1)
Tags (1)
0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...