
job status polling via external API

mitag
Contributor

What are the best practices in collecting job statuses in Splunk via an external API?

(I am not sure I am asking the right question, or asking the question correctly - so please bear with me.)

With a log file, Splunk ingests only what has been appended to the file since the last read, not the entire file. API polling is a little trickier: even if the most recent record is unchanged, earlier records (job statuses) may still refer to jobs that are in progress, and their updated statuses also need to be ingested into Splunk. My initial impulse is to write the Python polling script (as part of a "Scripted Input") as follows (rough sketch after the list):

  1. Poll the API, capture the states of all jobs, and write them to a file.
  2. On the next poll, call the API again, read the "states" file, determine what has changed, and send only the updated records to Splunk.
  3. Update the "states" file with the new data.
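
Something like this untested sketch; the API URL and state-file path are placeholders, authentication and error handling are omitted, and I'm assuming the requests library is available to the scripted input:

#!/usr/bin/env python
# Untested sketch of steps 1-3 above. API_URL and STATE_FILE are
# placeholders, not real values.
import json
import os
import sys

import requests  # assuming this library is available to the scripted input

API_URL = "https://baton.example.com/api/jobs"          # placeholder endpoint
STATE_FILE = "/opt/splunk/var/run/baton_job_states.json"  # placeholder path


def load_state(path):
    """Return {job_id: record} from the previous poll, or {} on first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def main():
    previous = load_state(STATE_FILE)
    jobs = requests.get(API_URL, timeout=30).json()

    current = {}
    for job in jobs:
        current[job["id"]] = job
        # Emit only new or changed records; Splunk indexes whatever a
        # scripted input writes to stdout.
        if previous.get(job["id"]) != job:
            sys.stdout.write(json.dumps(job) + "\n")

    # Snapshot the latest states for the next poll.
    with open(STATE_FILE, "w") as f:
        json.dump(current, f)


if __name__ == "__main__":
    main()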

Is there a simpler way?

Thanks!

P.S. Sample data that a Python script collects via an API call:

[{"id":"1","fileName":"257158727.mpg","scheduledAt":"Jul 31, 2020 6:51:17 AM","status":"Finished","result":"Failure","correct":"Run correction|10058","progress":"0|00000173a5242","startTime":"Jul 31, 2020 6:51:20 AM","completionTime":"Jul 31, 2020 7:07:45 AM"},
{"id":"2","fileName":"257164625.ts","scheduledAt":"Jul 31, 2020 6:11:50 AM","status":"Finished","result":"Failure","correct":"Correction in Progress||00000173a5000","progress":"86|843|00000173a5000","startTime":"Jul 31, 2020 6:11:53 AM","completionTime":"Jul 31, 2020 6:53:35 AM"},
{"id":"3","fileName":"257166304.ts","scheduledAt":"Jul 31, 2020 5:03:05 AM","status":"Finished","result":"Failure","correct":"correction completed|00000173a4c11","progress":"100|00000173a4c11","startTime":"Jul 31, 2020 5:03:07 AM","completionTime":"Jul 31, 2020 6:44:23 AM"}]

Note that the "status" and "result" fields are rather meaningless for determining whether a job has finished. Instead, I must extract the first pipe-delimited segment of the "correct" field and make the determination based on its value: if it contains "Correction in Progress", the job is in progress; anything else means the job is done.
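
In Python, that check might look like this (a sketch; the field names match the sample data above):

def job_is_done(job):
    # "correct" looks like "Correction in Progress||00000173a5000" or
    # "correction completed|00000173a4c11"; only the first pipe-delimited
    # segment indicates completion.
    first_segment = job["correct"].split("|")[0]
    return "Correction in Progress" not in first_segment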

P.P.S. The sample data is from Interra Systems' Baton Content Corrector. The data format (job or task UUID, status, timestamps, other metadata) is very common across most job- and session-tracking systems (transcoding farms, file transfer platforms, etc.), where the goal is detecting anomalies, issues, and stuck jobs.

P.P.P.S. I am assuming the best practice is to follow Splunk's "Example script that polls a database" and modify it for my purposes; my hope is that there is a further best practice on top of it, since polling job statuses is conceptually different from "tailing" a database.
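
For reference, I'd presumably wire the script up with a standard inputs.conf scripted-input stanza along these lines (the app name, script path, interval, sourcetype, and index below are made up):

[script://$SPLUNK_HOME/etc/apps/my_app/bin/poll_job_statuses.py]
interval = 300
sourcetype = baton:jobstatus
index = main
disabled = 0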
