job status polling via external API

mitag
Contributor

What are the best practices in collecting job statuses in Splunk via an external API?

(I am not sure I am asking the right question, or asking the question correctly - so please bear with me.)

With a log file, Splunk ingests only what has been appended since the last ingest, not the entire file. API polling is trickier: even if the last record is unchanged, earlier records (job statuses) may still refer to jobs that are in progress, and their updated statuses need to be ingested into Splunk. My initial impulse is to write the Python polling script (as part of a "Scripted Input") as follows:

  1. Poll the API, capture the current status of every job, and write them to a "states" file
  2. On the next poll, call the API again, read the "states" file, determine what has changed, and send only the updated records to Splunk
  3. Update the "states" file with the new data.
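
To make the three steps above concrete, here is a minimal sketch of how I picture the script. It assumes each job record has a stable "id" field (as in the sample data below) and that the script is run as a scripted input, so whatever it prints to stdout is what Splunk indexes. The names fetch_jobs, state_path and run_once are hypothetical placeholders of mine, not part of any Splunk or vendor API.

```python
import json
import os
import sys
import tempfile


def load_state(path):
    """Return the saved {job_id: record} map, or {} on the first run."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def changed_records(previous, current_jobs):
    """Return jobs that are new or whose record differs from the saved copy."""
    return [job for job in current_jobs
            if previous.get(str(job["id"])) != job]


def save_state(path, current_jobs):
    """Write the new state to a temp file, then swap it in atomically."""
    state = {str(job["id"]): job for job in current_jobs}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_once(fetch_jobs, state_path, out=sys.stdout):
    """One poll cycle: fetch, diff against saved state, emit changes, save.

    fetch_jobs is a hypothetical callable that returns the API response
    as a list of dicts; the real HTTP call is left out on purpose.
    """
    jobs = fetch_jobs()                      # step 1: poll the API
    previous = load_state(state_path)        # step 2: read the "states" file
    changed = changed_records(previous, jobs)
    for job in changed:                      # one JSON event per line on stdout
        out.write(json.dumps(job, sort_keys=True) + "\n")
    save_state(state_path, jobs)             # step 3: update the "states" file
    return changed
```

Comparing whole records means any field change (for example "progress" ticking up) produces a new event. If that turns out to be too noisy, the comparison could be limited to the fields that matter.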

Is there a simpler way?

Thanks!

P.S. Sample data that a Python script collects via an API call:

 

[{"id":"1","fileName":"257158727.mpg","scheduledAt":"Jul 31, 2020 6:51:17 AM","status":"Finished","result":"Failure","correct":"Run correction|10058","progress":"0|00000173a5242","startTime":"Jul 31, 2020 6:51:20 AM","completionTime":"Jul 31, 2020 7:07:45 AM"},
{"id":"2","fileName":"257164625.ts","scheduledAt":"Jul 31, 2020 6:11:50 AM","status":"Finished","result":"Failure","correct":"Correction in Progress||00000173a5000","progress":"86|843|00000173a5000","startTime":"Jul 31, 2020 6:11:53 AM","completionTime":"Jul 31, 2020 6:53:35 AM"},
{"id":"3","fileName":"257166304.ts","scheduledAt":"Jul 31, 2020 5:03:05 AM","status":"Finished","result":"Failure","correct":"correction completed|00000173a4c11","progress":"100|00000173a4c11","startTime":"Jul 31, 2020 5:03:07 AM","completionTime":"Jul 31, 2020 6:44:23 AM"}]

 

Note that the "status" and "result" fields are rather meaningless for determining whether a job has finished. Instead I must extract the first stanza of the "correct" field (the text before the first "|") and decide based on its value: if it contains "Correction in Progress", the job is in progress; anything else means it is done.
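
As a rough sketch of that rule, assuming the "correct" field is always pipe-delimited as in the sample above: the function name is_in_progress is my own, and matching case-insensitively is an assumption on my part (the sample mixes "Correction in Progress" and "correction completed").

```python
def is_in_progress(job):
    """Return True if the first stanza of "correct" says the job is running."""
    # Take everything before the first "|"; a missing field counts as done.
    first_stanza = job.get("correct", "").split("|", 1)[0].strip()
    # Assumption: compare case-insensitively, since casing varies in the sample.
    return "correction in progress" in first_stanza.lower()


if __name__ == "__main__":
    sample = [
        {"id": "1", "correct": "Run correction|10058"},
        {"id": "2", "correct": "Correction in Progress||00000173a5000"},
        {"id": "3", "correct": "correction completed|00000173a4c11"},
    ]
    for job in sample:
        print(job["id"], is_in_progress(job))
```

Of the three sample records, only job 2 would be treated as still in progress.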

P.P.S. The sample data is from Interra Systems' Baton Content Corrector. The data format (job or task UUID, status, timestamps, other metadata) is very common across most job and session tracking systems (transcoding farms, file transfer platforms, etc.) with the goal of detecting anomalies, issues, stuck jobs.

P.P.P.S. I am assuming the best practice is to follow the "Example script that polls a database" and modify it for my purposes; my hope is that there is another "best practice" on top of it, since polling job statuses is conceptually different from "tailing" a database.
