
timechart success/fail from log

ashraf_sj
Explorer

I'm trying to identify which jobs failed and which succeeded. The job log is being ingested into Splunk, and I have to plot the results as a time chart by hour for a given day, e.g. today.

As there is no explicit marker in the log to say that a job ran successfully, I can only use the presence of an ERROR line to say that a given JOB_NUMBER failed.

 

Sample log of a successful job:

7/18/23 8:40:58 AM            INFO      Start Date and Time: 7/18/23 8:40:58 AM
7/18/23 8:40:58 AM            INFO      Job Number: 1000000018011
7/18/23 8:40:58 AM            INFO      Project Name: PROJECT XXX
7/18/23 8:40:58 AM            INFO      Submitted By: SSS
7/18/23 8:40:58 AM            INFO      Submitted From: Scheduler
7/18/23 8:40:58 AM            INFO      System: SERVER
7/18/23 8:40:58 AM            INFO      APP NAME
7/18/23 8:40:58 AM            INFO      Executing project 'XYZ'
7/18/23 8:40:58 AM            INFO      Project location: /some/where/in/NAS/file.xml
7/18/23 8:40:58 AM            INFO      Executing module 'Main'
7/18/23 8:40:58 AM            INFO      Executing task 'sftp 1.0 (Connect to SFTP)'
7/18/23 8:40:58 AM            INFO      Connecting to 'someserver' at port '22' as user 'XXX'
7/18/23 8:40:58 AM            INFO      Executing sub-task 'put'
7/18/23 8:40:58 AM            INFO      Setting the data type to BINARY
7/18/23 8:40:58 AM            INFO      0 files were uploaded successfully
7/18/23 8:40:58 AM            INFO      Finished sub-task 'put'
7/18/23 8:40:58 AM            INFO      Closed the FTP connection
7/18/23 8:40:58 AM            INFO      Finished task 'sftp 1.0 (Connect to SFTP)'
7/18/23 8:40:58 AM            INFO      Executing task 'move 1.0 (Move uploaded files to Archive)'
7/18/23 8:40:58 AM            INFO      0 files were moved successfully
7/18/23 8:40:58 AM            INFO      Finished task 'move 1.0 (Move uploaded files to Archive)'
7/18/23 8:40:58 AM            INFO      Finished module 'Main'
7/18/23 8:40:58 AM            INFO      Finished project 'PROJECT XXX'
7/18/23 8:40:58 AM            INFO      End Date and Time: 7/18/23 8:40:58 AM


SPL to get the total jobs and plot them on a time chart for each hour of the day:

index=xxx sourcetype=job_sourcetype source=job_log
| rex "Job Number: (?P<JOB_NUMBER>.+)"
| dedup JOB_NUMBER
| rex "Start Date and Time: (?P<START_DATE_TIME>.+)"
| eval _time=strptime(START_DATE_TIME,"%m/%d/%y %I:%M:%S %p")
| timechart span=1h count AS TOTAL_JOBS

 

Sample log of a failed job:

7/18/23 8:15:58 AM            INFO      Start Date and Time: 7/18/23 8:15:58 AM
7/18/23 8:15:58 AM            INFO      Job Number: 1000000018003
7/18/23 8:15:58 AM            INFO      Project Name: XYX PROJECT
7/18/23 8:15:58 AM            INFO      Submitted By: SOMEONE
7/18/23 8:15:58 AM            INFO      Submitted From: Scheduler
7/18/23 8:15:58 AM            INFO      System: SERVER
7/18/23 8:15:58 AM            INFO      APP NAME
7/18/23 8:15:58 AM            INFO      Executing project 'SOME PROJECT'
7/18/23 8:15:58 AM            INFO      Project location: /some/where/in/nas/file.xml
7/18/23 8:15:58 AM            INFO      Executing module 'Main'
7/18/23 8:15:58 AM            INFO      Executing task 'timestamp 1.0 (Current date)'
7/18/23 8:15:58 AM            INFO      Default system date, time, and timestamp variables have been created and/or set to the current date and time '2023-07-18 08:15:58.182'
7/18/23 8:15:58 AM            INFO      Finished task 'timestamp 1.0 (Current date)'
7/18/23 8:15:58 AM            ERROR     [1234 - Copy All Files Except Offer Pack Files  'file name/directory' not found. Full stack trace written to '1000000018003_error_1.log'
7/18/23 8:15:58 AM            INFO      Continuing with the next task or module, if any
7/18/23 8:15:58 AM            ERROR     [1235 - Copy All Files Except Offer Pack Files 'file name/directory' not found. Full stack trace written to '1000000018003_error_2.log'
7/18/23 8:15:58 AM            INFO      Continuing with the next task or module, if any
7/18/23 8:15:58 AM            INFO      Finished module 'Main'
7/18/23 8:15:58 AM            INFO      Finished project 'SOME PROJECT'
7/18/23 8:15:58 AM            INFO      End Date and Time: 7/18/23 8:15:58 AM

 



SPL to get the job errors and plot them on a time chart for each hour of the day:

index=xxx sourcetype=job_sourcetype source=job_log error
| rex "Job Number: (?P<JOB_NUMBER>.+)"
| dedup JOB_NUMBER
| rex "Submitted From: (?P<SUBMITTED_FROM>.+)"
| rex "Start Date and Time: (?P<START_DATE_TIME>.+)"
| eval _time=strptime(START_DATE_TIME,"%m/%d/%y %I:%M:%S %p")
| rex max_match=0 "ERROR (?P<ERROR>.*)"
| where SUBMITTED_FROM="Scheduler"
| timechart span=1h count AS JOB_FAILURE

 

Both sets of events come from the same index, source and sourcetype.

I need to plot both total jobs and failures for each hour as a stacked bar chart. The struggle is combining these two queries into a single time chart showing both. I also tried showing success and failure, but had no luck.


PickleRick
SplunkTrust

Ugh.

Firstly, I tend to avoid the "dedup" command since it's not always obvious what you'll get as output. It retains the first-seen occurrence of the given field(s) along with the event it came with, which might not be what you wanted, especially if you want the other fields from that event.

Secondly, as @ITWhisperer already noticed, assuming that these lines constitute separate events, there is no field that identifies the subsequent events and ties them to a particular job. So if events from two jobs were interleaved, you'd have no way of knowing which line is from which job.

Thirdly, the way to go, if you had "more decent" data, would be to simply evaluate a job status to be either "success" or "failure" and do a timechart over this status field.
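
To sketch what that third point could look like: the search below assumes each job arrives as one multi-line event (the rex extractions in the question already rely on that), that _time reflects when the job started, and that a line at ERROR level is the only sign of failure. STATUS is a field name made up for this example. Grouping with stats BY JOB_NUMBER stands in for dedup, so it is explicit which values survive per job.

```spl
index=xxx sourcetype=job_sourcetype source=job_log
| rex "Job Number: (?<JOB_NUMBER>\d+)"
| eval STATUS=if(match(_raw, "\sERROR\s"), "failure", "success")
| stats min(_time) AS _time values(STATUS) AS STATUS BY JOB_NUMBER
| eval STATUS=if(isnotnull(mvfind(STATUS, "^failure$")), "failure", "success")
| timechart span=1h count BY STATUS
```

Rendered as a stacked column chart, each hourly bar's full height would be the total number of jobs, with the failure segment showing how many of them failed. If the lines are in fact separate events, this will not work as written, for the reason given in the second point.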


ITWhisperer
SplunkTrust

The problem with dummy data is when it doesn't accurately represent your real data.

For example, are the timestamps really all the same for a given job, i.e. does the job take less than a second?

Is there a possibility that events from different jobs are interleaved, i.e. do jobs run concurrently?
