I'm trying to count the jobs that failed and the jobs that succeeded. The job log is ingested into Splunk, and I have to plot the results as a time chart by hour for a given day, e.g. today.
As there is no criterion in the log that says a job ran successfully, I can only capture the ERROR lines to tell that a given JOB_NUMBER failed.
Sample log of a successful job:
7/18/23 8:40:58 AM INFO Start Date and Time: 7/18/23 8:40:58 AM
7/18/23 8:40:58 AM INFO Job Number: 1000000018011
7/18/23 8:40:58 AM INFO Project Name: PROJECT XXX
7/18/23 8:40:58 AM INFO Submitted By: SSS
7/18/23 8:40:58 AM INFO Submitted From: Scheduler
7/18/23 8:40:58 AM INFO System: SERVER
7/18/23 8:40:58 AM INFO APP NAME
7/18/23 8:40:58 AM INFO Executing project 'XYZ'
7/18/23 8:40:58 AM INFO Project location: /some/where/in/NAS/file.xml
7/18/23 8:40:58 AM INFO Executing module 'Main'
7/18/23 8:40:58 AM INFO Executing task 'sftp 1.0 (Connect to SFTP)'
7/18/23 8:40:58 AM INFO Connecting to 'someserver' at port '22' as user 'XXX'
7/18/23 8:40:58 AM INFO Executing sub-task 'put'
7/18/23 8:40:58 AM INFO Setting the data type to BINARY
7/18/23 8:40:58 AM INFO 0 files were uploaded successfully
7/18/23 8:40:58 AM INFO Finished sub-task 'put'
7/18/23 8:40:58 AM INFO Closed the FTP connection
7/18/23 8:40:58 AM INFO Finished task 'sftp 1.0 (Connect to SFTP)'
7/18/23 8:40:58 AM INFO Executing task 'move 1.0 (Move uploaded files to Archive)'
7/18/23 8:40:58 AM INFO 0 files were moved successfully
7/18/23 8:40:58 AM INFO Finished task 'move 1.0 (Move uploaded files to Archive)'
7/18/23 8:40:58 AM INFO Finished module 'Main'
7/18/23 8:40:58 AM INFO Finished project 'PROJECT XXX'
7/18/23 8:40:58 AM INFO End Date and Time: 7/18/23 8:40:58 AM
SPL to get the total jobs and plot them on a time chart for each hour of the day:
index=xxx sourcetype=job_sourcetype source=job_log
| rex "Job Number: (?P<JOB_NUMBER>.+)"
| dedup JOB_NUMBER
| rex "Start Date and Time: (?P<START_DATE_TIME>.+)"
| eval DATE_TIME=strftime(strptime(START_DATE_TIME,"%m/%d/%y %I:%M:%S %p"),"%d/%m/%Y %H")
| timechart span=1h count AS TOTAL_JOBS BY DATE_TIME
Sample log of a failed job:
7/18/23 8:15:58 AM INFO Start Date and Time: 7/18/23 8:15:58 AM
7/18/23 8:15:58 AM INFO Job Number: 1000000018003
7/18/23 8:15:58 AM INFO Project Name: XYX PROJECT
7/18/23 8:15:58 AM INFO Submitted By: SOMEONE
7/18/23 8:15:58 AM INFO Submitted From: Scheduler
7/18/23 8:15:58 AM INFO System: SERVER
7/18/23 8:15:58 AM INFO APP NAME
7/18/23 8:15:58 AM INFO Executing project 'SOME PROJECT'
7/18/23 8:15:58 AM INFO Project location: /some/where/in/nas/file.xml
7/18/23 8:15:58 AM INFO Executing module 'Main'
7/18/23 8:15:58 AM INFO Executing task 'timestamp 1.0 (Current date)'
7/18/23 8:15:58 AM INFO Default system date, time, and timestamp variables have been created and/or set to the current date and time '2023-07-18 08:15:58.182'
7/18/23 8:15:58 AM INFO Finished task 'timestamp 1.0 (Current date)'
7/18/23 8:15:58 AM ERROR [1234 - Copy All Files Except Offer Pack Files 'file name/directory' not found. Full stack trace written to '1000000018003_error_1.log'
7/18/23 8:15:58 AM INFO Continuing with the next task or module, if any
7/18/23 8:15:58 AM ERROR [1235 - Copy All Files Except Offer Pack Files 'file name/directory' not found. Full stack trace written to '1000000018003_error_2.log'
7/18/23 8:15:58 AM INFO Continuing with the next task or module, if any
7/18/23 8:15:58 AM INFO Finished module 'Main'
7/18/23 8:15:58 AM INFO Finished project 'SOME PROJECT'
7/18/23 8:15:58 AM INFO End Date and Time: 7/18/23 8:15:58 AM
SPL to get the job errors and plot them on a time chart for each hour of the day:
index=xxx sourcetype=job_sourcetype source=job_log error
| rex "Job Number: (?P<JOB_NUMBER>.+)"
| dedup JOB_NUMBER
| rex "Submitted From: (?P<SUBMITTED_FROM>.+)"
| rex "Start Date and Time: (?P<START_DATE_TIME>.+)"
| eval DATE_TIME=strftime(strptime(START_DATE_TIME,"%m/%d/%y %I:%M:%S %p"),"%d/%m/%Y %H")
| rex max_match=0 "ERROR (?P<ERROR>.*)"
| where SUBMITTED_FROM="Scheduler"
| timechart span=1h count AS JOB_FAILURE BY DATE_TIME
Both samples come from the same index, source and sourcetype.
I need to plot both the total jobs and the failures for each hour as a stacked bar chart. The struggle is combining these two queries into a single time chart that shows both total jobs and failures. I tried showing success and failure together, but had no luck.
Ugh.
Firstly, I tend to avoid the "dedup" command since it's not always obvious what you'll get as output: it keeps only the first-seen event for each value of the given field(s), and that event might not carry the other fields you wanted.
Secondly, as @ITWhisperer already noticed, assuming that these lines constitute separate events, there is no field that identifies the subsequent events and ties them to a particular job. So if events from two jobs were interleaved, you'd have no way of knowing which line belongs to which job.
Thirdly, the way to go, if you had "more decent" data, would be to simply evaluate a job status field to be either "success" or "failure" and do a timechart split by that status field.
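A minimal sketch of that approach, assuming each job arrives as a single multi-line event (the samples do not confirm this) and that any ERROR-level line means the job failed. STATUS is a field name made up for this illustration, and the ERROR pattern is based only on the sample lines shown in the question:

```
index=xxx sourcetype=job_sourcetype source=job_log
| rex "Job Number: (?P<JOB_NUMBER>\d+)"
| eval STATUS=if(match(_raw, "[AP]M ERROR "), "failure", "success")
| timechart span=1h dc(JOB_NUMBER) BY STATUS
```

Rendered as a stacked column chart, the two series should add up to the total number of jobs per hour, and dc(JOB_NUMBER) counts each job once without needing dedup.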
The problem with dummy data is when it doesn't accurately represent your real data.
For example, are all the timestamps within a job really identical, meaning the job takes less than a second?
Is there a possibility that events from different jobs are interleaved, i.e. do jobs run concurrently?
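If the jobs never run concurrently and each line is a separate event, one possible way to rebuild each job is the transaction command. This is only a sketch under those assumptions: the boundary strings are taken from the sample logs, and STATUS is a hypothetical field name.

```
index=xxx sourcetype=job_sourcetype source=job_log
| transaction startswith=eval(match(_raw, "Start Date and Time:")) endswith=eval(match(_raw, "End Date and Time:"))
| eval STATUS=if(match(_raw, "[AP]M ERROR "), "failure", "success")
| timechart span=1h count BY STATUS
```

With interleaved jobs this would group lines from different jobs together, so it does not help in that case. It also relies on the lines being returned in their original order, which identical timestamps may not guarantee.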