Please help. I just completed self learning fundamentals and already have a task I want to try, first post here so please be gentle :-).
I have two files containing job run details for two different jobs over 3 months. it also contains a julian date format from the mainframe but the event data is very similar.
The jobs have a relationship in that on a particular day job_a(file1) is a prerequisite to job_b(file 2).
in pseudo : I want to calculate the difference between the time job_a started and the time job_b started for each day. Assumptions confirmed : job_a is always earlier then job_b difference in hh:mm.
So this is the regex I used to convert the julian to gregorian while importing each of the files so I could use the event data as my _time and date in the index :
timestamp format regex = %y%j%H.%M.%S.%N
timestamp prefix regex = [A-Z][A-Z]\d\d\s\d\d
Here is the sample records in each file : Delimited by space : fileds = Julian_date time jobname jobnr
FILE1 : 2 sample records :
2021056 00.30.06.05 JOB_A
2021055 01.30.10.43 JOB_A
FILE2 : 2 sample records :
2021056 03.30.23.50 JOB_B
2021055 02.00.10.43 JOB_B
the output I would like to achieve is :
DATE JOB_A_START JOB_B_START START_TIME_DIFF
2021-02-24 00H30 03H30 03H00
2021-02-23 01H30 02H00 00H30
I would really appreciate the approach thinking as well (i.e. why steps are done) because I found myself questioning even how I would approach the index and source and source-types because I ended up just coding a lot of SPL trying to get to something that looks like it will work(Eventually ended up deleting the index). I was very comfortable dealing each file individually creating graphs etc... but the minute the second one came in and I needed it as a "joint" output plus comparing the date fields and subtracting it in the event data etc..etc.. I realized I was now more confused than with a single file... I think if I can get the thought process of an experienced person it will really help(this is something I miss due to trying self learning 🙂 ).
I hope the above is clearly articulated.
Again, thanks in advance.
If I understand correctly, then the process might look like this (although I'm sure there are many ways to get this done)
This assumes one event per day for each file and as you said job_b is always the later event. If it is more complex than that, then the search would get a bit more complex too.
But something like this I think:
<search both files>
| eval date = strftime(_time,"%Y-%m-%d")
| stats earliest(_time) as job_a, latest(_time) as job_b by date
| eval elapsed = tostring(job_b - job_a,"duration")
| eval job_a = strftime(job_a,"%H:%M"), job_b = strftime(job_b,"%H:%M")
| table date, job_a, job_b, elapsed
| sort date
@maciepthank you , thank you.... This worked even straight as a copy and paste :-). However, and more important, is that I started from scratch importing the files and building the INDEX and realized why I confused myself so much. My problem was that I was creating different source-types when trying the field extractions and trying to call it differently(i.e. joba_jdate & jobb_jdate...), this made me realize that I was trying to follow structured programming constructs, which meant at some point I was thinking along the lines of i.e. "if joba_jdate < jobb_jdate ...do xyz , etc...) which is what confused me even more.
Ok so it was nice trying to tackle this with my own data and problem cause it makes so much more sense then learning tutorial exercises etc ...
Thanks again and much appreciated. "ps! I'LL BE BACK 🙂 " ...
Stay safe.
If I understand correctly, then the process might look like this (although I'm sure there are many ways to get this done)
This assumes one event per day for each file and as you said job_b is always the later event. If it is more complex than that, then the search would get a bit more complex too.
But something like this I think:
<search both files>
| eval date = strftime(_time,"%Y-%m-%d")
| stats earliest(_time) as job_a, latest(_time) as job_b by date
| eval elapsed = tostring(job_b - job_a,"duration")
| eval job_a = strftime(job_a,"%H:%M"), job_b = strftime(job_b,"%H:%M")
| table date, job_a, job_b, elapsed
| sort date