Splunk Search

newbie : how to compare two events from different source in one index by data in event and subtract time diff

KING_JULIAN
Engager

Please help. I just completed self learning fundamentals and already have a task I want to try, first post here so please be gentle :-).

I have two files containing job run details for two different jobs over 3 months. it also contains a julian date format from the mainframe but the event data is very similar.

The jobs have a relationship in that on a particular day job_a(file1) is a prerequisite to job_b(file 2).

in pseudo : I  want to calculate the difference between the time job_a started and the time job_b started for each day. Assumptions confirmed  : job_a is always earlier then job_b difference in hh:mm. 

So this is the regex I used to convert the julian to gregorian while importing each of the files so I could use the event data as my _time and date in the index :

timestamp format regex = %y%j%H.%M.%S.%N
timestamp prefix regex = [A-Z][A-Z]\d\d\s\d\d

Here is the sample records in each file : Delimited by space : fileds = Julian_date time jobname jobnr

FILE1 : 2 sample records : 

2021056 00.30.06.05 JOB_A
2021055 01.30.10.43 JOB_A

FILE2 : 2 sample records : 

2021056 03.30.23.50 JOB_B
2021055 02.00.10.43 JOB_B

the output I would like to achieve is  :

             DATE                  JOB_A_START       JOB_B_START        START_TIME_DIFF

2021-02-24                                 00H30                       03H30                                03H00

2021-02-23                                 01H30                       02H00                               00H30

I would really appreciate the approach thinking as well (i.e. why steps are done) because I found myself questioning even how I would approach the index and source and source-types because I ended up just coding a lot of SPL trying to get to something that looks like it will work(Eventually ended up deleting the index). I was very comfortable dealing each file individually creating graphs etc... but the minute the second one came in and I needed it as a "joint" output plus comparing the date fields and subtracting it in the event data etc..etc.. I realized I was now more confused than with a single file... I think if I can get the thought process of an experienced person it will really help(this is something I miss due to trying self learning 🙂 ). 

I hope the above is clearly articulated.

Again, thanks in advance.

Labels (1)
0 Karma
1 Solution

maciep
Champion

If I understand correctly, then the process might look like this (although I'm sure there are many ways to get this done)

  1.  Get all events from both files
  2.  Create a date field for every event based on timestamp
  3.  Get the earliest & latest events for each date
  4. Calculate the elapsed time between those 
  5.  Format / Table / Sort fields

This assumes one event per day for each file and as you said job_b is always the later event.  If it is more complex than that, then the search would get a bit more complex too.

But something like this I think:

<search both files>
| eval date = strftime(_time,"%Y-%m-%d")
| stats earliest(_time) as job_a, latest(_time) as job_b by date
| eval elapsed = tostring(job_b - job_a,"duration")
| eval job_a = strftime(job_a,"%H:%M"), job_b = strftime(job_b,"%H:%M")
| table date, job_a, job_b, elapsed
| sort date

 

View solution in original post

KING_JULIAN
Engager

@maciepthank you , thank you.... This worked even straight as a copy and paste :-). However, and more important, is that I started from scratch importing the files and building the INDEX and realized why I confused myself so much. My problem was that I was creating different source-types when trying the field extractions and trying to call it differently(i.e. joba_jdate & jobb_jdate...), this made me realize that I was trying to follow structured programming constructs, which meant at some point I was thinking along the lines of i.e.  "if joba_jdate < jobb_jdate ...do xyz , etc...) which is what confused me even more. 

Ok so it was nice trying to tackle this with my own data and problem cause it makes so much more sense then learning tutorial exercises etc ...

Thanks again and much appreciated. "ps! I'LL BE BACK 🙂 " ...

Stay safe.

maciep
Champion

If I understand correctly, then the process might look like this (although I'm sure there are many ways to get this done)

  1.  Get all events from both files
  2.  Create a date field for every event based on timestamp
  3.  Get the earliest & latest events for each date
  4. Calculate the elapsed time between those 
  5.  Format / Table / Sort fields

This assumes one event per day for each file and as you said job_b is always the later event.  If it is more complex than that, then the search would get a bit more complex too.

But something like this I think:

<search both files>
| eval date = strftime(_time,"%Y-%m-%d")
| stats earliest(_time) as job_a, latest(_time) as job_b by date
| eval elapsed = tostring(job_b - job_a,"duration")
| eval job_a = strftime(job_a,"%H:%M"), job_b = strftime(job_b,"%H:%M")
| table date, job_a, job_b, elapsed
| sort date

 

Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...