Solved: Extract execution time

marco_carolo · ‎12-01-2023

Hello,

I've the following situation:

I've inside logs the ETL logs, I've already extracted some data via search fields.

The log structure is the following:

Fri Dec 1 16:00:59 2023 [extracted_pid] [extracted_job_name] [extracted_index_operation_incremental] extracted_message

Example

Fri Dec 1 07:57:40 2023 [111111][talend_job_name] [100] End job

Fri Dec 1 06:50:40 2023 [111111][talend_job_name] [70] Start job

Fri Dec 1 06:50:39 2023 [111111][talend_job_name1] [69] End job

Fri Dec 1 05:40:40 2023 [111111][talend_job_name1] [30] Start job

Fri Dec 1 05:40:39 2023 [111111][talend_job_name2] [29] End job

Fri Dec 1 02:50:40 2023 [111111][talend_job_name2] [1] Start job

Expected:

PID NAME EXEC_TIME

111111 talend_job_name 1h 7min

111111 talend_job_name1 1h 10min

111111 talend_job_name2 2h 50min

What I was requested to do is to extract a table containing the job name and the execution time, one for each pid (a job can be executed multiple times, but each time has a different PID) in order to have the data available.

It is not necessary that the job starts with index 1, since all subjobs inside a job have a separated logged name (for example, the import all could contain 10 subjobs, each of one with different names)

My idea of a query would be a query that involves the PID and the job name combined as primary key, considering the start time the lower extracted_index_operation_incremental for that specific PK and the end time the max value of extracted_index_operation_incremental for that PK.

Any help? 😄

Thanks for any reply.

gcusello · ‎12-01-2023

Hi @marco_carolo ,

you should extract all the fields and then correlate them:

<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats 
   earliest(_time) AS earliest
   latest(_time) AS latest
   BY talend_job_name
| eval 
   duration=latest-earliest, 
   earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),   
   latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table talend_job_name earliest latest duration

Ciao.

Giuseppe

View solution in original post

gcusello · ‎12-01-2023

Hi @marco_carolo ,

you should extract all the fields and then correlate them:

<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats 
   earliest(_time) AS earliest
   latest(_time) AS latest
   BY talend_job_name
| eval 
   duration=latest-earliest, 
   earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),   
   latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table talend_job_name earliest latest duration

Ciao.

Giuseppe

marco_carolo · ‎12-04-2023

Thanks a lot,

a bonus question for a bonus karma point...

I want to create a stacked column chart with for each job name the distinct values fo duration.

What is the exact syntax to do to achive that?

gcusello · ‎12-04-2023

Hi @marco_carolo ,

the easiest way is to follow the GUI in stached char creation.

Ciao.

Giuseppe

marco_carolo · ‎12-04-2023

Sorry, I didn't get it.

One of requests I had was:

I want to see the delta for multiple execution of a particular job, in order to find out if the job is getting slower or faster.

I thought to do a timechart in order to find out for each job the execution delta time, I'm missing the correct way to group the timechart by the duration of each execution...

@gcusello

gcusello · ‎12-04-2023

Hi @marco_carolo ,

try to adapt something like this:

<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats 
   earliest(_time) AS _time
   latest(_time) AS latest
   BY talend_job_name
| eval 
   duration=latest-_time
| timechart values(duration) AS duration BY talend_job_name

Ciao.

Giuseppe

marco_carolo · ‎12-04-2023

I think there's a missing point here, that I've adapted inside my current solution.

The primary key here is not only the job name but the PID also. The job could be executed multiple time during the day or time span chosen, the search key is talend_job and talend_pid in order to have unique duration...

Extract execution time

field extraction

table

Join Us for Splunk University and Get Your Bootcamp Game On!

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

Announcing Scheduled Export GA for Dashboard Studio