Hello,
I've the following situation:
I've inside logs the ETL logs, I've already extracted some data via search fields.
The log structure is the following:
Fri Dec 1 16:00:59 2023 [extracted_pid] [extracted_job_name] [extracted_index_operation_incremental] extracted_message
Example
Fri Dec 1 07:57:40 2023 [111111][talend_job_name] [100] End job
Fri Dec 1 06:50:40 2023 [111111][talend_job_name] [70] Start job
Fri Dec 1 06:50:39 2023 [111111][talend_job_name1] [69] End job
Fri Dec 1 05:40:40 2023 [111111][talend_job_name1] [30] Start job
Fri Dec 1 05:40:39 2023 [111111][talend_job_name2] [29] End job
Fri Dec 1 02:50:40 2023 [111111][talend_job_name2] [1] Start job
Expected:
PID NAME EXEC_TIME
111111 talend_job_name 1h 7min
111111 talend_job_name1 1h 10min
111111 talend_job_name2 2h 50min
What I was requested to do is to extract a table containing the job name and the execution time, one for each pid (a job can be executed multiple times, but each time has a different PID) in order to have the data available.
It is not necessary that the job starts with index 1, since all subjobs inside a job have a separated logged name (for example, the import all could contain 10 subjobs, each of one with different names)
My idea of a query would be a query that involves the PID and the job name combined as primary key, considering the start time the lower extracted_index_operation_incremental for that specific PK and the end time the max value of extracted_index_operation_incremental for that PK.
Any help? 😄
Thanks for any reply.
Hi @marco_carolo ,
you should extract all the fields and then correlate them:
<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats
earliest(_time) AS earliest
latest(_time) AS latest
BY talend_job_name
| eval
duration=latest-earliest,
earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),
latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table talend_job_name earliest latest duration
Ciao.
Giuseppe
Hi @marco_carolo ,
you should extract all the fields and then correlate them:
<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats
earliest(_time) AS earliest
latest(_time) AS latest
BY talend_job_name
| eval
duration=latest-earliest,
earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),
latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table talend_job_name earliest latest duration
Ciao.
Giuseppe
Thanks a lot,
a bonus question for a bonus karma point...
I want to create a stacked column chart with for each job name the distinct values fo duration.
What is the exact syntax to do to achive that?
Sorry, I didn't get it.
One of requests I had was:
I want to see the delta for multiple execution of a particular job, in order to find out if the job is getting slower or faster.
I thought to do a timechart in order to find out for each job the execution delta time, I'm missing the correct way to group the timechart by the duration of each execution...
Hi @marco_carolo ,
try to adapt something like this:
<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats
earliest(_time) AS _time
latest(_time) AS latest
BY talend_job_name
| eval
duration=latest-_time
| timechart values(duration) AS duration BY talend_job_name
Ciao.
Giuseppe
I think there's a missing point here, that I've adapted inside my current solution.
The primary key here is not only the job name but the PID also. The job could be executed multiple time during the day or time span chosen, the search key is talend_job and talend_pid in order to have unique duration...