Splunk Search

Extract execution time

marco_carolo
Path Finder

Hello,

 

I've the following situation:

I've inside logs the ETL logs, I've already extracted some data via search fields.

The log structure is the following:

Fri Dec 1 16:00:59 2023 [extracted_pid] [extracted_job_name] [extracted_index_operation_incremental] extracted_message

Example

Fri Dec 1 07:57:40 2023 [111111][talend_job_name] [100] End job

Fri Dec 1 06:50:40 2023 [111111][talend_job_name] [70] Start job

Fri Dec 1 06:50:39 2023 [111111][talend_job_name1] [69] End job

Fri Dec 1 05:40:40 2023 [111111][talend_job_name1] [30] Start job

Fri Dec 1 05:40:39 2023 [111111][talend_job_name2] [29] End job

Fri Dec 1 02:50:40 2023 [111111][talend_job_name2] [1] Start job

 

Expected:

PID          NAME                         EXEC_TIME

111111 talend_job_name 1h 7min

111111 talend_job_name1 1h 10min

111111 talend_job_name2 2h 50min

 

What I was requested to do is to extract a table containing the job name and the execution time, one for each pid (a job can be executed multiple times, but each time has a different PID) in order to have the data available.

It is not necessary that the job starts with index 1, since all subjobs inside a job have a separated logged name (for example, the import all could contain 10 subjobs, each of one with different names)

My idea of a query would be a query that involves the PID and the job name combined as primary key, considering the start time the lower extracted_index_operation_incremental for that specific PK and the end time the max value of extracted_index_operation_incremental for that PK.

Any help? 😄

 

Thanks for any reply.

 

 

Labels (2)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @marco_carolo ,

you should extract all the fields and then correlate them:

<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats 
   earliest(_time) AS earliest
   latest(_time) AS latest
   BY talend_job_name
| eval 
   duration=latest-earliest, 
   earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),   
   latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table talend_job_name earliest latest duration

Ciao.

Giuseppe

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi @marco_carolo ,

you should extract all the fields and then correlate them:

<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats 
   earliest(_time) AS earliest
   latest(_time) AS latest
   BY talend_job_name
| eval 
   duration=latest-earliest, 
   earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),   
   latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table talend_job_name earliest latest duration

Ciao.

Giuseppe

marco_carolo
Path Finder

Thanks a lot,

 

a bonus question for  a bonus karma point...

I want to create a stacked column chart with for each job name the distinct values fo duration.

What is the exact syntax to do to achive that?

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @marco_carolo ,

the easiest way is to follow the GUI in stached char creation.

Ciao.

Giuseppe

0 Karma

marco_carolo
Path Finder

Sorry, I didn't get it.

One of requests I had was:

I want to see the delta for multiple execution of a particular job, in order to find out if the job is getting slower or faster.

I thought to do a timechart in order to find out for each job the execution delta time, I'm missing the correct way to group the timechart by the duration of each execution...

@gcusello 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @marco_carolo ,

try to adapt something like this:

<your_search>
| rex "[^\[]*\[(?<extracted_pid>[^\]]*)\]\s*\[(?<extracted_job_name>[^\]]*)\]\s*\[(?<extracted_index>[^\]]+\]\s*)(?<msg>.*)"
| stats 
   earliest(_time) AS _time
   latest(_time) AS latest
   BY talend_job_name
| eval 
   duration=latest-_time
| timechart values(duration) AS duration BY talend_job_name 

Ciao.

Giuseppe

0 Karma

marco_carolo
Path Finder

I think there's a missing point here, that I've adapted inside my current solution.

The primary key here is not only the job name but the PID also. The job could be executed multiple time during the day or time span chosen, the search key is  talend_job and talend_pid in order to have unique duration...

0 Karma
Get Updates on the Splunk Community!

Join Us for Splunk University and Get Your Bootcamp Game On!

If you know, you know! Splunk University is the vibe this summer so register today for bootcamps galore ...

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

.conf24 is taking place at The Venetian in Las Vegas from June 11 - 14. Continue reading to learn about the ...

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...