
Search Query for Nested Jobs

snehasal
Explorer

Hi Everyone,

I am new to Splunk and am trying to create dashboards for data visualization. I have real-time data logs stored in a log file which I need to push to a dashboard. In my application, we have multiple Jenkins jobs which run daily or several times a day. Each job is a workflow, and inside one workflow we have multiple sessions which run.
e.g. WorkFlow1 : Has 3 Sessions named as Session1 to Session3 inside it.
Events occur as
WorkFlow1 Start -> Session1 Start -> Session1 End-> Session2 Start -> Session2 End-> Session3 Start -> Session3 End -> WorkFlow1 End
WorkFlow2 Start ->.....-> WorkFlow2 End
.....so on

I want to create a Dashboard which displays
1. average duration of each Session inside a workflow
2. average duration of entire Workflow

The logs are stored in the following format:
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S1 Step=WF_Start (this is start of WorkFlow)
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S1 Step=S_Start (this is start of Session1 inside WF1)
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S1 Step=S_End (this is end of Session1 inside WF1)
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S1 Step=WF_End (this is the end of WF1)
and so on

Could you please guide me on how to proceed with the search query?

Thanks,
Sneha Salvi

1 Solution

DalJeanis
SplunkTrust

Okay, so is this the way it looks with multiple sessions in the workflow?

[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S1 Step=WF_Start (this is start of WorkFlow)
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S1 Step=S_Start (this is start of Session1 inside WF1)
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S1 Step=S_End (this is end of Session1 inside WF1)
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S2 Step=S_Start (this is start of Session2 inside WF1)
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S2 Step=S_End (this is end of Session2 inside WF1)
[TimeStamp] WorkFlowName= WF1 SessionName = WF1.S2 Step=WF_End (this is the end of WF1)

Specifically, is the SessionName correct for the WF_End Step?

Overall, how long (duration) are workflows, and how quickly are the workflow names re-used?

When you say average time of Sessions within a workflow, are you asking, for example, to calculate how long WF1.S1 takes on average whenever WF1 runs?


Here's a first cut at the code to pull the data and calculate the averages...

(your search here)

| rename COMMENT as "Here we pull the data - delete these and change the field names as needed if the data is autoextracted."
| rex  "WorkFlowName\s*=\s*(?<WorkFlowName>\S+)" 
| rex  "SessionName\s*=\s*(?<SessionName>\S+)" 
| rex  "Step\s*=\s*(?<StepName>\S+)" 

| rename COMMENT as "Here we put the start/stop times of workflow and session into specific fields."
| eval WfStart=if(StepName="WF_Start",_time,null())
| eval WfEnd=if(StepName="WF_End",_time,null())
| eval SessStart=if(StepName="S_Start",_time,null())
| eval SessEnd=if(StepName="S_End",_time,null())

| rename COMMENT as "Here we set the session to equal the workflow if it's a workflow start or stop."
| eval SessionName=if(StepName="WF_Start" OR StepName="WF_End",WorkFlowName,SessionName)

| rename COMMENT as "Sort the record into order, copy the latest start times onto the stop records for each session and for the workflow itself."
| sort 0 _time 
| streamstats latest(WfStart) as WfStart latest(SessStart) as SessStart by WorkFlowName,SessionName
| eval WfDuration=WfEnd-WfStart
| eval SessDuration=SessEnd-SessStart

| rename COMMENT as "Now calculate the averages.  Since there's no end time on the 'start' records, they all disappear and only the end records have a duration to include in the averages"
| stats avg(WfDuration) as avgWfDuration, avg(SessDuration) as avgSessDuration by WorkFlowName,SessionName
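If the dashboard should show workflow-level and session-level averages in separate panels, one possible follow-on is the sketch below. It assumes the field names produced by the search above, and relies on the earlier eval having set SessionName equal to WorkFlowName on workflow rows:

```spl
| rename COMMENT as "Split the stats output: workflow rows carry SessionName=WorkFlowName after the earlier eval."
| eval type=if(SessionName=WorkFlowName,"WorkFlow","Session")
| where type="Session"
| table WorkFlowName SessionName avgSessDuration
```

Dropping the final where clause (or flipping it to type="WorkFlow") gives the companion panel.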


snehasal
Explorer

Thank you so much for the answer. It works.
Specifically, is the SessionName correct for the WF_End Step?
I have replaced the SessionName for the WF_Start and WF_End steps with 'WorkFlow'.

Overall, how long (duration) are workflows, and how quickly are the workflow names re-used?
The Workflows are of varied durations. We have about 200 distinct Workflows and 1400 distinct SessionNames.

When you say average time of Sessions within a workflow, are you asking, for example, to calculate how long WF1.S1 takes on average whenever WF1 runs?
Yes, this is the correct interpretation.


snehasal
Explorer

However, I need to calculate the workflow and session averages at the day level. So when I plot it, the x-axis should be the time of the event, and the y-axis will have the average duration.


DalJeanis
SplunkTrust

This code calculates the average duration for each specific step of each workflow across time, so step S3 of workflow WF7 might have an average of 41 minutes.

If the session names are distinct, so that session name S3 means the same thing for every workflow it is part of, then that would require a change, and calculating the average for S3 across each particular day might make sense.

So, is S1 unique or is WF1.S1 unique?

Oh, in terms of the aggregate, that's fine anyway. That could be something like this, which replaces everything from line 23 on. It's not statistically accurate, but it's a decent first cut. Really, you should create a summary index and calculate from there.

| rename COMMENT as "Now calculate the daily averages. "

| bin _time span=1d
| eval type=if(SessionName="WorkFlow","WorkFlow","Session")
| stats avg(WfDuration) as avgWfDuration, avg(SessDuration) as avgSessDuration by type _time
| eventstats max(_time) as maxtime
| eval series=if(_time=maxtime,"today","past")
| eventstats avg(avgWfDuration) as avgavgWfDuration, avg(avgSessDuration) as avgavgSessDuration by series type

You might use trendline in there instead; I'd have to give it some consideration.
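For what it's worth, a trendline version might look like the sketch below (field names assumed from the earlier code; sma7 is a 7-point simple moving average):

```spl
| rename COMMENT as "Daily workflow average with a 7-day simple moving average overlaid."
| bin _time span=1d
| stats avg(WfDuration) as avgWfDuration by _time
| trendline sma7(avgWfDuration) as trend7dWfDuration
```

Plotted as a line chart over _time, the trend series smooths out the day-to-day noise in the daily averages.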


snehasal
Explorer

In my case, WF1.S1 is unique. If we stop appending WF1 to the session name, then the pair (WorkFlowName, SessionName) is unique. How would a summary index help me calculate averages at a daily level? I feel a summary index would help in calculating cumulative averages.


DalJeanis
SplunkTrust

Yes, creating a summary index for this will speed the calculations. Putting the daily averages into a summary index means, basically, that your search doesn't have to do any calculations to present the history of any particular workflow or session.
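As a hedged sketch, a scheduled search could append the pre-computed daily averages to a summary index with collect. The index name summary_jobstats below is made up and would need to be created first:

```spl
(your search here)
| rename COMMENT as "Compute durations as in the accepted answer, then keep one row per day per workflow/session."
| bin _time span=1d
| stats avg(WfDuration) as avgWfDuration avg(SessDuration) as avgSessDuration by _time WorkFlowName SessionName
| collect index=summary_jobstats
```

Dashboard panels would then search index=summary_jobstats directly instead of recalculating from the raw events on every load.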

You said that 80% of your jobs run only once a day. So, however long it took each day IS the average for that day, so there's no savings there.

Only the stuff that runs more than once a day can have a per-day average that is nontrivial.

I would probably leave the workflow appended, and have the WF_Start be WF1.Workflow, but that's a personal choice.


snehasal
Explorer

Hi, I will stick with the idea of a summary index; it's a good way to save time and calculations.

Just curious, but why do you feel, leaving the Workflow appended would be good?


DalJeanis
SplunkTrust

Hmmm. I can't swear it's the right design choice, but it's where my gut says to go. If I had to say, the major point is that it makes the session id unique by itself, which has coding, performance, and reporting advantages.

You could even have the workflow name be a virtual/calculated field (splunkwise) pulling the data from the session id.
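For example, if the session id keeps the WF1.S1 form, the workflow name could be recovered on the fly rather than stored; a minimal sketch:

```spl
| rename COMMENT as "Derive the workflow name from the session id, e.g. WF1.S1 -> WF1."
| eval WorkFlowName=mvindex(split(SessionName,"."),0)
```

The same eval could live in props.conf as a calculated field (an EVAL- entry) so every search sees WorkFlowName automatically.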

However, you should weigh your decision based upon figuring out how you want to present your reports. If you will always want to pull the data back OUT, then go ahead and do so at the design phase.

Once you build your summary index, it is a reasonable assumption that NOTHING you are likely to do here will have much impact on the overall performance. You can always go back and redo the design; Splunk is pretty forgiving that way. You just lose the run time for generating the new version of the summary index.
