index=aap_prod sourcetype="HDP:PROD:OOZIE" | rex "TOKEN\[\] APP\[(?<JobName>[^\]]*)" | rex "ACTION\[[^\@]*(?<Action>[^\d\]]*)"
Extracting two fields JobName and Action.
The field called Action has values 1>@:start: 2> @end and 3> @Incremental
I have to calculate total number of JobNames which have values start and end which will be considered as successful jobs, and the count of JobNames which have different Start and End.
try this:
index=aap_prod sourcetype="HDP:PROD:OOZIE"
| rex "APP\[(?<Job_name>[^\]]+)\]\s*JOB\[(?<job_id>[^\]]+)\]\s*ACTION\[[^:]+:(?<action>[^:]+)"
| fields action job_id
| stats values(action) AS action by job_id
| where mvfilter(match(action,"start") AND match(action,"end"))
try this:
index=aap_prod sourcetype="HDP:PROD:OOZIE"
| rex "APP\[(?<Job_name>[^\]]+)\]\s*JOB\[(?<job_id>[^\]]+)\]\s*ACTION\[[^:]+:(?<action>[^:]+)"
| fields action job_id
| stats values(action) AS action by job_id
| where mvfilter(match(action,"start") AND match(action,"end"))
I modified the Field Names but that works (Y).
Thank You so much.
index=aap_prod sourcetype="HDP:PROD:OOZIE" ("@:start:" OR "@end") | rex "TOKEN\[\] APP\[(?<JobName>[^\]]*)" | rex "ACTION\[[^\@]*(?<Action>[^\d\]]*)"
| fields Action JobName
| stats values(Action) AS Action by JobName
| where isnotnull(mvfind(Action,"@:start:")) AND isnotnull(mvfind(Action,"@end"))
@dcarmack one followup question
When I do a stats count of @:start: and @end I get a count of 360+ for both of them
index=aap_prod sourcetype="HDP:PROD:OOZIE" | rex "TOKEN\[\] APP\[(?<JobName>[^\]]*)" | rex "ACTION\[[^\@]*(?<Action>[^\d\]]*)" | search Action="@:start:"| stats count(Action)
but when I use the query which you suggested it returns around 118.
also when I try to use stats values(JobName) and group by Action it does not return anything...
The search I provided you is giving a count of job_id's where the action of both start and end exists. A stats count is simply going to give you a count of events where either exist.
... | stats values(JobName) AS JobName by Action
^^ that doesn't return anything?
@dcarmack thanks for the reply
I get an error : Error in 'where' command: The expression is invalid. The result of a 'where' expression must be boolean.
oops! sorry! replace the where with this
| where isnotnull(mvfind(action,"start")) AND isnotnull(mvfind(action,"end"))
"I have to calculate total number of job names which has values start and end , which will be considered as successful jobs
and the count of jobnames which has difference of Start and End"
If you break each of these events into separate events correctly:
Event 1: 2015-12-17 23:25:17,487 INFO ActionStartXCommand:543 - SERVER[p01bdl841.aap.csaa.com] USER[hdfs] GROUP[-] TOKEN[] APP[WF_danlaw_journey_hive] JOB[0015832-150810162217214-oozie-oozi-W] ACTION[0015832-150810162217214-oozie-oozi-W@:start:] [***0015832-150810162217214-oozie-oozi-W@:start:***]Action updated in DB!
Event 2: 2015-12-18 23:25:17,487 INFO ActionStartXCommand:543 - SERVER[p01bdl841.aap.csaa.com] USER[hdfs] GROUP[-] TOKEN[] APP[WF_danlaw_journey_hive] JOB[0015832-150810162217214-oozie-oozi-W] ACTION[0015832-150810162217214-oozie-oozi-W@:start:] [***0015832-150810162217214-oozie-oozi-W@:start:***]Action updated in DB!
Then it makes sense that each event is 1 job name. So all you need to do is search the events looking for start & stop.
index=aap_prod sourcetype="HDP:PROD:OOZIE" start stop | rex field=_raw "(?<app>(?<=APP\[).*(?=\] JOB))" | rex field=_raw "(?<job>(?<=JOB\[).*(?=\] ACTION))" | stats c by app job
and
index=aap_prod sourcetype="HDP:PROD:OOZIE" start NOT stop | rex field=_raw "(?<app>(?<=APP\[).*(?=\] JOB))" | rex field=_raw "(?<job>(?<=JOB\[).*(?=\] ACTION))" | stats c by app job
I think I was not clear in explaining the requirement.
This is my regex
index=aap_prod sourcetype="HDP:PROD:OOZIE" ("@:start:" OR "@end") | rex "TOKEN\[\] APP\[(?[^\]]*)" | rex "ACTION\[[^\@]*(?[^\d\]]*)" | stats count(JobName) by Action
And I want to calculate the JobNames which has Action-Start AND Action=END
The regex you have posted here doesn't appear to be correct. Are you getting the right extraction? Or are your needing help with that as well? If you need help with the regex, you should include some sample data so someone can help with valid regex. Also, what do you mean by difference of start and end? Do mean number of events that have an Action=Start by do not have Action=End?
This is the event
2015-12-17 23:25:17,487 INFO ActionStartXCommand:543 - SERVER[p01bdl841.aap.csaa.com] USER[hdfs] GROUP[-] TOKEN[] APP[WF_danlaw_journey_hive] JOB[0015832-150810162217214-oozie-oozi-W] ACTION[0015832-150810162217214-oozie-oozi-W@:start:] [***0015832-150810162217214-oozie-oozi-W@:start:***]Action updated in DB!
I get the Start Action, End Action and the Incremantal Action
So just need to get the jobnames which has action as started and End.
I'm still a bit unclear as to what it is that you are looking for. I have modified your regex where it will not correctly extract JobName and Action. However, start appears twice in your data, which one is the action? Looking at your regex, it appears you attempting to get the one that starts with ACTION. Based on these assumption, here's the rex for you to try. Hopefully, this is what you were looking for
.... | rex "APP\[(?<JobName>[^\]]+)" | rex "ACTION[^:]+:(?<Action>[^:]+)"
athorat,
Once we make sure those extractions and rexes are correct, is what you want something like
A list of successful JobNames where the "start" and "end" count per JobName are equal, and another list of unsuccessful JobNames where the count of start and end don't match. Does that sound right?
So, you might have as successful ones,
JobName, Count of Start, Count of End
BillyBob, 92, 92
JimmyJohn, 43, 43
...
Unsuccessful could be things like
JobName, Count of Start, Count of End
RalphyRalph, 47, 42
MyBrokenJob, 52, 18
MyImpossibleJob, 32, 0
(This can be combined into one list probably pretty well, so if it's a separate list or part of the same list isn't really an issue, what I think we need to make sure of is that we're thinking about this the right way for now).