Solved: Re: How to filter data in a search using the AND c...

athorat · ‎02-18-2016

We are using this search to find out the Hadoop jobs which have finished.
Each finished job has a start event and an end event.
Start Event : ":start:] with user-retry state"
END Event : "@end***]Action updated in DB!"

The problem with the below search is we are using an OR condition. How do we filter data only when both conditions match?

index=aap_prod sourcetype="HDP:PROD:OOZIE"  (":start:] with user-retry state" OR "@end***]Action updated in DB!") | rex "TOKEN\[\] APP\[(?[^\]]*)"|   rex "ACTION\[[^\@]*(?[^\d\]]*)" | rex "JOB\[?(?[\d-]+)-"|rename JobId as "Job ID",JobName as "Job Name"  | sort _time | streamstats current=t  window=2 range(_time) as JobRunTime latest(_time) as EndTime earliest(_time) as StartTime |sort -_time | table "Job Name", "Job ID",,StartTime , EndTime  ,JobRunTime|dedup "Job ID"| eval StartTime=strftime(StartTime, "%c")|eval EndTime=strftime(EndTime, "%c")|eval JobRunTime=tostring(JobRunTime, "duration")  |search JobRunTime!=0

Richfez · ‎02-18-2016

It sounds like start and end events are in different events, so you could replace a lot of what you are doing with a transaction. One requirement of this is that you need an identifier to "join them up on", like a JobID. I get hints in your search that this is the case, so I'll use JobID for that for now. I'm also assuming that there's a maximum of 15 minutes between a job start and a job end. Adjust as required.

I'm going to write this a couple of lines at a time so I can explain, because there's some other fixups and changes I made. PLUS, you can run this and add one piece at a time to see the affect it has, in fact, I very heartily recommend it.

index=aap_prod sourcetype="HDP:PROD:OOZIE" 
| transaction JobID startswith=":start:] with user-retry state"  endswith="@end***]Action updated in DB!" maxspan=15m
| search duration!=0
| put your rex for TOKEN, ACTION and JOB here | then your rename of Job ID and Job Name

First is your base search which pulls out ALL the events involved with this. We can tweak if necessary, but there's no need to be too restrictive.

Then comes the transaction. Read up on it - its very useful. It will "connect together" into a single event all the individual events that should go together. You for free get a field called duration field to replace all your JobRunTime and stuff. I don't think you'll need to escape any of the characters in there, but we may find differently. We will see.

Third is a search to find only transactions with a duration greater than zero (replacing your end-of-search JobRunTime!=0).

Last is the rexes you need for those fields and the two renames. (A side note - these events appear to have some absolute crazy formatting and puncuatation in them given the startswith and endswith stuff, so I won't comment on the possible oddity of the rexes - just paste 'em in if they work and go on with life - they'll still work if they did before!).

Now, continuing along, here's some things you for sure do NOT need:

| sort _time 
| streamstats current=t  window=2 range(_time) as JobRunTime latest(_time) as EndTime earliest(_time) as StartTime 
| sort -_time

Those aren't needed because they were used before to calculate the job run time, which you already have as "duration."

Now, the start and ending time... we need to keep a bit of this, but I'm going to change it some. You may not need this at all, in which case I would suggest to not add it.

| stats earliest(_time) as StartTime, latest(_time) as EndTime
| eval StartTime=strftime(StartTime, "%c")|eval EndTime=strftime(EndTime, "%c")
| eval JobRunTime=tostring(duration, "duration")

I used stats to get your earliest and latest times for each event, then the rest is probably mostly OK if you need to do that.

So all together:

 index=aap_prod sourcetype="HDP:PROD:OOZIE" 
 | transaction JobID startswith=":start:] with user-retry state"  endswith="@end***]Action updated in DB!" maxspan=15m
 | search duration!=0
 | put your rex for TOKEN, ACTION and JOB here | then your rename of Job ID and Job Name
 | stats earliest(_time) as StartTime, latest(_time) as EndTime
 | eval StartTime=strftime(StartTime, "%c")|eval EndTime=strftime(EndTime, "%c")
 | eval JobRunTime=tostring(duration, "duration")

Now, just as a note there was a bit to fix up in there, so hopefully it all works, but if not please take a little time and build it one piece at a time, seeing what works as I describe it and what doesn't, then ask about the parts that don't work. We can work through this, it just might take a little bit of back and forth.

ALSO, it might be useful if this doesn't work perfectly (or with trivial changes) to paste in an event or two - sometimes seeing the raw data can help immensely. But hopefully it'll pretty much work.

View solution in original post

Richfez · ‎02-18-2016

It sounds like start and end events are in different events, so you could replace a lot of what you are doing with a transaction. One requirement of this is that you need an identifier to "join them up on", like a JobID. I get hints in your search that this is the case, so I'll use JobID for that for now. I'm also assuming that there's a maximum of 15 minutes between a job start and a job end. Adjust as required.

I'm going to write this a couple of lines at a time so I can explain, because there's some other fixups and changes I made. PLUS, you can run this and add one piece at a time to see the affect it has, in fact, I very heartily recommend it.

index=aap_prod sourcetype="HDP:PROD:OOZIE" 
| transaction JobID startswith=":start:] with user-retry state"  endswith="@end***]Action updated in DB!" maxspan=15m
| search duration!=0
| put your rex for TOKEN, ACTION and JOB here | then your rename of Job ID and Job Name

First is your base search which pulls out ALL the events involved with this. We can tweak if necessary, but there's no need to be too restrictive.

Then comes the transaction. Read up on it - its very useful. It will "connect together" into a single event all the individual events that should go together. You for free get a field called duration field to replace all your JobRunTime and stuff. I don't think you'll need to escape any of the characters in there, but we may find differently. We will see.

Third is a search to find only transactions with a duration greater than zero (replacing your end-of-search JobRunTime!=0).

Last is the rexes you need for those fields and the two renames. (A side note - these events appear to have some absolute crazy formatting and puncuatation in them given the startswith and endswith stuff, so I won't comment on the possible oddity of the rexes - just paste 'em in if they work and go on with life - they'll still work if they did before!).

Now, continuing along, here's some things you for sure do NOT need:

| sort _time 
| streamstats current=t  window=2 range(_time) as JobRunTime latest(_time) as EndTime earliest(_time) as StartTime 
| sort -_time

Those aren't needed because they were used before to calculate the job run time, which you already have as "duration."

Now, the start and ending time... we need to keep a bit of this, but I'm going to change it some. You may not need this at all, in which case I would suggest to not add it.

| stats earliest(_time) as StartTime, latest(_time) as EndTime
| eval StartTime=strftime(StartTime, "%c")|eval EndTime=strftime(EndTime, "%c")
| eval JobRunTime=tostring(duration, "duration")

I used stats to get your earliest and latest times for each event, then the rest is probably mostly OK if you need to do that.

So all together:

 index=aap_prod sourcetype="HDP:PROD:OOZIE" 
 | transaction JobID startswith=":start:] with user-retry state"  endswith="@end***]Action updated in DB!" maxspan=15m
 | search duration!=0
 | put your rex for TOKEN, ACTION and JOB here | then your rename of Job ID and Job Name
 | stats earliest(_time) as StartTime, latest(_time) as EndTime
 | eval StartTime=strftime(StartTime, "%c")|eval EndTime=strftime(EndTime, "%c")
 | eval JobRunTime=tostring(duration, "duration")

Now, just as a note there was a bit to fix up in there, so hopefully it all works, but if not please take a little time and build it one piece at a time, seeing what works as I describe it and what doesn't, then ask about the parts that don't work. We can work through this, it just might take a little bit of back and forth.

ALSO, it might be useful if this doesn't work perfectly (or with trivial changes) to paste in an event or two - sometimes seeing the raw data can help immensely. But hopefully it'll pretty much work.

athorat · ‎02-18-2016

@rich7177

Thank you so much for this awesome reply.

here is what is happening. I tried to use your part

  index=aap_prod sourcetype="HDP:PROD:OOZIE" 
          | transaction JobID startswith=":start:] with user-retry state"  endswith="@end***]Action updated in DB!" 
          | search duration!=0
         | rex "TOKEN\[\] APP\[(?<JobName>[^\]]*)"|   rex "ACTION\[[^\@]*(?<Action>[^\d\]]*)" 
         | rex "JOB\[?(?<JobId>[\d-]+)-"|rename JobId as "Job ID",JobName as "Job Name"
        | table "Job Name", "Job ID",StartTime , EndTime  ,duration
    ************above part works except the start and end time*******************

if I add below mentioned part it does not return any data in the table

| stats earliest(_time) as StartTime, latest(_time) as EndTime
| eval StartTime=strftime(StartTime, "%c")|eval EndTime=strftime(EndTime, "%c")
| eval JobRunTime=tostring(duration, "duration")

The previous rex works fine in this one. just that the start and end times are not displayed in the table.

 Events :
*********************************


2/18/16 
7:11:46.526 PM  
2016-02-18 19:11:46,526  INFO ActionStartXCommand:543 - SERVER[p01bdl841.aap.csaa.com] USER[hdfs] GROUP[-] TOKEN[] APP[WF_danlaw_journey_hive] JOB[0001550-160210171129236-oozie-oozi-W] ACTION[0001550-160210171129236-oozie-oozi-W@end] [***0001550-160210171129236-oozie-oozi-W@end***]Action updated in DB!

2/18/16 
7:05:01.882 PM  
2016-02-18 19:05:01,882  INFO ActionStartXCommand:543 - SERVER[p01bdl841.aap.csaa.com] USER[hdfs] GROUP[-] TOKEN[] APP[WF_danlaw_journey_hive] JOB[0001550-160210171129236-oozie-oozi-W] ACTION[0001550-160210171129236-oozie-oozi-W@:start:] Start action [0001550-160210171129236-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
*********************************

Thanks a ton for looking into this.

Richfez · ‎02-18-2016

Oh, oops, I just realized you rex the JobID. In order to connect those two events together properly on JobID, well, JobID needs to be defined BEFORE you put that in the transaction. To do that, you'll want to move your rex for that to somewhere before the transaction so you can use it IN the transaction command. Like this

index=aap_prod sourcetype="HDP:PROD:OOZIE" 
| rex "JOB\[?(?<JobId>[\d-]+)-"
| transaction JobID startswith=":start:] with user-retry state"  endswith="@end***]Action updated in DB!"

This should get you ONE event with both those previous events in it, "grouping" them together. If it works you'll see what I mean.

So, let's get this working first, then the rest will be pretty easy because it's just additional rexes and evals and making it pretty.

athorat · ‎02-18-2016

@rich7177

It was grouping two events in one even with the rex written after the transaction.
is that what you were referring to.. if thats the one , its grouping start and end events as one event.

Richfez · ‎02-19-2016

Really! Interesting.

1) So, if you run just index=aap_prod sourcetype="HDP:PROD:OOZIE" do you get fields like JobId and all that? (Might have to change to "Verbose mode" instead of "Fast" or "Smart" mode, in the upper right).

2) If you run the index=aap_prod sourcetype="HDP:PROD:OOZIE" | transaction ... one, it group them correctly? If so, check for duration.

My critical thinking skills were lacking last night, not sure why... But if you have duration (you should!), then _time of the entire transaction was the start time, duration is your duration and all you need is end time. To make start time, end time all pretty - eval starttime=_time | eval endtime=starttime+duration

Let me know where that works and where that doesn't.

athorat · ‎02-19-2016

@rich7177

Thanks for the reply.
To answer your questions:

1) So, if you run just index=aap_prod sourcetype="HDP:PROD:OOZIE" do you get fields like JobId and all that? (Might have to change to "Verbose mode" instead of "Fast" or "Smart" mode, in the upper right). : Yes it does give a JobID i verbose mode and seems was clubbing events based on that.

but the weird part is its clubbing two events with different ID.

2/19/16

2016-02-19 12:05:02,482 INFO ActionStartXCommand:543 - SERVER[p01bdl841.com] USER[hdfs] GROUP[-] TOKEN[] APP[WF_Pig_SIS_RAW_INGEST_ES] JOB[0001685-160210171129236-oozie-oozi-W] ACTION[0001685-160210171129236-oozie-oozi-W@:start:] Start action [0001685-160210171129236-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2016-02-19 12:09:06,932 INFO ActionStartXCommand:543 - SERVER[p01bdl841.com] USER[hdfs] GROUP[-] TOKEN[] APP[WF_CASggflume] JOB[0001658-160210171129236-oozie-oozi-W] ACTION[0001658-160210171129236-oozie-oozi-W@end] [0001658-160210171129236-oozie-oozi-W@end]Action updated in DB!

2) If you run the index=aap_prod sourcetype="HDP:PROD:OOZIE" | transaction ... one, it group them correctly? If so, check for duration.,: Seems have an issue grouping them correctly.
But when i use the | transaction "JOB ID" which is the "JOB ID" extracted from the custom rex, what it does is apart from startwith and endswith events, it also groups the processes associated for that JOB ID(all the steps/processes which ran between the start event and the end event)

The count is correct though.

Thanks.

athorat · ‎02-22-2016

@rich7177

I think I got the problem and here is what was happening.

I was using using the renamed, names for "JOB ID" and was causing an issue with JOB ID not matching in the two grouped events.

Using the following query below , this works but some how the start and the end time is still an issue.
I cant get them right in this query, is it because the two events are grouped and its not able to get those right?

index=aap_prod sourcetype="HDP:PROD:OOZIE"  (":start:] with user-retry state" OR "@end***]Action updated in DB!") | rex "TOKEN\[\] APP\[(?<JobName>[^\]]*)"|   rex "ACTION\[[^\@]*(?<Action>[^\d\]]*)" | rex "JOB\[?(?<JobId>[\d-]+)-"
| transaction JobId startswith=":start:] with user-retry state"  endswith="@end***]Action updated in DB!"
|table JobName,JobId, duration

Thanks.

Richfez · ‎02-22-2016

You'll have to calculate the StartTime and EndTime. The duration is working correctly, right? That means the core pieces are in place, as you mention.

index=aap_prod sourcetype="HDP:PROD:OOZIE"  
(":start:] with user-retry state" OR "@end***]Action updated in DB!") 
| rex "TOKEN\[\] APP\[(?<JobName>[^\]]*)"
| rex "ACTION\[[^\@]*(?<Action>[^\d\]]*)" 
| rex "JOB\[?(?<JobId>[\d-]+)-"
| transaction JobId startswith=":start:] with user-retry state"  
endswith="@end***]Action updated in DB!"
| eval StartTime=_time 
| eval EndTime=StartTime+duration
| eval StartTime=strftime(StartTime, "%c")
| eval EndTime=strftime(EndTime, "%c")
| table JobName,JobId, StartTime, EndTime, duration

I added four evals - one to make the StartTime, one for EndTime, then two two using strftime to make them pretty. See what that gets you.

If you want duration formatted more nicely add | eval duration=tostring(duration, "duration") in there anywhere between the transaction and the table.

athorat · ‎02-22-2016

@rich7177

Thank You so much for your help,

athorat · ‎02-25-2016

@rich7177 one question here
is there a way to restrict this search with upper case and lower case scenarios.

 index=aap_prod sourcetype="HDP:PROD:OOZIE"  
 (":start:] with user-retry state" OR "@end***]Action updated in DB!")

when I use this @end*** it also returns results with END_IMPORT***, which is an event detail for failed jobs.

https://answers.splunk.com/answers/372033/filter-data-with-upper-case-and-lower-case-scenari.html?mi...

Richfez · ‎02-25-2016

Yes, use CASE() (or FAR less likely TERM() ). You might have to play around with them a little, but they're pretty straightforward.

They're documented (sort of ) in the Search command primer.

How to filter data in a search using the AND condition?

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!