Hi all!
I have something which sends me the START and the STOP of some processes.
I have this search that creates a timeline chart and, if in some processes we have just the START event, assumes that this process must be GOING so it attributes to that process endtime = now (just for the sake of drawing the bar) .
MY BASE SEARCH
| eval StartTime = if(LogType = "START", _time, null)
| eval EndTime = if(LogType = "END", _time, null)
| stats Latest(StartTime) as StartTime Latest(EndTime) as EndTime Latest(Result) as Result by process id
| eval EndTime = if(isnull(EndTime), now(), EndTime)
| eval Status = if(isnull(Result), "GOING", Result)
| eval duration = (EndTime-StartTime) * 1000
| where (duration != "" OR duration >= 0)
| table StartTime process Status duration
| sort StartTime
Anyway sometimes this process has been killed and does not send the STOP event.
The consequence of this is that a process that has been killed continues to draw this bar and if the same process started again, we have overlapping, like in the image:
What I would like to do is to exclude those instance of the process which have just a start without having an end but only if another instance of the process is already started.
(Instances of the process are identified in the search by the field "id")
Example:
Event1 -> processA id1 Start
Event2 -> processA id2 Start
Event3 -> processA id2 end
In the previous example, with those events a chart with overlapping (like the image) would come out.
So I would like the search to exclude the First start because there is a second start and no end, which means that the process was killed.
I hope I was clear in my explanation.
Please don't hesitate to ask me any further details.
Thank you
I found the solution myself.
This is the code:
MY BASE SEARCH
| eval StartTime = if(LogType = "START", _time, null)
| eval EndTime = if(LogType = "END", _time, null)
| stats Latest(StartTime) as StartTime Latest(EndTime) as EndTime Latest(Result) as Result by precess id
| join type=left precess [search
MY BASE SEARCH LogType ="START"
| stats latest(_time) AS LatestStartTime by precess]
| eval EndTime = if(isnull(EndTime), if(StartTime = LatestStartTime, now(), null() ), EndTime)
| eval Status = if(isnull(Result), "GOING", Result)
| eval duration = (EndTime-StartTime) * 1000
| where duration >= 0
| table StartTime precess Status duration
| sort StartTime
Thanks to all the people who answered!
:)
I found the solution myself.
This is the code:
MY BASE SEARCH
| eval StartTime = if(LogType = "START", _time, null)
| eval EndTime = if(LogType = "END", _time, null)
| stats Latest(StartTime) as StartTime Latest(EndTime) as EndTime Latest(Result) as Result by precess id
| join type=left precess [search
MY BASE SEARCH LogType ="START"
| stats latest(_time) AS LatestStartTime by precess]
| eval EndTime = if(isnull(EndTime), if(StartTime = LatestStartTime, now(), null() ), EndTime)
| eval Status = if(isnull(Result), "GOING", Result)
| eval duration = (EndTime-StartTime) * 1000
| where duration >= 0
| table StartTime precess Status duration
| sort StartTime
Thanks to all the people who answered!
:)
You basically had it, I think. I tweaked a few tiny typos and optimizations but it is essentially your search:
MY BASE SEARCH
| eval StartTime = if(LogType = "START", _time, null())
| eval EndTime = if(LogType = "END", _time, null())
| stats latest(StartTime) AS StartTime latest(EndTime) AS EndTime latest(Result) AS Result BY host process id
| eval EndTime = if(isnull(EndTime), now(), EndTime)
| eval Status = if(isnull(Result), "GOING", Result)
| eval duration = (EndTime-StartTime) * 1000
| search duration >= 0
| table StartTime process Status duration
| sort 0 StartTime
It still creates overlapping 😞
What is the difference between | search duration >= 0
and |where duration >= 0
?
What is the difference between | sort 0 StartTime
and | sort StartTime
?
Best practices: I use where
when there is a field name on the RHS and seaerch
when there is a literal. You should get in the habit of using sort 0
always; otherwise it is limited to 1000 results!
What is RHS and what is literal?
So 0 menas no limit in the sort?
RHS is Right-Hand-Side. Use where
if there is a field name on both sides of the comparison operator; Use search
if there is a field name on the LHS and a non-field-name (a literal
value) on the RHS. For sort
using sort 0
means unlimited.
And using search instead of where in that case enhance the performance?
No, this is for clarity and to avoid bugs. If you see my code using a single search
test you know that the RHS is a literal and if you see where
, you know that I am trying to have the RHS be a field name. The problem is sometimes a RHS thing (if it is written ambiguously) is intended to be a field name but that field name doesn't exists, and in such cases it will be treated as a literal. This fallback can cause unintended results.
Give this a try. It works best if you don't have too much data to be processed. It's basically taking both type of events separately. Doing so you're able to dedup on START event of a process to keep the last start.
index=yourindex sourcetype=yoursourcetype logType="START"
| dedup process | eval StartTime=_time
| append[search index=yourindex sourcetype=yoursourcetype logType="END" | eval EndTime=_time]
| sort 0 -_time
| stats Latest(StartTime) as StartTime Latest(EndTime) as EndTime Latest(Result) as Result by process id
| eval EndTime = if(isnull(EndTime), now(), EndTime)
| eval Status = if(isnull(Result), "GOING", Result)
| eval duration = (EndTime-StartTime) * 1000
| where (duration != "" OR duration >= 0)
| table StartTime process Status duration
| sort StartTime
Does not work, it keeps overlapping.
What is the timerange for which you run this query??
Could you also provide your base search to under the conditions to identify START AND STOP events.
the base query only contains an index and a source.
The events are jsons with fisically written, between the other things, "START" or "END".
Time range??? This is important as if it's a daily process, and time range is 2 you'll have 2 instance of that process running.
Its possible to have 2 instances running in the same day but never together, so if there is overlapping the fist one is been killed.
Hi andreafebbo,
Seems the transaction command is a very good candidate for your use case. Please try the following example search, assuming processId is the name of the field based on which to pair up your events:
... | transaction processId startswith="Start" endswith="End" maxevents=2 | where duration > 0
For detailed information about the transaction command, please refer to documentation:
http://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Transaction
Hope this helps. Thanks!
Hunter
I read the documentation and I think it might help me but is not easy to understand deeply how it works.
Could you please try to integrate it in my code so I can understand it better?
Thanks