Hi Guys,
I have the following query and query result, i am struggling to show it in graph:
index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host
| eval time_epoch=strftime(_time,"%Y-%m-%d %H:%M:%S")
| fields Job host Autosysjob_time Status _time time_epoch
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| table Job Status host cluster _time time_epoch
| search cluster=* AND host=*
| sort + time_epoch
| stats count by _time Job Status host cluster time_epoch
| bin span=2m time_epoch
| makecontinuous _time span=2m
| filldown _time Job Status host cluster count time_epoch
Query result:
_time Job Status host cluster time_epoch count
3/3/2020 8:00 1CDH_ING_NBC_ACCT_MSTR_DY_CURR_HG STARTING XXXX edl-prd-m01 43893.33337 1
3/3/2020 8:00 1CDH_ING_NBC_ACCT_OB_PRIM_CK_DY_TMPRL_BMG STARTING XXXX edl-prd-m01 43893.33338 1
3/3/2020 8:00 1CDH_ING_NBC_EVNT_CUST_ID_CHG_HY_HIST_CIS RUNNING XXXXX edl-prd-m01 43893.33372 1
3/3/2020 8:00 1CDH_ING_NBC_EVNT_CUST_PH_CHG_HY_HIST_CIS RUNNING XXXX edl-prd-m01 43893.33372 1
3/3/2020 9:00 1CDH_ING_NBC_EVNT_CUST_PH_CHG_HY_HIST_CIS RUNNING XXXX edl-prd-m01 43893.33372 1
Now i am struggling to show how many jobs are running or starting at each minute , can you please help
You should take a look at the concurrency
command and the answers.splunk.com articles about it.
I think the core problem, or at the very least a major source of confusion, is that you're taking _time, which is already an integer value (the number of seconds since big ben rang out the new year in 1970), and turning it into a string-formatted time, naming that field "time_epoch" which is SUPER confusing because its values are string-formatted times, not epochtime integers.
Then sorting by that string formatted time. This is a little odd but so far not really a problem - just weird since the original _time value can do all that perfectly well too.
Wild guess - you're getting confused by the behavior of the Splunk UI - whenever it sees any field called "_time" it sneakily sneaks in and on-the-fly converts the display of those values to string-formatted times. However the values underneath are epochtime integers. you can always verify that I'm not talking crazy by doing | eval srsly_wat=_time
Then anyway, back to our story the problem arises with this -
| bin span=2m time_epoch
that would make sense if time_epoch was actually an epochtime-valued field, ie if it held integer values. It makes no sense however in this case where the "time_epoch" field holds strings. Bin has no idea what to do with this combination of field and value, and so what it does is -- nothing at all. it throws no error but it does nothing.
I think if you fix that core issue, you might be able to revisit what you're trying to do with the bin+makecontinuous +filldown. re-examining that fresh, with some fresh caffeine, is a good idea.
index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host
| fields Job host Autosysjob_time Status _time
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| table Job Status host cluster _time
| search cluster=* AND host=*
| sort + _time
| stats count by _time Job Status host cluster
This is bit of an instinct, but I have seen many cases where people where folks are fiddling with bin+makecontinuous+filldown basically trying to recreate one of timechart's core use cases, beacuse they've come to believe that timechart can't do it.
I hope this helps - I don't think it's your whole answer, but I think it clears some of the mess up and puts you in a better place. and i'll update this answer if you post back etc.
I just got a query derived for no of running jobs at each time as follows
index=XXX sourcetype=YYYY EventCode=40 AND (Status=Failure OR Status=Terminated OR Status=Running OR Status=Success) AppID=$appid$ Machine=$host$ Job=$job$
| dedup _raw
| lookup datalakenodeslist.csv host as Machine OUTPUT cluster
| search cluster=$clustername$
| eval running=if(Status="RUNNING","1","0"),status=if(Status="RUNNING","start","stop"), time=_time+status
| bin span=2m _time
| stats max(running) as running, earliest(time) as first, latest(time) as last by Job,_time
| xyseries _time Job running first last
| makecontinuous span=2m _time
| streamstats window=2 global=f earliest(last*) as last*
| reverse
| streamstats window=2 global=f earliest(first*) as first*
| reverse
| foreach running*
[ eval <
| fields - first*, last*
| filldown *
| reverse
| filldown *
| reverse
| addtotals fieldname=Running
| fields _time,Running
But i find it too difficult to just understand the query :(, and not able to edit this query to have single query for both RUNNING and STARTING statuses.
Can you please help to have a single query for both the statuses.
@sideview thanks for such detailed explanation.
The following the is the latest query i am using
index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host
| eval t1=strftime(_time,"%Y-%m-%d %H:%M:%S")
| eval time_epoch=strptime(t1,"%Y-%m-%d %H:%M:%S")
| fields Job host Autosysjob_time Status _time time_epoch t1
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| table Job Status host cluster _time time_epoch
| search cluster=EDL-PRD-M01 AND host=*
| sort + time_epoch
| stats count by _time Job Status host cluster time_epoch
| bin span=2m _time
| makecontinuous span=2m _time
| filldown _time Job Status host cluster count time_epoch
Sorting the resutls before stats command with _time was not working for me , thats why i had to use time_epoch.
And i tried using | bin span=2m time_epoch and | bin span=2m _time but these are giving me different results at times , i still confused what to use on this line.
P S: Example of how _time being extracted in this case is 2020-03-09 10:00:03.863
@richgalloway this is not helping....
I just want to give you some back ground about this..I have a single event for each job status in splunk
But to make it continuous i am inserting 2 mins interval using bin and makecontinous command and filling those intervals with the last status it updated,
For example if its running status at 9 am and success status at 10 am, i am creating 2m interval between that 1 hour gap and filling it with Running status.
I suspect that is one of the reason i am not able to get that in graph as i am expexting.
PS: I am getting the correct result till the filldown command, after that i am struggling.
index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host
| eval time_epoch=strftime(_time,"%Y-%m-%d %H:%M:%S")
| fields Job host Autosysjob_time Status _time time_epoch
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| table Job Status host cluster _time time_epoch
| search cluster=EDL-PRD-M01 AND host=*
| sort + time_epoch
| stats count by _time Job Status host cluster time_epoch
| bin span=2m _time
| makecontinuous span=2m _time
| filldown _time Job Status host cluster count time_epoch
| stats count by _time Status
this is giving me the correct result, but i am not able have a filter in the query like search Status=RUNNING for example, When i add the status filter its giving me different result.
@woodcock @jka can you please help.
@woodcock @jkat54 @somesoni2 @sideview can you please help
Try adding | timechart span=1m count by Status
to the end of your query.