Need help to visualize the query

pench2k19 · ‎03-04-2020

Hi Guys,

I have the following query and query result, and I am struggling to show it in a graph:

index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host 
| eval time_epoch=strftime(_time,"%Y-%m-%d %H:%M:%S") 
| fields Job host Autosysjob_time Status _time time_epoch 
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| table Job Status host cluster _time time_epoch
| search cluster=* AND host=*
| sort + time_epoch
| stats count  by _time Job Status host cluster time_epoch
| bin span=2m time_epoch
| makecontinuous  _time span=2m
| filldown _time Job Status host cluster count time_epoch

Query result:

_time          Job                                        Status    host   cluster      time_epoch   count
3/3/2020 8:00  1CDH_ING_NBC_ACCT_MSTR_DY_CURR_HG          STARTING  XXXX   edl-prd-m01  43893.33337  1
3/3/2020 8:00  1CDH_ING_NBC_ACCT_OB_PRIM_CK_DY_TMPRL_BMG  STARTING  XXXX   edl-prd-m01  43893.33338  1
3/3/2020 8:00  1CDH_ING_NBC_EVNT_CUST_ID_CHG_HY_HIST_CIS  RUNNING   XXXXX  edl-prd-m01  43893.33372  1
3/3/2020 8:00  1CDH_ING_NBC_EVNT_CUST_PH_CHG_HY_HIST_CIS  RUNNING   XXXX   edl-prd-m01  43893.33372  1
3/3/2020 9:00  1CDH_ING_NBC_EVNT_CUST_PH_CHG_HY_HIST_CIS  RUNNING   XXXX   edl-prd-m01  43893.33372  1

Now I am struggling to show how many jobs are running or starting at each minute. Can you please help?

woodcock · ‎03-14-2020

You should take a look at the concurrency command and the answers.splunk.com articles about it.

sideview · ‎03-09-2020

I think the core problem, or at the very least a major source of confusion, is that you're taking _time, which is already an epochtime number (the number of seconds since Big Ben rang in the new year in 1970), and turning it into a string-formatted time. Naming that field "time_epoch" is SUPER confusing, because its values are string-formatted times, not epochtime numbers.

Then sorting by that string formatted time. This is a little odd but so far not really a problem - just weird since the original _time value can do all that perfectly well too.

Wild guess - you're getting confused by the behavior of the Splunk UI. Whenever it sees any field called "_time", it sneaks in and converts the display of those values to string-formatted times on the fly. However, the values underneath are still epochtime numbers. You can always verify that I'm not talking crazy by doing | eval srsly_wat=_time
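A quick way to check this is a throwaway search like the one below. It is only a sketch: time_epoch_str, time_type, and str_type are made-up field names for illustration, and it assumes the typeof() eval function is available in your Splunk version.

```spl
| makeresults
| eval srsly_wat=_time
| eval time_epoch_str=strftime(_time,"%Y-%m-%d %H:%M:%S")
| eval time_type=typeof(_time), str_type=typeof(time_epoch_str)
| table _time srsly_wat time_epoch_str time_type str_type
```

In the results, _time is displayed as a formatted date while srsly_wat shows the raw number underneath, and the two type columns should show that _time is numeric while the strftime output is a string.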

Anyway, back to our story. The problem arises with this:
| bin span=2m time_epoch

That would make sense if time_epoch was actually an epochtime-valued field, i.e. if it held numeric values. It makes no sense, however, in this case, where the "time_epoch" field holds strings. Bin has no idea what to do with this combination of field and value, so what it does is nothing at all. It throws no error, but it does nothing.
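To see the difference for yourself, a self-contained sketch such as the following generates five fake events one minute apart and bins the real _time field. The field names n and time_str are illustrative only and do not come from the original data.

```spl
| makeresults count=5
| streamstats count as n
| eval _time=_time + (n * 60)
| eval time_str=strftime(_time,"%Y-%m-%d %H:%M:%S")
| bin span=2m _time
| table _time time_str n
```

Here _time should snap to 2-minute boundaries while time_str keeps the original per-event strings. Swapping the bin line for | bin span=2m time_str lets you compare against the string case described above.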

I think if you fix that core issue, you might be able to revisit what you're trying to do with bin+makecontinuous+filldown. Re-examining that with fresh eyes, and some fresh caffeine, is a good idea.

 index=infra_apps sourcetype=ca:atsys:edemon:txt
 | search Job=*
 | rename hostname as host 
 | fields Job host Autosysjob_time Status _time 
 | lookup datalakenodeslist.csv host OUTPUT cluster
 | mvexpand cluster
 | table Job Status host cluster _time 
 | search cluster=* AND host=*
 | sort + _time
 | stats count  by _time Job Status host cluster

This is a bit of an instinct, but I have seen many cases where folks are fiddling with bin+makecontinuous+filldown, basically trying to recreate one of timechart's core use cases, because they've come to believe that timechart can't do it.
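For reference, here is a minimal sketch of what the timechart version might look like, assuming the same index, sourcetype, lookup, and field names as in the original question. It counts distinct jobs per Status in each 2-minute bucket, but only for buckets where an event actually arrived. It does not carry a job's last known status forward into buckets with no events, so treat it as a starting point rather than a finished answer.

```spl
index=infra_apps sourcetype=ca:atsys:edemon:txt Job=*
| rename hostname as host
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| search cluster=* host=*
| timechart span=2m dc(Job) by Status
```

Because timechart does its own bucketing on _time, there is no separate bin, sort, or string-formatted time field in this version.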

I hope this helps - I don't think it's your whole answer, but I think it clears some of the mess up and puts you in a better place. I'll update this answer if you post back.

pench2k19 · ‎03-10-2020

I just got a query derived for the number of running jobs at each time, as follows:

index=XXX sourcetype=YYYY EventCode=40 AND (Status=Failure OR Status=Terminated OR Status=Running OR Status=Success) AppID=$appid$ Machine=$host$ Job=$job$
| dedup _raw
| lookup datalakenodeslist.csv host as Machine OUTPUT cluster
| search cluster=$clustername$
| eval running=if(Status="RUNNING","1","0"),status=if(Status="RUNNING","start","stop"), time=_time+status
| bin span=2m _time
| stats max(running) as running, earliest(time) as first, latest(time) as last by Job,_time
| xyseries _time Job running first last
| makecontinuous span=2m _time
| streamstats window=2 global=f earliest(last*) as last*
| reverse
| streamstats window=2 global=f earliest(first*) as first*
| reverse
| foreach running*
[ eval <<FIELD>>=if(isnull('<<FIELD>>') AND like('first<<MATCHSTR>>',"%start"),"0",if(isnull('<<FIELD>>') AND like('first<<MATCHSTR>>',"%stop"),"1",if(isnull('<<FIELD>>') AND like('last<<MATCHSTR>>',"%start"),"1",if(isnull('<<FIELD>>') AND like('last<<MATCHSTR>>',"%stop"),"0",'<<FIELD>>'))))]
| fields - first*, last*
| filldown *
| reverse
| filldown *
| reverse
| addtotals fieldname=Running
| fields _time,Running

But I find it too difficult to understand the query :(, and I am not able to edit it into a single query that covers both the RUNNING and STARTING statuses.

Can you please help me get a single query for both statuses?

pench2k19 · ‎03-10-2020

@sideview thanks for such a detailed explanation.

Sorting the results by _time before the stats command was not working for me; that's why I had to use time_epoch.

I also tried both | bin span=2m time_epoch and | bin span=2m _time, but these give me different results at times. I am still confused about which to use on this line.

PS: An example of how _time is extracted in this case is 2020-03-09 10:00:03.863

pench2k19 · ‎03-05-2020

@richgalloway this is not helping....

I just want to give you some background on this. I have a single event for each job status in Splunk.

To make it continuous, I am inserting 2-minute intervals using the bin and makecontinuous commands and filling those intervals with the last status that was reported.

For example, if a job has Running status at 9 am and Success status at 10 am, I create 2-minute intervals across that 1-hour gap and fill them with Running status.

I suspect that is one of the reasons I am not able to get the graph I am expecting.

PS: I am getting the correct result up to the filldown command; after that I am struggling.

pench2k19 · ‎03-05-2020

index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host 
| eval time_epoch=strftime(_time,"%Y-%m-%d %H:%M:%S")
| fields Job host Autosysjob_time Status _time time_epoch
| lookup datalakenodeslist.csv host OUTPUT cluster 
| mvexpand cluster
| table Job Status host cluster _time time_epoch
| search cluster=EDL-PRD-M01 AND host=*
| sort + time_epoch
| stats count  by _time Job Status host cluster time_epoch
| bin span=2m _time
| makecontinuous span=2m _time
| filldown _time Job Status host cluster count time_epoch
| stats count by _time Status

This is giving me the correct result, but I am not able to add a filter to the query, for example search Status=RUNNING. When I add the status filter, it gives me a different result.

@woodcock @jka can you please help.

pench2k19 · ‎03-09-2020

@woodcock @jkat54 @somesoni2 @sideview can you please help

richgalloway · ‎03-04-2020

Try adding | timechart span=1m count by Status to the end of your query.

