Splunk Search

Creating a transposed table in conjunction with stats command

byu168168
Path Finder
(index=geniachip AND (geniaComplete.flag OR "DVT ready" OR "transfer complete for all banks" OR "lz4.complete*" OR "On-station compression complete.")) OR (index=fbu_sizes)
| eventstats values(fbuLZ4Size) AS fbuLZ4Size by run_name 
| eval run_id_date_size = run_id."##".date."@@".fbuLZ4Size
| eval message=if(message LIKE "geniaComplete.flag%", "geniaComplete.flag", message) 
| eval message=if(message LIKE "lz4.complete%", "lz4.complete", message)
| eval start_time=case(message="geniaComplete.flag", timestamp,
message="lz4.complete", timestamp,
message="transfer complete for all banks.", unix_time,
message="DVT ready", unix_time,
message="On-station compression complete.", unix_time)
| chart values(start_time) over run_id_date_size by message
| eval fbuLZ4Size=mvindex(split(run_id_date_size,"@@"),1) 
| eval fbuLZ4SizeGB = fbuLZ4Size/1000000 
| search geniaComplete.flag = *
| eval "Xfer Time" = ('transfer complete for all banks.' - 'On-station compression complete.')/60
| eval "ACAP Time" = ('DVT ready' - 'transfer complete for all banks.')/60
| eval "ubf_compress" = ('lz4.complete' - 'geniaComplete.flag')/60
| eval "Total Time" = ('DVT ready' - 'geniaComplete.flag')/60
| eval count=if('ubf_compress' > 30, 1, 0)
| eval count1=if('Xfer Time' > 60, 1, 0)
| eval count2=if('ACAP Time' > 60, 1, 0)
| eval count3=if('Total Time' > 150, 1, 0)
| stats dc(run_id_date_size) AS "Total Runs", sum(count) AS "ubfcompress", sum(count1) AS "Xfer Time", sum(count2) AS "ACAP Time", sum(count3) AS "Total", median("Xfer Time") AS med_xfer_time, median("ACAP Time") AS med_ACAP_time, median("ubf_compress") AS med_ubf_compress, median("Total Time") AS med_tot_time
| eval "UBFcompress lag" = ('ubfcompress'/'Total Runs') * 100
| eval "Transfer Time" = ('Xfer Time'/'Total Runs') * 100
| eval "ACAP Processing" = ('ACAP Time'/'Total Runs') * 100
| eval "Total TAT" = ('Total'/'Total Runs') * 100
| fields - "Total Runs" "ubfcompress" "ACAP Time" "Xfer Time" "Total" "Xfer Rate"
| transpose
| eval Threshold = case(column=="UBFcompress lag", "30 Minutes",
column=="Total TAT", "150 Minutes",
column=="Transfer Time", "60 Minutes",
column=="ACAP Processing", "60 Minutes")
| eval sort_field = case(column=="UBFcompress lag", 2,
column=="Total TAT", 1,
column=="Transfer Time", 3,
column=="ACAP Processing", 5)
| sort sort_field
| fields - sort_field
| rename column AS "Turnaround Time Process"
| eval "row 1" = round('row 1', 1)
| rename "row 1" AS "Percent Runs over Threshold"
| table "Turnaround Time Process" Threshold "Percent Runs over Threshold" Median

This search pulls timestamps for checkpoints in our pipeline. I use these checkpoints to determine how long each stage of the process takes. I then need to compare the time for each individual "run" against a threshold in order to get the percentage of runs that took longer than that threshold for that specific process. I was able to do all of that; however, I then got a separate request to also display the median for each process, which complicated things given my use of the transpose command.

Currently the end result from the above query looks like this

Turnaround Time Process   Threshold     Percent Runs over Threshold   Median
Total TAT                 150 Minutes   22.8
UBFcompress lag           30 Minutes    39.0
Transfer Time             60 Minutes    3.8
ACAP Processing           60 Minutes    8.2
med_xfer_time                           4.3
med_ACAP_time                           34.8
med_ubf_compress                        12.0
med_tot_time                            106.3

I'd like to save each of the med_* values to a Median field matched to its respective process, so the final table should look like this:

Turnaround Time Process   Threshold     Percent Runs over Threshold   Median
Total TAT                 150 Minutes   22.8                          106.3
UBFcompress lag           30 Minutes    39.0                          12.0
Transfer Time             60 Minutes    3.8                           4.3
ACAP Processing           60 Minutes    8.2                           34.8

I'm having trouble using the stats command (to get the median values) in conjunction with the transpose command, because I can't get the med_* values saved into a new Median field on the matching process rows.
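
The closest I can get to describing what I'm after is a fold like this in place of the final rename/table lines, mapping each med_* row back to its process name and merging on that (rough, untested sketch of the idea only):

| eval process=case(column=="med_tot_time", "Total TAT", column=="med_ubf_compress", "UBFcompress lag",
    column=="med_xfer_time", "Transfer Time", column=="med_ACAP_time", "ACAP Processing", true(), column)
| eval Median=if(like(column,"med_%"), 'row 1', null())
| eval Pct=if(like(column,"med_%"), null(), 'row 1')
| stats values(Threshold) AS Threshold, values(Pct) AS "Percent Runs over Threshold", values(Median) AS Median by process
| rename process AS "Turnaround Time Process"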

Any help/tips would be much appreciated!


DalJeanis
Legend

Okay, just as a general note: you make your life extremely difficult when you use field names with spaces. Best practice is to use underscores, or better yet CamelCase, and then add the spaces at the last minute, after you've done all the heavy lifting and are ready for presentation... and even then only if you feel you really have to.
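
Something like this is the idea (illustrative, made-up field names): compute with unspaced names so eval never needs single quotes, and bolt the spaced display name on as the very last step.

 | makeresults
 | rename COMMENT as "CamelCase while computing, spaces only at presentation time"
 | eval compressDone=0, transferDone=240
 | eval XferTime=(transferDone-compressDone)/60
 | rename XferTime AS "Xfer Time"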

We've refactored your code to eliminate the transpose completely, in favor of mvexpand. We had to make a number of assumptions about the data. In particular, the line | where isnotnull(genia) is an attempt to eliminate any records your line | search geniaComplete.flag = * would eliminate.

Hopefully, you will find this aircode useful. YMMV.

 (index=geniachip AND (geniaComplete.flag OR "DVT ready" OR "transfer complete for all banks" 
                       OR "lz4.complete*" OR "On-station compression complete.")) 
 OR (index=fbu_sizes)
 | fields run_id run_name date fbuLZ4Size message timestamp unix_time
 | eventstats max(fbuLZ4Size) AS fbuLZ4Size by run_name 
 | eval run_id_date_size = run_id."##".date."@@".fbuLZ4Size

 | rename COMMENT as "set the times for each kind of message"
 | eval genia   = case(like(message,"genia%"), timestamp)
 | eval lz4     = case(like(message,"lz4.c%"), timestamp)
 | eval tranCom = case(like(message,"trans%"),unix_time)
 | eval Dvt     = case(like(message,"DVT r%"),unix_time)
 | eval OnSta   = case(like(message,"On-st%"),unix_time)

 | rename COMMENT as "roll together all the data for each run, kill those with no start time"
 | stats max(genia) as genia, max(lz4) as lz4, max(tranCom) as tranCom, max(OnSta) as OnSta, max(Dvt) as Dvt 
        by run_id_date_size
 | where isnotnull(genia)

 | rename COMMENT as "fan them out into four different lines, set the threshold, then calc the total, total over threshold, and median for each line"
 | eval fanme=mvrange(1,4.0001)
 | mvexpand fanme
 | eval threshold=case(fanme=1,150, fanme=2,30, fanme=3,60, fanme=4,60)
 | eval duration=case(fanme=1,((Dvt-genia)/60), fanme=2,((lz4-genia)/60), fanme=3,((tranCom-OnSta)/60), fanme=4,((Dvt-tranCom)/60))
 | stats count as TotalCount count(eval(duration>threshold)) as OverCount median(duration) as Median max(threshold) as threshold by fanme

 | rename COMMENT as "now we format for presentation"
 | eval process=case(fanme=1,"Total TAT", fanme=2,"UBFcompress lag", fanme=3,"Transfer Time", fanme=4,"ACAP Processing")
 | eval pctOver=round(100*OverCount/TotalCount,1)
 | eval Threshold = tostring(threshold)." Minutes"
 | rename process as "Turnaround Time Process", pctOver as "Percent Runs Over Threshold"
 | table "Turnaround Time Process", Threshold, "Percent Runs Over Threshold", Median


byu168168
Path Finder

Thanks! It worked perfectly except for this line

| eval fanme=mvrange(1,4)

It should be

| eval fanme=mvrange(1,5)

to produce 4 events.
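
mvrange stops just short of its second argument, which is easy to confirm in a throwaway search:

| makeresults
| eval fanme=mvrange(1,5)
| mvexpand fanme

which returns four rows, fanme = 1, 2, 3, 4.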

Thanks for the other tips too!

DalJeanis
Legend

@byu168168 - Doh! Dang test-before loops.
