Splunk Search

Alternative to join command to do a summation on date_hour

Path Finder

I need to sum up the time differential for two events on a datehour, datewday, and date_month basis. Originally I used the join command to accomplish this until I realized events were being dropped.

I changed my query to remove the command and am currently working with this

  index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
    | eval temp1=split(message, " ")
    | eval bank=mvindex(temp1, 0)
    | eval run_id_bank=run_name.".".tag_value.".".bank 
    | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
    | eval temp=date_hour."##".run_id_bank | chart values(stime) over temp by msg_type 
    | eval TimeDiff=(INFO-BEGIN)/3600 
    | eval date_month=mvindex(split(temp,"##"),0)
    | stats sum(TimeDiff) AS Sum by date_hour

Each run has 16 different banks which need to be differentiated. The issue I'm running into is when the decompression and processing occur in a different hour (or wday/month) then the differential won't be taken. This can be solved by charting over runidbank but then I lose the datehour timestamp. Is there a way to do the calculation but retain the datehour from the BEGIN (only occurs for processing) to do a summation?

0 Karma
1 Solution

SplunkTrust
SplunkTrust

Your comments did make some things clear to me. This is my suggested query. I've made some changes to reduce the lines of code and corrected a type in the 2nd last line (datemonth should actually be datehour).

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | table run_id_bank BEGIN INFO TimeDiff date_hour

YOu can now validate the results first and then add following to get total TimeDiff by date_hour of the BEGIN events.

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | stats sum(TimeDiff) AS Sum by date_hour

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

Your comments did make some things clear to me. This is my suggested query. I've made some changes to reduce the lines of code and corrected a type in the 2nd last line (datemonth should actually be datehour).

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | table run_id_bank BEGIN INFO TimeDiff date_hour

YOu can now validate the results first and then add following to get total TimeDiff by date_hour of the BEGIN events.

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | stats sum(TimeDiff) AS Sum by date_hour

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

Could you provide the mock table of what is your final expected output? (or what's wrong with output of current search that you have?)

0 Karma

Path Finder

Let's disregard the summation aspect for now and explain what's wrong prior to that

Take the following situation

Run1: Begins processing at 2:30 and finished decompression at 3:33, giving a TimeDiff of 63 minutes
Run2: Begins processing at 1:03 and finished decompression at 1:23, giving a TimeDiff of 20minutes

Below is the current output that I am getting. The TimeDiff isn't taken for run1 because the different datehour stamps separate the two messages even though it's the same runid_bank.

temp               BEGIN        INFO       TimeDiff date_hour
2##run1_id_bank 1481079017.00                            2
3##run1_id_bank                1481079161.00              3
1##run2_id_bank 1481079017.00  1481079173.00  20          1

Below is the kind of output I would be looking for

temp                     BEGIN            INFO          TimeDiff          date_hour
2##run1_id_bank 1481079017.00   1481079161.00           63                  2
1#run2_id_bank     1481079017.00    1481079173.00           20                    1

In this case I'd be able to sum up the TimeDiff for all the runs by the datehour returned from the BEGIN log. If I chart over just runidbank instead of temp I get the above table, with an empty datehour column. The temp column isn't necessary in that format as long as the other columns are accurate

Then I'd be able to run a summation on top of that query to get

date_hour               Sum
1                       20
2                       63
...
24
0 Karma