Splunk Search

Alternative to join command to do a summation on date_hour

byu168
Path Finder

I need to sum up the time differential for two events on a date_hour, date_wday, and date_month basis. Originally I used the join command to accomplish this until I realized events were being dropped.

I changed my query to remove the join command and am currently working with this:

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
    | eval temp1=split(message, " ")
    | eval bank=mvindex(temp1, 0)
    | eval run_id_bank=run_name.".".tag_value.".".bank 
    | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
    | eval temp=date_hour."##".run_id_bank 
    | chart values(stime) over temp by msg_type 
    | eval TimeDiff=(INFO-BEGIN)/3600 
    | eval date_month=mvindex(split(temp,"##"),0)
    | stats sum(TimeDiff) AS Sum by date_hour

Each run has 16 different banks which need to be differentiated. The issue I'm running into is that when the decompression and processing occur in different hours (or wdays/months), the differential never gets taken. This can be solved by charting over run_id_bank instead, but then I lose the date_hour timestamp. Is there a way to do the calculation but retain the date_hour from the BEGIN event (which only occurs for processing) to do a summation?
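
To illustrate with mocked-up values (makeresults here just stands in for the real index; run1.tagA.bank01 and the timestamps are invented, and date_hour is derived from stime since makeresults has no indexed date_hour field), a run-anywhere version of the problem looks like this:

| makeresults count=2 
    | streamstats count AS n 
    | eval run_id_bank="run1.tagA.bank01" 
    | eval msg_type=if(n==1,"BEGIN","INFO") 
    | eval time_at=if(n==1,"2016-12-07 02:30:00","2016-12-07 03:33:00") 
    | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
    | eval date_hour=strftime(stime,"%H") 
    | eval temp=date_hour."##".run_id_bank 
    | chart values(stime) over temp by msg_type 
    | eval TimeDiff=(INFO-BEGIN)/3600

The BEGIN ends up on row 02##run1.tagA.bank01 and the INFO on row 03##run1.tagA.bank01, so TimeDiff is null on both rows.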

1 Solution

somesoni2
Revered Legend

Your comments cleared a few things up for me. This is my suggested query. I've made some changes to reduce the lines of code and corrected a typo in the second-to-last line (date_month should actually be date_hour).

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | table run_id_bank BEGIN INFO TimeDiff date_hour
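
If you want to sanity-check the logic before running it against the real data, here is a run-anywhere mock of the same pipeline (makeresults, the bank name, and the timestamps below are all stand-ins):

| makeresults count=2 
    | streamstats count AS n 
    | eval run_id_bank="run1.tagA.bank01" 
    | eval msg_type=if(n==1,"BEGIN","INFO") 
    | eval time_at=if(n==1,"2016-12-07 02:30:00","2016-12-07 03:33:00") 
    | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
    | chart values(stime) over run_id_bank by msg_type 
    | eval TimeDiff=(INFO-BEGIN)/3600 
    | eval date_hour=strftime(BEGIN,"%H") 
    | table run_id_bank BEGIN INFO TimeDiff date_hour

Because the chart is over run_id_bank alone, BEGIN and INFO land on the same row (TimeDiff = 1.05 hours), and date_hour comes back as 02 from the BEGIN epoch.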

You can now validate the results first, and then add the following to get the total TimeDiff by the date_hour of the BEGIN events.

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | stats sum(TimeDiff) AS Sum by date_hour
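
If you also need the date_wday and date_month breakdowns from the original question, the same strftime call on BEGIN covers those too. A sketch (swap these in for the last two lines of the query above; %A gives the weekday name and %B the month name):

     | eval date_hour=strftime(BEGIN,"%H"), date_wday=strftime(BEGIN,"%A"), date_month=strftime(BEGIN,"%B") 
     | stats sum(TimeDiff) AS Sum by date_hour date_wday date_month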


somesoni2
Revered Legend

Could you provide a mock table of your final expected output? (Or, what's wrong with the output of the current search that you have?)


byu168
Path Finder

Let's disregard the summation aspect for now; first, here's what's going wrong before that step.

Take the following situation:

Run1: begins processing at 2:30 and finishes decompression at 3:33, giving a TimeDiff of 63 minutes
Run2: begins processing at 1:03 and finishes decompression at 1:23, giving a TimeDiff of 20 minutes

Below is the current output that I am getting. The TimeDiff isn't taken for run1 because the different date_hour stamps separate the two messages even though it's the same run_id_bank.

temp              BEGIN          INFO           TimeDiff   date_hour
2##run1_id_bank   1481079017.00                            2
3##run1_id_bank                  1481079161.00             3
1##run2_id_bank   1481079017.00  1481079173.00  20         1

Below is the kind of output I would be looking for

temp              BEGIN          INFO           TimeDiff   date_hour
2##run1_id_bank   1481079017.00  1481079161.00  63         2
1##run2_id_bank   1481079017.00  1481079173.00  20         1

In this case I'd be able to sum up the TimeDiff for all the runs by the date_hour returned from the BEGIN log. If I chart over just run_id_bank instead of temp, I get the table above, but with an empty date_hour column. The temp column isn't necessary in that format as long as the other columns are accurate.

Then I'd be able to run a summation on top of that query to get:

date_hour               Sum
1                       20
2                       63
...
24