Splunk Search

How to fix performance issue when replacing join on rex value?

Keesh
Engager

Hi - I'm new to Splunk I am having a performance issue that causes a timeout over longer time spans on a base search I'm performing on a dashboard that uses a join. I have tried replacing the join with the suggested methods found here Here,  Here and Here.

Unfortunately, I am unable to get it to work correctly and output the correct value I am getting from my join search. Perhaps this is because of the spath/rex extract commands I am using?

Note my actual search uses tokens however I have replaced them with asterisks to avoid any confusion. 

Any help would be much appreciated!

My Code is:

 

 

      index=ivr_app sourcetype="CEM-AppLog" rosterInfo
        | rex "^(?:[^{]*){7}(?P<my_data>.+)"      
        | spath input=my_data output=vq path=TOD 
        | spath input=my_data output=steps path=steps{} 
        | spath input=my_data output=type path=type 
        | spath input=my_data output=virtualQueue path=virtualQueue
        | spath input=my_data output=last_step path=steps{}
        | eval res = mvindex(last_step,mvcount(last_step)-1)
        | spath input=res output=name path=name
        | spath input=res output=type path=type
        | rex field=_raw "SN_CONTEXT_ID (?P<SN_CONTEXT_ID>[^\s]+) produced" 
        | dedup SN_CONTEXT_ID
        | join type=inner SN_CONTEXT_ID[
            search index=ivr_app "pipeline at completion" AND CALL_FLOW AND DNIS EXCHANGE NOT "NPS" NOT "TFRDEST" NOT TFRNUM NOT "SN_CONTACT_TYPE=Transfer" NOT "SN_TARGET_TYPE=Release" AND "SN_CONTACT_REASON=" AND SN_CALL_FLAGS="*" OR NOT SN_CALL_FLAGS="*"
          | dedup SN_CONTEXT_ID CONNID
          | foreach SN_CALL_FLAGS [ eval <<FIELD>> = if(isnull(<<FIELD>>) OR len(<<FIELD>>)==0, "NO_CALL_FLAG", <<FIELD>>) ]
          | search CLI="*" AND CONNID="*" AND SN_CALL_FLAGS="*" AND DNIS="*"
          ]
        |  search type="Agent"
        | stats count as countAgent

 

 

Labels (4)
Tags (2)
0 Karma
1 Solution

DalJeanis
Legend

Okay, we've torn your code apart and reconstructed it, and then realized that it seems like  it probably wastes a lot of machine time based on what you are ending up with.  

In plain English, you are trying to get a count of Agent records that match a bunch of specific filters. Those filters come from either a JSON in the CEM-Applog or from SN_CALL_FLAGS in the other records.  It seems like you should be able to get this in a much more concise way, but we'd need to know more about the data in order to code that.

 Here's a shot-in-the-dark rewrite to use the Splunk soup method.

 

   ( index=ivr_app sourcetype="CEM-AppLog" rosterInfo )
   OR
   ( index=ivr_app 
     "pipeline at completion"
     CALL_FLOW  
     DNIS
     EXCHANGE 
     NOT NPS 
     NOT TFRDEST 
     NOT TFRNUM 
     NOT "SN_CONTACT_TYPE=Transfer" 
     NOT "SN_TARGET_TYPE=Release" 
     "SN_CONTACT_REASON=" 
     (SN_CALL_FLAGS="foo1" OR NOT SN_CALL_FLAGS="foo2")
    )

     | rename COMMENT as "This section processes the CEM-Applog records"
     | rex "^(?:[^{]*){7}(?P<my_data>.+)"      
     | spath input=my_data output=vq path=TOD 
     | spath input=my_data output=steps path=steps{} 
     | spath input=my_data output=type path=type 
     | spath input=my_data output=virtualQueue path=virtualQueue
     | spath input=my_data output=last_step path=steps{}
     | eval res = mvindex(last_step,mvcount(last_step)-1)
     | spath input=res output=name path=name
     | spath input=res output=type path=type
     | rex field=_raw "SN_CONTEXT_ID (?P<SN_CONTEXT_ID1>[^\s]+) produced" 
     | dedup SN_CONTEXT_ID1 keepempty=true
 
     | rename COMMENT as "This section processes the 'pipeline at completion' records"
     | dedup SN_CONTEXT_ID CONNID keepempty=true
     | eval SN_CONTEXT_ID=coalesce(SN_CONTEXT_ID,SN_CONTEXT_ID1)
     | fields - SN_CONTEXT_ID1
     | stats values(*) as * by SN_CONTEXT_ID 
     | foreach SN_CALL_FLAGS 
        [ eval <<FIELD>> = if(isnull(<<FIELD>>) OR len(<<FIELD>>)==0, "NO_CALL_FLAG", <<FIELD>>) ]
     | search CLI="foo4" AND CONNID="foo5" AND SN_CALL_FLAGS="foo6" AND DNIS="foo7"

 

Read the above very carefully and skeptically, and then try it to see how it performs relative to your other code.

Further Notes - 

Replacing code with asterisks can really mess up the advice you get.  Tradition is to use "foo" and "bar" and "baz" to represent different values, or "foo1" "foo2" foo3" etc.

 

Even with actual code, this construct here does not seem to be meaningful.  If the two asterisks represent the same thing, then it's equivalent to isnotnull(SN_CALL_FLAGS).  If they represent different things, then the first clause is redundant, because when the first is true, the second is always true. 

 

 AND SN_CALL_FLAGS="*" OR NOT SN_CALL_FLAGS="*"

 

View solution in original post

DalJeanis
Legend

Okay, we've torn your code apart and reconstructed it, and then realized that it seems like  it probably wastes a lot of machine time based on what you are ending up with.  

In plain English, you are trying to get a count of Agent records that match a bunch of specific filters. Those filters come from either a JSON in the CEM-Applog or from SN_CALL_FLAGS in the other records.  It seems like you should be able to get this in a much more concise way, but we'd need to know more about the data in order to code that.

 Here's a shot-in-the-dark rewrite to use the Splunk soup method.

 

   ( index=ivr_app sourcetype="CEM-AppLog" rosterInfo )
   OR
   ( index=ivr_app 
     "pipeline at completion"
     CALL_FLOW  
     DNIS
     EXCHANGE 
     NOT NPS 
     NOT TFRDEST 
     NOT TFRNUM 
     NOT "SN_CONTACT_TYPE=Transfer" 
     NOT "SN_TARGET_TYPE=Release" 
     "SN_CONTACT_REASON=" 
     (SN_CALL_FLAGS="foo1" OR NOT SN_CALL_FLAGS="foo2")
    )

     | rename COMMENT as "This section processes the CEM-Applog records"
     | rex "^(?:[^{]*){7}(?P<my_data>.+)"      
     | spath input=my_data output=vq path=TOD 
     | spath input=my_data output=steps path=steps{} 
     | spath input=my_data output=type path=type 
     | spath input=my_data output=virtualQueue path=virtualQueue
     | spath input=my_data output=last_step path=steps{}
     | eval res = mvindex(last_step,mvcount(last_step)-1)
     | spath input=res output=name path=name
     | spath input=res output=type path=type
     | rex field=_raw "SN_CONTEXT_ID (?P<SN_CONTEXT_ID1>[^\s]+) produced" 
     | dedup SN_CONTEXT_ID1 keepempty=true
 
     | rename COMMENT as "This section processes the 'pipeline at completion' records"
     | dedup SN_CONTEXT_ID CONNID keepempty=true
     | eval SN_CONTEXT_ID=coalesce(SN_CONTEXT_ID,SN_CONTEXT_ID1)
     | fields - SN_CONTEXT_ID1
     | stats values(*) as * by SN_CONTEXT_ID 
     | foreach SN_CALL_FLAGS 
        [ eval <<FIELD>> = if(isnull(<<FIELD>>) OR len(<<FIELD>>)==0, "NO_CALL_FLAG", <<FIELD>>) ]
     | search CLI="foo4" AND CONNID="foo5" AND SN_CALL_FLAGS="foo6" AND DNIS="foo7"

 

Read the above very carefully and skeptically, and then try it to see how it performs relative to your other code.

Further Notes - 

Replacing code with asterisks can really mess up the advice you get.  Tradition is to use "foo" and "bar" and "baz" to represent different values, or "foo1" "foo2" foo3" etc.

 

Even with actual code, this construct here does not seem to be meaningful.  If the two asterisks represent the same thing, then it's equivalent to isnotnull(SN_CALL_FLAGS).  If they represent different things, then the first clause is redundant, because when the first is true, the second is always true. 

 

 AND SN_CALL_FLAGS="*" OR NOT SN_CALL_FLAGS="*"

 

Keesh
Engager

Hey DalJeanis - Thanks for the prompt reply. 

Regarding the wasted machine time - I should have probably further clarified that the majority of the search was the base search and the last couple of lines (to get "Agent") was just for one panel of the dashboard (there are multiple other panels that pull something different other than Agent.)

Thank you for the advice on asterisks - I will keep that in mind in future. Now that you have pointed it out, I can see how it can get confusing.

Regarding your answer - it works! However, unfortunately it actually performs far slower than the join for some reason. I will Keep this thread open for a while before accepting it as the answer, in the case that someone might know an alternative method or how to improve on it performance wise. 

Thank you very much for your help.

0 Karma
Get Updates on the Splunk Community!

Detector Best Practices: Static Thresholds

Introduction In observability monitoring, static thresholds are used to monitor fixed, known values within ...

Expert Tips from Splunk Education, Observability in Action, Plus More New Articles on ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Changes to Splunk Instructor-Led Training Completion Criteria

We’re excited to share an update to our instructor-led training program that enhances the learning experience ...