Solved: How to fix performance issue when replacing join o...

Keesh · ‎08-24-2020

Hi - I'm new to Splunk I am having a performance issue that causes a timeout over longer time spans on a base search I'm performing on a dashboard that uses a join. I have tried replacing the join with the suggested methods found here Here, Here and Here.

Unfortunately, I am unable to get it to work correctly and output the correct value I am getting from my join search. Perhaps this is because of the spath/rex extract commands I am using?

Note my actual search uses tokens however I have replaced them with asterisks to avoid any confusion.

Any help would be much appreciated!

My Code is:

      index=ivr_app sourcetype="CEM-AppLog" rosterInfo
        | rex "^(?:[^{]*){7}(?P<my_data>.+)"      
        | spath input=my_data output=vq path=TOD 
        | spath input=my_data output=steps path=steps{} 
        | spath input=my_data output=type path=type 
        | spath input=my_data output=virtualQueue path=virtualQueue
        | spath input=my_data output=last_step path=steps{}
        | eval res = mvindex(last_step,mvcount(last_step)-1)
        | spath input=res output=name path=name
        | spath input=res output=type path=type
        | rex field=_raw "SN_CONTEXT_ID (?P<SN_CONTEXT_ID>[^\s]+) produced" 
        | dedup SN_CONTEXT_ID
        | join type=inner SN_CONTEXT_ID[
            search index=ivr_app "pipeline at completion" AND CALL_FLOW AND DNIS EXCHANGE NOT "NPS" NOT "TFRDEST" NOT TFRNUM NOT "SN_CONTACT_TYPE=Transfer" NOT "SN_TARGET_TYPE=Release" AND "SN_CONTACT_REASON=" AND SN_CALL_FLAGS="*" OR NOT SN_CALL_FLAGS="*"
          | dedup SN_CONTEXT_ID CONNID
          | foreach SN_CALL_FLAGS [ eval <<FIELD>> = if(isnull(<<FIELD>>) OR len(<<FIELD>>)==0, "NO_CALL_FLAG", <<FIELD>>) ]
          | search CLI="*" AND CONNID="*" AND SN_CALL_FLAGS="*" AND DNIS="*"
          ]
        |  search type="Agent"
        | stats count as countAgent

DalJeanis · ‎08-24-2020

Okay, we've torn your code apart and reconstructed it, and then realized that it seems like it probably wastes a lot of machine time based on what you are ending up with.

In plain English, you are trying to get a count of Agent records that match a bunch of specific filters. Those filters come from either a JSON in the CEM-Applog or from SN_CALL_FLAGS in the other records. It seems like you should be able to get this in a much more concise way, but we'd need to know more about the data in order to code that.

Here's a shot-in-the-dark rewrite to use the Splunk soup method.

   ( index=ivr_app sourcetype="CEM-AppLog" rosterInfo )
   OR
   ( index=ivr_app 
     "pipeline at completion"
     CALL_FLOW  
     DNIS
     EXCHANGE 
     NOT NPS 
     NOT TFRDEST 
     NOT TFRNUM 
     NOT "SN_CONTACT_TYPE=Transfer" 
     NOT "SN_TARGET_TYPE=Release" 
     "SN_CONTACT_REASON=" 
     (SN_CALL_FLAGS="foo1" OR NOT SN_CALL_FLAGS="foo2")
    )

     | rename COMMENT as "This section processes the CEM-Applog records"
     | rex "^(?:[^{]*){7}(?P<my_data>.+)"      
     | spath input=my_data output=vq path=TOD 
     | spath input=my_data output=steps path=steps{} 
     | spath input=my_data output=type path=type 
     | spath input=my_data output=virtualQueue path=virtualQueue
     | spath input=my_data output=last_step path=steps{}
     | eval res = mvindex(last_step,mvcount(last_step)-1)
     | spath input=res output=name path=name
     | spath input=res output=type path=type
     | rex field=_raw "SN_CONTEXT_ID (?P<SN_CONTEXT_ID1>[^\s]+) produced" 
     | dedup SN_CONTEXT_ID1 keepempty=true
 
     | rename COMMENT as "This section processes the 'pipeline at completion' records"
     | dedup SN_CONTEXT_ID CONNID keepempty=true
     | eval SN_CONTEXT_ID=coalesce(SN_CONTEXT_ID,SN_CONTEXT_ID1)
     | fields - SN_CONTEXT_ID1
     | stats values(*) as * by SN_CONTEXT_ID 
     | foreach SN_CALL_FLAGS 
        [ eval <<FIELD>> = if(isnull(<<FIELD>>) OR len(<<FIELD>>)==0, "NO_CALL_FLAG", <<FIELD>>) ]
     | search CLI="foo4" AND CONNID="foo5" AND SN_CALL_FLAGS="foo6" AND DNIS="foo7"

Read the above very carefully and skeptically, and then try it to see how it performs relative to your other code.

Further Notes -

Replacing code with asterisks can really mess up the advice you get. Tradition is to use "foo" and "bar" and "baz" to represent different values, or "foo1" "foo2" foo3" etc.

Even with actual code, this construct here does not seem to be meaningful. If the two asterisks represent the same thing, then it's equivalent to isnotnull(SN_CALL_FLAGS). If they represent different things, then the first clause is redundant, because when the first is true, the second is always true.

 AND SN_CALL_FLAGS="*" OR NOT SN_CALL_FLAGS="*"

View solution in original post

DalJeanis · ‎08-24-2020

Okay, we've torn your code apart and reconstructed it, and then realized that it seems like it probably wastes a lot of machine time based on what you are ending up with.

In plain English, you are trying to get a count of Agent records that match a bunch of specific filters. Those filters come from either a JSON in the CEM-Applog or from SN_CALL_FLAGS in the other records. It seems like you should be able to get this in a much more concise way, but we'd need to know more about the data in order to code that.

Here's a shot-in-the-dark rewrite to use the Splunk soup method.

   ( index=ivr_app sourcetype="CEM-AppLog" rosterInfo )
   OR
   ( index=ivr_app 
     "pipeline at completion"
     CALL_FLOW  
     DNIS
     EXCHANGE 
     NOT NPS 
     NOT TFRDEST 
     NOT TFRNUM 
     NOT "SN_CONTACT_TYPE=Transfer" 
     NOT "SN_TARGET_TYPE=Release" 
     "SN_CONTACT_REASON=" 
     (SN_CALL_FLAGS="foo1" OR NOT SN_CALL_FLAGS="foo2")
    )

     | rename COMMENT as "This section processes the CEM-Applog records"
     | rex "^(?:[^{]*){7}(?P<my_data>.+)"      
     | spath input=my_data output=vq path=TOD 
     | spath input=my_data output=steps path=steps{} 
     | spath input=my_data output=type path=type 
     | spath input=my_data output=virtualQueue path=virtualQueue
     | spath input=my_data output=last_step path=steps{}
     | eval res = mvindex(last_step,mvcount(last_step)-1)
     | spath input=res output=name path=name
     | spath input=res output=type path=type
     | rex field=_raw "SN_CONTEXT_ID (?P<SN_CONTEXT_ID1>[^\s]+) produced" 
     | dedup SN_CONTEXT_ID1 keepempty=true
 
     | rename COMMENT as "This section processes the 'pipeline at completion' records"
     | dedup SN_CONTEXT_ID CONNID keepempty=true
     | eval SN_CONTEXT_ID=coalesce(SN_CONTEXT_ID,SN_CONTEXT_ID1)
     | fields - SN_CONTEXT_ID1
     | stats values(*) as * by SN_CONTEXT_ID 
     | foreach SN_CALL_FLAGS 
        [ eval <<FIELD>> = if(isnull(<<FIELD>>) OR len(<<FIELD>>)==0, "NO_CALL_FLAG", <<FIELD>>) ]
     | search CLI="foo4" AND CONNID="foo5" AND SN_CALL_FLAGS="foo6" AND DNIS="foo7"

Read the above very carefully and skeptically, and then try it to see how it performs relative to your other code.

Further Notes -

Replacing code with asterisks can really mess up the advice you get. Tradition is to use "foo" and "bar" and "baz" to represent different values, or "foo1" "foo2" foo3" etc.

Even with actual code, this construct here does not seem to be meaningful. If the two asterisks represent the same thing, then it's equivalent to isnotnull(SN_CALL_FLAGS). If they represent different things, then the first clause is redundant, because when the first is true, the second is always true.

 AND SN_CALL_FLAGS="*" OR NOT SN_CALL_FLAGS="*"

Keesh · ‎08-24-2020

Hey DalJeanis - Thanks for the prompt reply.

Regarding the wasted machine time - I should have probably further clarified that the majority of the search was the base search and the last couple of lines (to get "Agent") was just for one panel of the dashboard (there are multiple other panels that pull something different other than Agent.)

Thank you for the advice on asterisks - I will keep that in mind in future. Now that you have pointed it out, I can see how it can get confusing.

Regarding your answer - it works! However, unfortunately it actually performs far slower than the join for some reason. I will Keep this thread open for a while before accepting it as the answer, in the case that someone might know an alternative method or how to improve on it performance wise.

Thank you very much for your help.

How to fix performance issue when replacing join on rex value?

field extraction

join

rex

subsearch

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Agentic with Splunk Lantern: Connect to Cisco Cloud Control, Transform ...

July Community Events: Master ITSI 5.0 & Automate Splunk

New Release of Federated Search: Bringing Splunk Analytics to More of Your Data

Join the Conversation