Monitoring Splunk

My join command is not extracting data from 2 indexes

delgendy
Explorer

The below query is the original one, it was working fine in the lower environments, once moved to the production with the over head, i m facing very bad performance and jobs are queuing.
I am trying to optimize the query to use join instead of OR between 2 different indexes but it is not extracting the messages i m looking for .
My original Query

(index=A source=test message="PUBLISH message recieved" ) OR (index=B  sourcetype=test2  "Message Successfully Processed to x")   
| eval Vin=if(isNull(clientId),Vin,clientId)
| eval activityId=if(isNull(activityId),ActivityID,activityId) 
| eval Entry= if(match(message,"PUBLISH message recieved"),_time,NULL)
| eval Exit=if(match(Message,"Message Successfully Processed to x"),_time,NULL) 
|stats min(Entry) as Entry,max(Exit) as Exit, values(Vin) as Vin, values(deviceid) as ESN by activityId
| rename activityId as traceid
 | fillnull value="NULL" | where Exit!="NULL"
 |eval duration=Exit-Entry | eval  Durations= if(duration<0,0,duration)
| convert ctime(Entry) |convert ctime(Exit) |table traceid, Entry,Exit,Vin,ESN,Durations | rename traceid as "Trace Id"

The one i m trying to build with the join command

(index=A   source=test  message="PUBLISH message recieved" ) 
|join activityId [search index=B  sourcetype="test2 " "Message Successfully Processed to x"  
 | eval Vin=if(isNull(clientId),Vin,clientId)
| eval activityId=if(isNull(activityId),ActivityID,activityId) ]
| eval Entry= if(match(message,"PUBLISH message recieved"),_time,NULL)
| eval Exit=if(match(Message,"Message Successfully Processed to x"),_time,NULL) 
|stats min(Entry) as Entry,max(Exit) as Exit, values(Vin) as Vin, values(deviceid) as ESN by activityId
| rename activityId as traceid
 | fillnull value="NULL" | where Exit!="NULL"
 |eval duration=Exit-Entry | eval  Durations= if(duration<0,0,duration)
| convert ctime(Entry) |convert ctime(Exit) |table traceid, Entry,Exit,Vin,ESN,Durations | rename traceid as "Trace Id"

Thanks and appreciate the support!

0 Karma
1 Solution

DalJeanis
Legend

join is almost always slower than a direct combined search. The way you've done it, it should be much slower, except that as an inner join, it will drop all records which do not have members on both sides of the join.

Just to verify - over the time range that you normally run this report, are duplicate traceids / activityids ever expected?

Here's a version of your original with some slight performance enhancements. You should look at this, and incorporate most of these changes into your join if you proceed that way. For instance, add the fields command at the top, and use the index= rather than the match code to test which record you have. An index= command will be able to reject the test at the first non-matching character, whereas match will have to scan the entire message field.

 (index=A source=test message="PUBLISH message recieved" ) OR 
 (index=B  sourcetype=test2  "Message Successfully Processed to x")   
 | fields index, Vin, clientId, activityId, ActivityID,  deviceid
 | eval Vin=coalesce(clientId,Vin)
 | eval traceid=coalesce(activityId,ActivityID) 
 | eval Entry = if(index="A",_time,null())
 | eval Exit = if(index="B",_time,null()) 
 | stats min(Entry) as Entry, max(Exit) as Exit, values(Vin) as Vin, values(deviceid) as ESN by traceid
 | where isnotnull(Exit)
 | eval  Durations= if(Exit<Entry,0,Exit-Entry)
 | convert ctime(Entry) 
 | convert ctime(Exit) 
 | table traceid, Entry, Exit, Vin, ESN, Durations 
 | rename traceid as "Trace Id"

View solution in original post

0 Karma

DalJeanis
Legend

join is almost always slower than a direct combined search. The way you've done it, it should be much slower, except that as an inner join, it will drop all records which do not have members on both sides of the join.

Just to verify - over the time range that you normally run this report, are duplicate traceids / activityids ever expected?

Here's a version of your original with some slight performance enhancements. You should look at this, and incorporate most of these changes into your join if you proceed that way. For instance, add the fields command at the top, and use the index= rather than the match code to test which record you have. An index= command will be able to reject the test at the first non-matching character, whereas match will have to scan the entire message field.

 (index=A source=test message="PUBLISH message recieved" ) OR 
 (index=B  sourcetype=test2  "Message Successfully Processed to x")   
 | fields index, Vin, clientId, activityId, ActivityID,  deviceid
 | eval Vin=coalesce(clientId,Vin)
 | eval traceid=coalesce(activityId,ActivityID) 
 | eval Entry = if(index="A",_time,null())
 | eval Exit = if(index="B",_time,null()) 
 | stats min(Entry) as Entry, max(Exit) as Exit, values(Vin) as Vin, values(deviceid) as ESN by traceid
 | where isnotnull(Exit)
 | eval  Durations= if(Exit<Entry,0,Exit-Entry)
 | convert ctime(Entry) 
 | convert ctime(Exit) 
 | table traceid, Entry, Exit, Vin, ESN, Durations 
 | rename traceid as "Trace Id"
0 Karma

delgendy
Explorer

Thanks DalJeanis so much this was helpful
I have made little changes on the query as well with your performance recommendation, but i want to perform timechart for the 95th perc of the Total Duration which is not working

index=A source=test
message="PUBLISH message recieved" OR logtype="Dog.Exit"
| eval Entry= if(match(message,"PUBLISH message recieved"),_time,NULL)
| eval Exit= if(match(logtype,"Dog.Exit"),_time,NULL)
| eval traceid=coalesce(traceid,activityId) | eval esn=coalesce(esn,deviceid)
|stats min(Entry) as Entry max(Exit) as Exit values(esn) as ESN by traceid|where isnotnull(Exit)
|eval VsdnDurations=if(Exit

0 Karma

DalJeanis
Legend

@delgendy - your code cut off, and I'm not sure what your question is.

A percentile is only valid across a population, so you probably just need to do something like eventstats ...

| eventstats perc95(Durations) as p95Durations      (((by field???)))

But what you want to chart against that, I'm not really clear on. If you want each time period's 95th percentile, then you'll need to binthe appropriate time field (Entry, probably) and then run eventstats on that. If you just want a horizontal line for the 95th percentile across the entire time period, that's another method. And if you want a running calculation of the 95th percentile, changing daily, then that's a third one (probably streamstats).

It would probably be most clear to start a new question devoted to what you are trying to achieve there.

By the way, you can probably use = instead of match() with the logtype comparison.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Why use join? It's slow.

---
If this reply helps you, Karma would be appreciated.
0 Karma

delgendy
Explorer

cause i want to search first in the index after the search, get the activityId then search with those in the first index .
i think i found the solution already , i just need to correlate between the 2 indexes by | fields message activityId so i can connect the 2 indexes . This worked for me . Thanks !

0 Karma
Get Updates on the Splunk Community!

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...

Cloud Platform & Enterprise: Classic Dashboard Export Feature Deprecation

As of Splunk Cloud Platform 9.3.2408 and Splunk Enterprise 9.4, classic dashboard export features are now ...