topic Re: How to optimize Splunk query by avoiding join or anything else? in Splunk Enterprise

How to optimize Splunk query by avoiding join or anything else?

Splunk4 — Wed, 26 Oct 2022 14:06:25 GMT

Hi Everyone,

I have below query by which i am extracting manager name,email etc. by applying join on managerno to all employee records since manager will also be a part of it.This query is taking 120 sec to execute but i need to minimize and keep it under 20 seconds.Any help will be appreciated.

index="myid_transac_idx" sourcetype="myID_Identity" earliest=-1d@d latest=now() |fields employeeno display_name loginid email status managerno Termination_process_date |where status="Terminated" and Termination_process_date > "2022-10-01 00:00:00.00" |join type=LEFT managerno [ search index="myid_transac_idx" sourcetype="myID_Identity" earliest=-1d@d latest=now() |fields employeeno display_name loginid email status |rename employeeno as managerno |rename display_name as manager_name |rename loginid as managerloginid |rename email as manageremail |rename status as managerstatus]|fields employeeno display_name loginid email status managerno manager_name managerloginid manageremail managerstatus | table employeeno display_name loginid email status managerno manager_name managerloginid manageremail managerstatus

Re: How to optimize Splunk query by avoiding join or anything else?

starcher — Wed, 26 Oct 2022 23:43:40 GMT

Review this conf talk as a starting place. https://conf.splunk.com/files/2019/slides/FNC2751.pdf

Re: How to optimize Splunk query by avoiding join or anything else?

bowesmana — Thu, 27 Oct 2022 01:16:51 GMT

Although a very useful doc, joining on self where the data set is the same is not something covered in that doc and stats will not really solve the problem in a simple way.

The simplest way is to create a lookup of managers with their managerno and then lookup on that rather than try to join on self.

It is technically possible to join on self by duplicating the data and then using stats/eventstats and some eval/if combinations.

Here's a psuedo example that creates 26 employees and assigns them to random managers and then populates the manager name to the terminated employee

| makeresults | fields - _time ``` Create 26 employees A-Z ``` | eval ee=split("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "") | mvexpand ee ``` Create their manager - one of 25 - not self ``` | eval mg=split("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "") | eval mgs=mvmap(mg, if(mg=ee, null(), mg)) | eval mg=mvindex(mgs, random() % 25) | fields - mgs ``` Now create their name ``` | eval ee_name="User ".ee ``` Now duplicate the entire data set ``` | eval n=mvrange(1,3,1) | mvexpand n ``` So now we have duplicated data and n=1 are the employees and we will make n=2 potential managers ``` ``` So, now let's make some of the n=1 employees terminated ``` | eval status=case(n=1 AND random() % 100 < 20, "Terminated", n=1, "Employed", n=2, null()) ``` and filter out employeed employees - keep terminated and managers ``` | where n=2 OR status="Terminated" ``` Now fudge the manager data by setting manager name to employee name ``` | eval mgr_name=if(n=2, ee_name, null()) | eval ee_name=if(n=2, null(), ee_name) ``` and set the manager to self ``` | eval mg=if(n=2, ee, mg) ``` and this will not populate the manager to the terminated employee ``` | eventstats values(mgr_name) as mgr_name by mg ``` Now get rid of managers ``` | where n=1 | fields - n

You will need to work that somewhat to your data set and depending on the volume of your data, the mvexpand may not work if your data set is large.

Re: How to optimize Splunk query by avoiding join or anything else?

johnhuang — Thu, 27 Oct 2022 03:39:57 GMT

Like @bowesmana said, the ideal solution is to generate a lookup. But if you want to do it without, this is how you can simplify:

index="myid_transac_idx" sourcetype="myID_Identity" earliest=-1d@d | fields employeeno display_name loginid email status managerno Termination_process_date | dedup employeeno | appendpipe [| rename employeeno AS managerno display_name AS manager_name loginid AS managerloginid email AS manageremail status AS managerstatus | fields - Term*] | eventstats max(manager*) AS manager* BY managerno | where LEN(display_name)>1 AND status="Terminated" AND Termination_process_date>"2022-10-01 00:00:00.00" | table employeeno display_name loginid email status managerno manager_name managerloginid manageremail managerstatus

Re: How to optimize Splunk query by avoiding join or anything else?

bowesmana — Thu, 27 Oct 2022 04:43:13 GMT

@johnhuang

appendpipe - nice way to do the data copy - better than using mv and mvexpand. You can also do renames, instead of the evals in my solution as you have the subsearch.

Re: How to optimize Splunk query by avoiding join or anything else?

Splunk4 — Thu, 27 Oct 2022 08:10:37 GMT

Thanks,,will try through lookup option since the data is not small