index=idx_cibca__prod:- Has data from database having all fields but not CUST_NAME ie why we used join
idx_cibca_lookups_prod: Has data from csv file for TOP 25 clients (Fields: COUNTRY_CODE,CUST_NAME,CUST_ID) and maximum 500 records.
We are using *eventstats max(_time) |dedup TRANSACTION_ID and where VALUE_DT=VAR_COMPARE_DT because TRANSACTION_DT may get updated to any future date. So database will have only one record but Splunk will have two records (one with original date and one with new date). We want to skip those records whose date is updated to future date when searched by original date which is passed as token from dropdown.
Objective: Want count of transactions on the basis of individual status for only TOP 25 clients from csv file
Issue: Performance of the query is very slow and taking 4 min as there are 50 lakhs (5 millions) records in database for 10 days data.
We want to reduce data drastically before join.
index=idx_cibca_*_prod sourcetype=LVBT_*_DB_MASTER PAUYROLL_IND="Y" earliest=-10d@d
|join CUST_ID [search index=idx_cibca_lookups_prod source="*TOP_25_CLIENT_V*.CSV" CUST_NAME="*" |fields CUST_ID CUST_NAME]
|eventstats max(_time) |dedup TRANSACTION_ID
|eval Current_Date=upper(strftime(now(),"%d-%b-%Y"))
|eval VAR_DROPDOWN_DT="$Tran_Dt$"
|eval VAR_COMPARE_DT=if(VAR_DROPDOWN_DT="now",Current_Date,VAR_DROPDOWN_DT)
|where VALUE_DT=VAR_COMPARE_DT
|table COUNTRY_CODE CUST_NAME CUST_ID PAYROLL_IND TRANSACTION_ID EXTRACT_STATUS POST_STATUS STATUS VAR_COMPARE_DT
|eval "PAYROLL_STATUS"=if((like(EXTRACT_STATUS,"%ACK%")) OR (like(EXTRACT_STATUS,"%CMP%")) OR (like(EXTRACT_STATUS,"%SND%")) OR (like(EXTRACT_STATUS,"%ACK%")) AND (like(POST_STATUS,"%ACK%")),"SUCCESSFULL","PENDING")
|search PAYROLL_STATUS ="PENDING"
|eval STATUS=if(((like(EXTRACT_STATUS,"%SUS%")) AND (like(POST_STATUS="%OFA%")),"Pending In OFAC"),if((like(EXTRACT_STATUS,"%RVP%")) AND (like(POST_STATUS="%RVP%")),"Pending In Fraud"))
|stats count by CUST_NAME COUNTRY_CODE CUST_ID PAYROLL_STATUS PAYRILL_IND STATUS VAR_COMPARE_DT
If the idx_cibca_lookups_prod index only contains 500 records, consider writing it to a lookup file and using lookup
instead of join
.
To add to Rich's excellent advice ... repeat the manta, "Splunk is not an SQL database" several times. The join
command is useful in a few limited cases, but most of the time there are profoundly better ways of solving your problem than trying to use join.
But beyond that I'll admit I'm having trouble following your search - especially without some sample data
Sorry but we are not allowed to use lookup.
That's too bad. Joins are expensive and getting rid of them could have earned you a bonus. 🙂
There are some small optimizations you can make.
Move the dedup
command before eventstats
.
Consider moving the 'search PAYROLL_STATUS ="PENDING"' command and it's associated eval as close to the beginning of the search as you can. With some tweaking you may be able to include it in the base search and reduce the number of rows read.