need help to optimize splunk query from database f...

anantdeshpande · ‎02-08-2017

index=idx_cibca__prod:- Has data from database having all fields but not CUST_NAME ie why we used join
idx_cibca_lookups_prod: Has data from csv file for TOP 25 clients (Fields: COUNTRY_CODE,CUST_NAME,CUST_ID) and maximum 500 records.
We are using *eventstats max(_time) |dedup TRANSACTION_ID and where VALUE_DT=VAR_COMPARE_DT because TRANSACTION_DT may get updated to any future date. So database will have only one record but Splunk will have two records (one with original date and one with new date). We want to skip those records whose date is updated to future date when searched by original date which is passed as token from dropdown.

Objective: Want count of transactions on the basis of individual status for only TOP 25 clients from csv file

Issue: Performance of the query is very slow and taking 4 min as there are 50 lakhs (5 millions) records in database for 10 days data.
We want to reduce data drastically before join.

index=idx_cibca_*_prod sourcetype=LVBT_*_DB_MASTER PAUYROLL_IND="Y" earliest=-10d@d
|join CUST_ID [search index=idx_cibca_lookups_prod source="*TOP_25_CLIENT_V*.CSV" CUST_NAME="*" |fields CUST_ID CUST_NAME] 
|eventstats max(_time) |dedup TRANSACTION_ID
|eval Current_Date=upper(strftime(now(),"%d-%b-%Y")) 
|eval VAR_DROPDOWN_DT="$Tran_Dt$" 
|eval VAR_COMPARE_DT=if(VAR_DROPDOWN_DT="now",Current_Date,VAR_DROPDOWN_DT) 
|where VALUE_DT=VAR_COMPARE_DT 
|table COUNTRY_CODE CUST_NAME CUST_ID PAYROLL_IND TRANSACTION_ID EXTRACT_STATUS POST_STATUS STATUS VAR_COMPARE_DT 
|eval "PAYROLL_STATUS"=if((like(EXTRACT_STATUS,"%ACK%")) OR (like(EXTRACT_STATUS,"%CMP%")) OR (like(EXTRACT_STATUS,"%SND%")) OR (like(EXTRACT_STATUS,"%ACK%")) AND (like(POST_STATUS,"%ACK%")),"SUCCESSFULL","PENDING")
|search PAYROLL_STATUS ="PENDING" 
|eval STATUS=if(((like(EXTRACT_STATUS,"%SUS%")) AND (like(POST_STATUS="%OFA%")),"Pending In OFAC"),if((like(EXTRACT_STATUS,"%RVP%")) AND (like(POST_STATUS="%RVP%")),"Pending In Fraud"))
|stats count by CUST_NAME COUNTRY_CODE CUST_ID PAYROLL_STATUS PAYRILL_IND STATUS VAR_COMPARE_DT

richgalloway · ‎02-08-2017

If the idx_cibca_lookups_prod index only contains 500 records, consider writing it to a lookup file and using lookup instead of join.

---
If this reply helps you, Karma would be appreciated.

dwaddle · ‎02-08-2017

To add to Rich's excellent advice ... repeat the manta, "Splunk is not an SQL database" several times. The join command is useful in a few limited cases, but most of the time there are profoundly better ways of solving your problem than trying to use join.

But beyond that I'll admit I'm having trouble following your search - especially without some sample data

anantdeshpande · ‎02-08-2017

Sorry but we are not allowed to use lookup.

richgalloway · ‎02-08-2017

That's too bad. Joins are expensive and getting rid of them could have earned you a bonus. 🙂

There are some small optimizations you can make.
Move the dedup command before eventstats.
Consider moving the 'search PAYROLL_STATUS ="PENDING"' command and it's associated eval as close to the beginning of the search as you can. With some tweaking you may be able to include it in the base search and reduce the number of rows read.

---
If this reply helps you, Karma would be appreciated.

need help to optimize splunk query from database feed

Prove Your Splunk Prowess at .conf25—No Prereqs Required!

Splunk Observability Cloud's AI Assistant in Action Series: Observability as Code

Splunk Answers Content Calendar, July Edition I

Are you a member of the Splunk Community?

need help to optimize splunk query from database feed

Prove Your Splunk Prowess at .conf25—No Prereqs Required!

Splunk Observability Cloud's AI Assistant in Action Series: Observability as Code

Splunk Answers Content Calendar, July Edition I