query optimization without join

vikashperiwal · ‎10-18-2019

I am having multiple index and sources , initially we wrote query using join and we got desired output , but now our planners want the query to be optimized and not to use join and append, below is my query can any one help me with better solution without join and append

index=ndspr sourcetype=SUMMARY_CDR_VW  OPC_CLLI_CD=CLGRAB1201T OR OPC_CLLI_CD=CLGRAB12GT0 OR OPC_CLLI_CD=EDTNAB02GT0 OR OPC_CLLI_CD=KMLPBC01GT0 OR OPC_CLLI_CD=VANCBC01GT0 OR OPC_CLLI_CD=EDTNAB02GT1 
| fields TERM_SUBGRPG_CD ORIG_POINT_CD TERM_GRPG_CD DPC_CARRIER_LONG_NM CALLED_PARTY_NOA_CD DEST_POINT_CD DPC_TOLL_IND GENERIC_PORTED_SUBGRPG_CD OPC_CARRIER_LONG_NM OPC_CLLI_CD DPC_CLLI_CD SUM_BILL_TM_CNT SUM_NTWK_DURTN_CNT TOTAL_CALL_CNT DMS_FILE 
| stats sum(SUM_BILL_TM_CNT) as SUM_BILL_TM_CNT, sum(SUM_NTWK_DURTN_CNT) as SUM_NTWK_DURTN_CNT, sum(TOTAL_CALL_CNT) as TOTAL_CALL_CNT by DPC_CARRIER_LONG_NM, CALLED_PARTY_NOA_CD, ORIG_POINT_CD, DEST_POINT_CD, DPC_CLLI_CD, OPC_CLLI_CD, TERM_SUBGRPG_CD TERM_GRPG_CD 
| join type=left TERM_SUBGRPG_CD 
    [ search index=csvlookups source="*NRS_Lite_SNP*" earliest=1 
    | rename NPANXX as TERM_SUBGRPG_CD 
    | fields TERM_SUBGRPG_CD COMPANY LIR "Rate  Center"] 
| join type=left TERM_GRPG_CD 
    [ search index=csvlookups source="*npa_report.csv*" earliest=1
    | rename NPA_ID as TERM_GRPG_CD 
    | fields TERM_GRPG_CD COUNTRY ] 
| join type=left ORIG_POINT_CD 
    [ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\191008_CLGRAB1201T_C7NETWRK.csv" earliest=-20d@d latest=now() 
    | rex field=PC "(?<p1>[\d]*).(?<p2>[\d]*).(?<p3>[\d]*)" 
    | eval p3=case(len(p3)==1,"00".p3,len(p3)==2,"0".p3) 
    | eval ORIG_POINT_CD=p1.".".p2.".".p3 
    | rename NAME as NETNAME 
    | fields ORIG_POINT_CD NETNAME NETTYPE NODENUM NODETYPE] 
| join type=left NETNAME DEST_POINT_CD 
    [ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\191008_CLGRAB1201T_C7RTESET.csv" earliest=-20d@d
    | rex field=PC "(?<p1>[\d]*).(?<p2>[\d]*).(?<p3>[\d]*)" 
    | eval p3=case(len(p3)==1,"00".p3,len(p3)==2,"0".p3) 
    | eval DPC=p1.".".p2.".".p3 
    | rename DPC as DEST_POINT_CD DESTNAME as ROUTESET 
    | fields NETNAME DEST_POINT_CD NETTYPE ROUTESET] 
| join type=left ROUTESET 
    [ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\191008_CLGRAB1201T_ISUPDEST.csv" earliest=-20d@d
    | rename ISUPROUT as ROUTESET 
    | fields ROUTESET CLLI ] 
| join type=left CLLI 
    [ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\191008_CLGRAB1201T_CLLI.csv" earliest=-20d@d
    | rename TRUNKCLLI as CLLI 
    | fields CLLI ADMININF ADNUM TRKGRSIZ] 
| eval country=if(CALLED_PARTY_NOA_CD==3,COUNTRY, 0) 
| fillnull value=0 
| table NETNAME, ROUTESET, ADMININF, TRKGRSIZ, CLLI, COMPANY,country LIR, TERM_GRPG_CD "Rate  Center", OPC_CLLI_CD DPC_CARRIER_LONG_NM CALLED_PARTY_NOA_CD ORIG_POINT_CD DEST_POINT_CD DPC_CLLI_CD TERM_SUBGRPG_CD SUM_BILL_TM_CNT SUM_NTWK_DURTN_CNT TOTAL_CALL_CNT 
| sort 0 - TOTAL_CALL_CNT

to4kawa · ‎10-21-2019

(first search) OR (join sub search 1) OR (join sub search 2) ....
| eventstats values(COMPANY) as COMPANY values(LIR) as LIR  values("Rate  Center") as  Rate_Center by TERM_SUBGRPG_CD 
| eventstats .....

In this way, you can search for the first time and collect them based on the fields you want to join in eventstats.

vikashperiwal · ‎10-21-2019

any optimized solution for the above query, i do not expect the complete query, but if any suggestions that would be help ful

ololdach · ‎10-21-2019

Hi,
to me it appears as if you import csv system snapshots into an index on a regular basis. In your query, I assume that you only need the last snapshot result as a lookup in order to enrich your data with a current status from the last reports. If so, you should consider to replace all of your joins with regular lookups.
Best regards
Oliver

vikashperiwal · ‎10-21-2019

removing lookup was our requirement, initially we had look ups but then we indexed the data with multiple source.

ololdach · ‎10-21-2019

Hi, from a data point of view, join is one of the most expensive operations you can do. If you want to optimise your query without using join, you will have to restructure your data in such a way that it can be avoided. For enriching the events with data that is available and valid at index time, you could consider to include lookups at index time or automatic lookups at search time. If .csv files are too slow for like 1M+ rows of lookup data, check out the kvstore as an alternative or use filtering or advanced lookup features like time lookups. IMHO join was never meant to be a replacement for lookups as it is a totally different operation and unless you actually need the product of two sets of data, should be avoided due to the high complexity of the operation - Oliver

query optimization without join

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability

Join the Conversation

query optimization without join

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability