Splunk Search

Splunk query optimization

uagraw01
Motivator

Hello Splunker!!

Could you please help me to optimize below query ? Customer saying dedup is taking so much resource consumption. So what should I change in the query so that the complete query gets optimized?

index=abc  sourcetype=abc _tel type=TEL (trigger=MFC_SND OR trigger=FMC_SND) telegram_type=CO order_type=TO area=D10 aisle=A01 *1000383334*
| rex field=_raw "(?P<Ordernumber>[0-9]+)\[ETX\]"
| fields _time area aisle section source_tel position destination Ordernumber
| join area aisle
[ inputlookup isc where section=""
| fields area aisle mark_code
| rename area AS area aisle AS aisle]
| lookup movement_type mark_code source AS source_tel position AS position destination AS destination OUTPUT movement_type
| fillnull value="Unspecified" movement_type
| eval movement_category = case( movement_type like "%IH - LH%", "Storage", movement_type like "%LH - R%", "Storage", movement_type like "%IH - IH%", "Storage", movement_type like "%R - LH%", "Retrieval", movement_type like "%LH - O%", "Retrieval", 1 == 1, "Unknown" )
| fields - source_tel position destination
| dedup Ordernumber movement_category
| stats count AS orders by area aisle section movement_category movement_type Ordernumber _raw
Labels (3)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Optimisation will usually depend on the data set(s) you are dealing with, which you haven't provided. Having said that, the dedup by Ordernumber and movement_category will mean that there is only one event with each unique combination of the values in these fields, which means the count from the stats will always be 1, so what is the point of doing the stats?

Your join is to an inputlookup, can this be replaced by a simple lookup?

0 Karma

uagraw01
Motivator

@ITWhisperer  Thanks for your response.

As per your suggestion  I will take care of the join and replace that will lookup command. 

I am adding screenshots of the results so that you can get a little more clarity.

Below are the results while executing the above query. Order number is same but one entry is for "Storage" & other one for "Retrieval" . 

uagraw01_0-1728904090796.png

Job inspection   while executing above query

uagraw01_1-1728904194666.png

Do you have any suggestion so that I can replace dedup with some more optimized command?

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Well, if your Splunk needs 13 seconds to scan just 14 thousand events... that looks weird.

But if you have big events (like 100K-big jsons), considering your wildcard at the beginning of the search term, the initial search might indeed be slow.

So that's the first and probably the most important optimization you can do - if you can drop the wildcard at the beginning of *1000383334*, it will save you a lot of time.

Notice that Splunk had to scan over 14k events just to match two of them. That's because it can't use indexed terms, it has to scan every single raw event.

Since you're extracting  Odernumber (and rely on it being non-empty by including it in the BY clause for stats) using [EXT] as an anchor for your regex the [EXT] part must obviously be in your event. So if it's only in part of the events, you can use it as additional search term (square brackets are major breakers so you can just add EXT to your search terms).

The inputlookup and join @ITWhisperer already covered.

Dedup should _not_ be using much resources. As you can see from your job inspect table, it gets just 10 results on input and returns 2. It's not a huge amount. The main problem here is the initial search.

Also if your events are big, you can drop _raw early on so you don't drag it along with you along the pipeline (you only use a few fields in your stats anyway).

ITWhisperer
SplunkTrust
SplunkTrust

Given that the number of orders is always 1 (as previously explained and shown in your screenshot), the dedup is not actually doing anything useful and can be removed. This could affect the orders field in that it could be more than 1. This could be resolved by either evaluating it to 1 after the stats command, or by using distinct count

| stats dc(Ordernumber) AS orders by area aisle section movement_category movement_type Ordernumber _raw
0 Karma

uagraw01
Motivator

@ITWhisperer After executing  suggested command I am getting below results. The count should 2 only.

1 for the storage and 1 for the retrieval.

uagraw01_0-1728969804057.png

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Remove _raw from the by clause

0 Karma

uagraw01
Motivator

@ITWhisperer  Thanks for your response.

With dedup command it is giving expected 2 results. 1 for storage and 2 for retrieval.

uagraw01_0-1728994373979.png

Without dedup command it is giving me 5 results.

uagraw01_1-1728994469784.png

Note : There are no such impact seen after removing the join inputlookup with "lookup command".

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
| stats dc(Ordernumber) as count first(movement_type) as movement_type by area aisle section movement_category Ordernumber
0 Karma

uagraw01
Motivator

@ITWhisperer  Although this method is correct, it takes one second longer than the dedup command.

uagraw01_0-1728995180622.png

Thanks in advance.

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Have you tried this with the join being replaced by a lookup?

0 Karma

uagraw01
Motivator

@ITWhisperer Yes, I replace join with lookup command. Do you want me to try something more to optimize further?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Without knowledge of your data, I can't see any further optimisations you might try

0 Karma

uagraw01
Motivator

@ITWhisperer  After apply your suggested command I also replaced lookup command and now search is taking 10.5 seconds to complete the results.

uagraw01_0-1728996808788.png


Thanks for your help.

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In January, the Splunk Threat Research Team had one release of new security content via the Splunk ES Content ...

Expert Tips from Splunk Professional Services, Ensuring Compliance, and More New ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Observability Release Update: AI Assistant, AppD + Observability Cloud Integrations & ...

This month’s releases across the Splunk Observability portfolio deliver earlier detection and faster ...