
Is it possible to use appendcols without multiple searches to improve efficiency?

asherman
Path Finder

Hi,

I am trying to build a table of results, and my query has the form:

search X
|dedup block
|stats count by block
|appendcols [search X | stats ... | eval ... | chart ...]
|appendcols [search X | stats ... | eval ... | chart ...]
|appendcols [search X | stats ... | eval ... | chart ...]

Each search is the same while the subsequent operations differ. I can get the table I want, but as my data grows the query's run time is increasing significantly. Is it possible to save the search results (something like a variable holding the output of a given query) so that I can reuse them and avoid running the same search repeatedly? Alternatively, is there a better command for this purpose than appendcols?

Thanks

Edit:
The full query is:

index=test_index3 base="*" corner=*100c*
| dedup dir corner path name
| stats count as paths by block
| appendcols [search index=test_index3 base="*" corner=*100c* 
    | stats count(eval(isNum(base))) as num_vals, count as num_paths by block
    | eval percent=round(num_vals*100/num_paths,2)
    | chart max(percent) as "num_vals" by block]
| appendcols [search index=test_index3 base="*" NOT base="*nan*" corner=*100c* 
    | stats count(eval((base-min)>0 AND (base-max)<0)) AS bounded, count AS num_vals by block
    | eval percent=round(bounded*100/num_vals,2)
    | chart max(percent) as "bounded paths" by block]
| appendcols [search index=test_index3 base="*" NOT base="*nan*" corner=*100c* 
    | stats count(eval((base-max)<0)) AS bounded, count AS num_vals by block
    | eval percent=round(bounded*100/num_vals,2)
    | chart max(percent) as "max bounded paths" by block]
| appendcols [search index=test_index3 base="*" NOT base="*nan*" corner=*100c* 
    | stats count(eval((base-min)>0)) AS bounded, count AS num_vals by block
    | eval percent=round(bounded*100/num_vals,2)
    | chart max(percent) as "min bounded paths" by block]
| fields block, paths, num_vals, "bounded paths", "max bounded paths", "min bounded paths" 
| rename num_vals AS "% data"
| rename "bounded paths" AS "% of data bounded"
| rename "max bounded paths" AS "% of data bounded by max"
| rename "min bounded paths" AS "% of data bounded by min"
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

First off, corner=*100c* is usually quite inefficient because of the leading wildcard. If the number of events scanned is much higher than the number of events matched, you may be able to speed everything up here. How depends on your data: look for a way to identify the relevant events based on full tokens, or at least prefixes of tokens, i.e. without leading wildcards.
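For example (purely an assumption about your data), if the relevant corner values actually start with 100c rather than merely containing it, a prefix match already lets Splunk narrow the events using its indexed tokens:

index=test_index3 base="*" corner=100c*

And if you know the exact values (say a hypothetical corner=ss_100c), filtering on the full token is better still.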

Looking at the search itself, here's how I'd start by combining the first two parts.

(1) index=test_index3 base="*" corner=*100c* | dedup dir corner path name | stats count as paths by block

The dedup isn't present in the other searches, so let's avoid using it here; that way all the parts can share one base search. This basically counts how many distinct quads of (dir, corner, path, name) you have for each value of block... so you can replace (1) with this:

(2) index=test_index3 base="*" corner=*100c* | eval dcpn = dir ."###". corner ."###". path ."###". name | stats dc(dcpn) as paths by block

Now let's look at the first subsearch:

(3) index=test_index3 base="*" corner=*100c* | stats count(eval(isNum(base))) as num_vals, count as num_paths by block | eval percent=round(num_vals*100/num_paths,2) | chart max(percent) as "num_vals" by block

That final chart looks redundant to me. There is only one row for each block, so it basically does the same as fields - num_paths | rename percent as num_vals.

(4) index=test_index3 base="*" corner=*100c* | stats count(eval(isNum(base))) as num_vals, count as num_paths by block | eval num_vals=round(num_vals*100/num_paths,2) | fields - num_paths

When you compare (2) and (4), the searches basically do the same thing: load the same events and compute some statistics over them. So you can combine them like this:

(5) index=test_index3 base="*" corner=*100c* | eval dcpn = dir ."###". corner ."###". path ."###". name
| stats dc(dcpn) as paths, count(eval(isNum(base))) as num_vals, count as num_paths by block 
| eval num_vals=round(num_vals*100/num_paths,2) | fields - num_paths

Without knowing your data: in (1) you dedup before counting the number of paths, while in (4) you don't. Is that on purpose? From here it feels odd, but it might match your data and use case, so I left it as is.

Using this strategy you should be able to combine the other subsearches into the main search as well.
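For instance, here's an untested sketch of folding the second subsearch's bounded count into the same stats (field names taken from your posted search; note that the original subsearch also filtered NOT base="*nan*", which isn't reproduced here, so the denominators may differ slightly):

(6) index=test_index3 base="*" corner=*100c* | eval dcpn = dir ."###". corner ."###". path ."###". name
| stats dc(dcpn) as paths, count(eval(isNum(base))) as num_vals, count(eval((base-min)>0 AND (base-max)<0)) as bounded, count as num_paths by block
| eval bounded_perc=round(bounded*100/num_paths,2)
| eval num_vals=round(num_vals*100/num_paths,2)
| fields - num_paths bounded

Here bounded_perc divides by the full event count (num_paths); adjust the denominator to whatever matches your intent, and compute it before num_vals is overwritten with its percentage.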


martin_mueller
SplunkTrust
SplunkTrust

The dots concatenate strings, and without knowing your data I added a separator to make fairly sure that concatenating the four fields doesn't mess things up. Here's an example of how things would mess up with unlucky data:

d   c   p   n
1   2   3   45
1   2   34  5

Concatenating those two quads and then deduping or dc()-ing them would treat them as duplicates even though they clearly are not: both would end up as 12345. Adding the separator makes such a collision practically impossible, even without knowing the data.
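If you want to see the effect in isolation, here's a tiny self-contained run (made-up values, nothing from your data) that reproduces the collision:

| makeresults count=2
| streamstats count as row
| eval d="1", c="2", p=if(row==1,"3","34"), n=if(row==1,"45","5")
| eval no_sep=d.c.p.n, with_sep=d."###".c."###".p."###".n
| stats dc(no_sep) as without_separator, dc(with_sep) as with_separator

without_separator comes back as 1 because both rows collapse to 12345, while with_separator correctly reports 2.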


asherman
Path Finder

Sure.

By the way, what exactly is the ."###". doing? I'm not entirely clear on why that works.


asherman
Path Finder

Final result:

index=test_index3 base="*" corner=100c*
| dedup dir corner path name hs_or_lp
| eval dcpn = dir ."###". corner ."###". path ."###". name
| stats dc(dcpn) as paths,
    count(eval(isNum(base))) as num_vals,
    count(eval((base-min)>0 AND (base-max)<0)) AS bounded,
    count(eval((base-max)<0)) AS bounded_max,
    count(eval((base-min)>0)) AS bounded_min by block
| eval percent=round(num_vals*100/paths,2)
| eval bounded_perc=round(bounded*100/num_vals,2)
| eval bounded_max_perc=round(bounded_max*100/num_vals,2)
| eval bounded_min_perc=round(bounded_min*100/num_vals,2)
| fields block, percent, bounded_perc, bounded_min_perc, bounded_max_perc
| rename percent as "% of data"
| rename bounded_perc as "% of data bounded"
| rename bounded_max_perc as "% of data bounded by max"
| rename bounded_min_perc as "% of data bounded by min"

martin_mueller
SplunkTrust
SplunkTrust

Ah, I see - then it does indeed make sense to use dedup for the entire search.

Consider posting the finished search for others who may later find this question.

asherman
Path Finder

Thank you for this - my query is now a lot neater and faster.

The purpose of the dedup was to remove duplicate entries for the same dir/corner/path/name (dcpn) set. It should have been applied to every search, but I don't think I had any duplicate (dcpn) entries in the data yet. Using a single query as you've described, I should be able to apply the dedup once at the start to cover the entire query.

Thanks.

asherman
Path Finder

Sure, done.


martin_mueller
SplunkTrust
SplunkTrust

Chances are you can do away with the subsearches and appendcols entirely. Post the full search and we'll see.
