Hi
I have an SPL search that takes a long time to return results.
The main goal is to find the high duration consumed by each server over time.
This SPL search extracts the server and duration; is there any way to optimize it?
index="my-index"
| search duration
| rex field=source "\/data\/(?<product>\w+)\/(?<customer>\w+)\/(?<date>\d+)\/log\.(?<servername>\w+)\."
| rex "duration\[(?<duration>\d+\.\d+)"
Scope: more than 20 billion events are written to this log every 2 hours.
Any idea?
Thanks,
If you can also create a field extraction that extracts a duration field, then rather than just searching 'duration' (which looks for the word duration in the raw event text), you can search
index="my-index" duration>1000
which will limit the number of rows you get back in the first place. At the moment you are running the regex over all of those 20 billion rows.
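If the duration value always appears as duration[123.45 in the raw text (as your second rex suggests), a sketch of such a search-time extraction is an EXTRACT stanza in props.conf on the search head; the sourcetype name my_sourcetype here is an assumption:

[my_sourcetype]
EXTRACT-duration = duration\[(?<duration>\d+\.\d+)

With that in place, index="my-index" duration>1000 can filter on the extracted field without an explicit rex in every search.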
Do you have any filtering criteria after those rex statements? If you are looking for long durations, is that per event? If so, the goal is to minimise the number of events you need to process, so searching for duration>X would help.
What else are you doing after the initial statements? Can you share more of that search, or are you saying it takes two hours just to get that data?
Hi @indeed_2000,
first of all, you don't need to put the search condition after the pipe "|"; doing it that way makes the search slower. Please try the following and, using the Job Inspector, see if you get better performance with something like this:
index="my-index" duration
| rex field=source "\/data\/(?<product>\w+)\/(?<customer>\w+)\/(?<date>\d+)\/log\.(?<servername>\w+)\."
| rex "duration\[(?<duration>\d+\.\d+)"
Second thing: if you could extract duration with a single regex, you'd have a quicker search.
But with so many events, no search will have acceptable response times, so you have two solutions to your problem, and I suggest using both:
First, how many CPUs do your Indexers have? Are you sure they are sufficient?
Then, what are the IOPS of your storage?
Storage is the real bottleneck of Splunk; for this reason, Splunk requires at least 800 IOPS (better 1200) on the storage.
About the second solution, see at https://docs.splunk.com/Documentation/SplunkCloud/latest/Knowledge/Aboutsummaryindexing and https://docs.splunk.com/Documentation/Splunk/8.2.2/Report/Acceleratereports and https://docs.splunk.com/Documentation/Splunk/8.2.2/Knowledge/Acceleratedatamodels
In a few words: you should create a scheduled search (e.g. every 15 minutes or every hour) that pre-processes your many events and puts the results in a summary index or a data model; then you can run your searches on those (which are far fewer and quicker!) instead of on the raw events.
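As a sketch of that approach (assuming the regexes from the question, a 15-minute schedule, and a summary index named my_summary that already exists), the scheduled search could be:

index="my-index" duration
| rex field=source "\/data\/(?<product>\w+)\/(?<customer>\w+)\/(?<date>\d+)\/log\.(?<servername>\w+)\."
| rex "duration\[(?<duration>\d+\.\d+)"
| sitimechart span=15m avg(duration) by servername

Save it as a report, schedule it, and enable summary indexing on it (Settings > Searches, reports, and alerts). Dashboards then query the summary instead of the raw events, e.g.:

index=my_summary
| timechart span=15m avg(duration) by servername

The sitimechart/timechart pairing is the standard si- summary-indexing pattern; the span and the avg() function are assumptions based on your stated goal.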
Ciao.
Giuseppe
Are these resources enough?
CPU: 26 cores (1 core per socket)
RAM: 64GB
iostat -d output:
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdc 34.07 2007.13 3659.99 7109800816 12964706645
Hi @indeed_2000,
in general, 26 cores is a very good configuration (12 is the minimum); RAM isn't relevant for searches.
About storage, you should make a check using e.g. Bonnie++ to see the real throughput of your storage.
Anyway, I think the problem is the great number of events, so take into consideration the choices I quickly described to accelerate your searches.
Ciao.
Giuseppe
1. How about SPL that uses table? It won't be able to be accelerated. Any other idea?
2. When I put an accelerated report on a dashboard, it loads from scratch!
Hi @indeed_2000,
the best approach is to use a transforming command (e.g. stats or timechart); if that's not possible, run a search identifying the fields you need and save the results in a summary index using a search like this:
index="my-index" duration
| rex field=source "\/data\/(?<product>\w+)\/(?<customer>\w+)\/(?<date>\d+)\/log\.(?<servername>\w+)\."
| rex "duration\[(?<duration>\d+\.\d+)"
| table date product customer servername duration
| collect index=my_summary
Then you can run your search on the summary index and it's faster:
index=my_summary
| table date product customer servername duration
About the second question, please see https://docs.splunk.com/Documentation/Splunk/8.2.2/Report/Embedscheduledreports
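Once the summary index is populated, the original goal (high duration consumed by each server over time) can be sketched against it, for example (field names as collected above, aggregation functions are an assumption):

index=my_summary
| stats max(duration) AS max_duration avg(duration) AS avg_duration BY servername
| sort - max_duration

This runs over the much smaller summary data rather than the raw 20 billion events.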
Ciao.
Giuseppe
I did the same thing, but the index my_summary is empty after running the collect query. Any idea?
@indeed_2000 The summary index must exist before you can collect to it.
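For reference, a minimal indexes.conf sketch for creating it (you can also create it from Settings > Indexes in Splunk Web); the paths follow the usual convention and are assumptions:

[my_summary]
homePath   = $SPLUNK_DB/my_summary/db
coldPath   = $SPLUNK_DB/my_summary/colddb
thawedPath = $SPLUNK_DB/my_summary/thaweddb

After a restart (or after creating it via the UI, which needs no restart), collect should start writing events to it.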
Hi @indeed_2000,
in this case, you have to check the scheduled search that generates the summary data (e.g. in the Job Inspector, verify that it ran and actually returned results).
Ciao.
Giuseppe