Splunk Search

How to improve the speed of Splunk search

qazwsxe
New Member

I want to retrieve hundreds of millions of events out of billions, but each search takes more than an hour.
I just used the simplest search: index="test" name=jack. But it's very slow.

Then I checked the memory and CPU usage. Each search uses only 200-300 MB of memory.
So I modified the max_mem_usage_mb, search_process_memory_usage_percentage_threshold and search_process_memory_usage_threshold parameters in $SPLUNK_HOME/etc/apps/search/local/limits.conf, but they didn't seem to make a significant difference.
Is there any effective way to improve the speed of my search?
Thanks! 🙂
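
(For reference, the three memory settings named above live in limits.conf; here is a minimal sketch of the relevant stanzas with purely illustrative values. Note that these settings only bound how much memory a search process may use, so raising them does not by itself make a sparse search faster:)

[default]
# ceiling on memory a search-time command may use for in-memory data (illustrative value)
max_mem_usage_mb = 500

[search]
# absolute and percentage memory ceilings for a single search process (illustrative values)
search_process_memory_usage_threshold = 4000
search_process_memory_usage_percentage_threshold = 25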

0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

You'll be much faster in finding Jack's company if you also specify how to find a company in your search. What that looks like depends on your data, which you didn't share with us - knowing your data would help.

That could look like one of these:

index=foo sourcetype=company_register name=jack
index=foo category=employees name=jack
etc.

If you have an accelerated datamodel, it could look like this:

| tstats summariesonly=t values(company) as companies from datamodel=your_model where your_model.name=jack

To chain that you could build a dashboard with in-page drilldowns that steps through the tree you expect in your data.

0 Karma

woodcock
Esteemed Legend

If you need a generic summary of your millions of events, then try fieldsummary:

index=<YouShouldAlwaysSpecifyAnIndex> AND sourcetype=<AndSourcetypeToo> AND name="jack" | fieldsummary
0 Karma

martin_mueller
SplunkTrust
SplunkTrust

What do you need raw events for? Put all relevant fields into the data model and go from there.
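
For example, a sketch along the lines of the accepted answer, assuming the model also contains hypothetical company and mailbox fields: grouping by _time returns one row per second with the values seen then, which is closer to an event listing than a single aggregate.

| tstats summariesonly=t values(company) as company values(mailbox) as mailbox from datamodel=your_model where your_model.name=jack by _time span=1s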

0 Karma

qazwsxe
New Member

Another problem is that if I specify the fields to be returned, with large amounts of data the search is slower than the plain search.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

I misplaced my crystal ball; this would be so much easier if you included your searches and data samples/description.

0 Karma

qazwsxe
New Member

The search statement I use is | pivot datamodel dataset SPLITROW name as new_name FILTER name is jack. It is slower than index="test" name=jack, and CPU and memory usage increase sharply while it runs.

0 Karma

qazwsxe
New Member

Sorry, there was a mistake. I know what to do now, thanks!

0 Karma

qazwsxe
New Member

I want to use the data model to speed up the search, and I need it to return the fields of the events themselves, not statistics. I added some of the fields to the data model, but using | tstats summariesonly=t values(company) as companies from datamodel=your_model where your_model.name=jack does not return any event results.
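
If raw events are needed rather than aggregated values, one possible sketch (with placeholder model and dataset names) is the datamodel command, which returns the underlying events - though, unlike tstats against an accelerated model, it may not be any faster, because it searches the raw events behind the model:

| datamodel your_model your_dataset search | search your_dataset.name=jack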

0 Karma

qazwsxe
New Member

Thanks, it's very fast!
But I don't need statistics, I want the event results returned. What should I do?

0 Karma

woodcock
Esteemed Legend

A single indexer like you have in your AllInOne configuration cannot efficiently handle billions of events by itself. You need many more indexers so that the main power of Splunk (Parallel Map and Reduce) can be unleashed.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

There is no reduce in your search.

You're not going to get good advice without describing your use case.

0 Karma

qazwsxe
New Member

I just want to get lots of data in a short time. I added hundreds of millions of events to an indexer. I just used the simplest search: index="test" name=jack, and it's very slow.
Then I tried to build a datamodel, but it only generates a pivot table. I want to generate search results.
What commands do I need to use to speed up my search through the datamodel?
Thanks! 🙂

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

"Getting data" is not a use case.

qazwsxe
New Member

Sorry, I don't know what you mean...

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

There's no value in listing millions of events on screen, which is what your current search does.

Describe what you actually want to achieve instead of just trolling.

0 Karma

qazwsxe
New Member

Sure, I know there's no value in listing millions of events on screen. I want to use a keyword search to pull the required data out of billions of events, and the results may only be hundreds or thousands. But the data volume is so large that the search always becomes very slow.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

If you really just want to list events on screen, append | head 1000 to your search. Nobody's meant to page past 1000 events.
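
Applied to the search from the question, that would be:

index="test" name=jack | head 1000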

0 Karma

qazwsxe
New Member

I tried to build a datamodel. It's very fast, but it only generates a pivot table. I want to generate search results.
What commands do I need to use to speed up my search through the datamodel?

0 Karma

qazwsxe
New Member

OK, maybe that's an effective way to do it. Thanks!
I want to call the interface and search recursively. For example, the keyword is name=jack. The first search returns related information such as mailbox, company, etc. Then I search again using the company, mailbox and so on as keywords, and keep going back and forth until all the information associated with name=jack has been found.
Is there a good way to optimize this kind of search, or does Splunk have its own recursive search command?
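
Splunk has no built-in recursive search command, but one hop of that chaining can be sketched with a subsearch, assuming a hypothetical company field is available at search time; the inner search finds Jack's companies and the outer search then matches everything with those company values:

index="test" [ search index="test" name=jack | dedup company | fields company | format ]

Each further hop (mailbox, and so on) would need its own subsearch or a saved search driving the next one, and subsearches are limited in result count and runtime by default, so very wide expansions get truncated.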

0 Karma

qazwsxe
New Member

I added more indexers and set up distributed search. The test data were distributed across three machines, each running five indexers, with 20,000,000 events per indexer. But there is no improvement in search speed. How can I improve it?

0 Karma