I want to retrieve hundreds of millions of events out of billions, but each search takes more than an hour.
I just used the simplest search: index="test" name=jack
But it's very slow.
Then I checked the memory and CPU usage. Each search takes only 200-300 MB of memory.
So I modified the max_mem_usage_mb, search_process_memory_usage_percentage_threshold, and search_process_memory_usage_threshold parameters in $SPLUNK_HOME/etc/apps/search/local/limits.conf, but they didn't seem to make a significant difference.
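For reference, here is roughly what I set (the values are just what I tried, not recommendations; max_mem_usage_mb sits in the [default] stanza, the two thresholds in [search]):

[default]
# per-search memory budget before certain operations spill to disk
max_mem_usage_mb = 500

[search]
# upper bounds on a single search process's memory use
search_process_memory_usage_threshold = 4000
search_process_memory_usage_percentage_threshold = 25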
Is there any effective way to improve the speed of my search?
Thanks! 🙂
You'll be much faster in finding Jack's company if you also specify how to find a company in your search. What that looks like depends on your data, which you didn't share with us - knowing your data would help.
That could look like one of these:
index=foo sourcetype=company_register name=jack
index=foo category=employees name=jack
etc.
If you have an accelerated datamodel, it could look like this:
| tstats summariesonly=t values(company) as companies from datamodel=your_model where your_model.name=jack
To chain that, you could build a dashboard with in-page drilldowns that step through the tree you expect in your data.
If you need a generic summary of your millions of events, then try fieldsummary:
index=<YouShouldAlwaysSpecifyAnIndex> AND sourcetype=<AndSourcetypeToo> AND name="jack" | fieldsummary
What do you need raw events for? Put all relevant fields into the data model and go from there.
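For example, once fields like company and email are in the model, something like this returns them straight from the acceleration summaries without touching raw events (field names are hypothetical, adjust to your model):

| tstats summariesonly=t values(your_model.company) as company values(your_model.email) as email from datamodel=your_model where your_model.name=jack by your_model.name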
Another problem: if I specify the fields to be returned, then with large amounts of data the search is slower than a direct search.
I misplaced my crystal ball, this would be so much easier if you included your searches and data samples/description.
The search statement I use is | pivot datamodel dataset SPLITROW name as new_name FILTER name is jack. It's slower than index="test" name=jack, and CPU and memory usage increase sharply while it runs.
Sorry, there was a mistake. I know what to do now, thanks!
I want to use the data model to speed up the search, but I need the search to return the matching events themselves rather than statistics. I added some of the fields to the data model, but using

| tstats summariesonly=t values(company) as companies from datamodel=your_model where your_model.name=jack

does not return any event results.
Thanks, it's very fast!
But I don't need statistics; I want the event results back. What should I do?
A single indexer like you have in your AllInOne configuration cannot efficiently handle billions of events by itself. You need many more indexers so that the main power of Splunk (Parallel Map and Reduce) can be unleashed.
There is no reduce in your search.
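For contrast, a search that does reduce, e.g. by aggregating per name instead of shipping every raw event back, could look like this sketch (company is just an example field):

index=test name=jack | stats values(company) as companies by name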
You're not going to get good advice without describing your use case.
I just want to get lots of data in a short time. I added hundreds of millions of events to an indexer and used the simplest search: index="test" name=jack, but it's very slow.
Then I tried to build a datamodel, but it can only generate a pivot table. I want to generate search results.
What commands do I need to use to speed up my search through the datamodel?
Thanks! 🙂
"Getting data" is not a use case.
Sorry, I don't know what you mean...
There's no value in listing millions of events on screen, which is what your current search does.
Describe what you actually want to achieve instead of just trolling.
Sure, I know there's no value in listing millions of events on screen. I want to use a keyword search to get the required data out of billions of events, and the results may only be hundreds or thousands. But the underlying data set is so large that the search always becomes very slow.
If you really just want to list events on screen, append | head 1000 to your search. Nobody's meant to page past 1000 events.
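With your original search, that would be:

index="test" name=jack | head 1000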
I tried to build a datamodel. It's very fast, but it can only generate a pivot table. I want to generate search results.
What commands do I need to use to speed up my search through the datamodel?
OK, maybe that's an effective way to do it. Thanks!
I want to call the search API and search recursively. For example, the keyword is name=jack. The first search returns related information such as the email address, company, etc. Then I search again using the company, email address, and so on as keywords, going back and forth until I've found all the information associated with name=jack.
Is there a good way to optimize this search algorithm, or does Splunk have a built-in recursive search command?
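For illustration, one round of that chaining can be written inside Splunk with the map command (company is just an example field here, and map is capped by maxsearches, so this is only a sketch, not a full recursive solution):

index=test name=jack | stats values(company) as company | mvexpand company | map maxsearches=10 search="search index=test company=$company$"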
I added more indexers and set up distributed search. The test data were distributed across three machines, each running five indexer instances with 20,000,000 events per indexer. But there was no improvement in search speed. How can I improve it?