I have a search like this:
index=foo earliest=-3d | rex field=summary "(?<json_data>{.*)" | spath input=json_data | stats count by Version | search Version < 30401942 | sort -Version
It reads approximately 2.5 million events, but it takes about 25 seconds to finish. Is this a normal response time for that volume of logs? Is there any configuration in Splunk that I should check to improve performance? I'm quite new to Splunk.
thank you!
Hi guillecasco,
the first check to do is to verify the throughput of your disks, using a tool like Bonnie++, and confirm whether it is above or below the required 800 IOPS.
Bye.
Giuseppe
You can improve the performance by 10x by using Splunk metadata fields. I can help you with that; please contact me on Fiverr or by email (hurdlej1@gmail.com)
https://www.fiverr.com/s2/affc9b7a8a
https://www.fiverr.com/s2/608e8ed73f?utm_source=CopyLink_Mobile
If you want to help, just post an answer here so every Splunk user looking for something similar will be able to find it.
In Splunk, get rid of everything you don't need at the earliest possible point in the search. In this case, I believe the only field you need from the events is summary.
Therefore, you can add "| fields summary" as the first command after the initial search, and the search will speed up quite a bit.
Also, richgalloway's suggestion to eliminate version values you don't care about before the stats command is a good one.
Here is a recode you can try, followed by a description of my assumptions.
index=foo earliest=-3d summary=*
| fields summary
| rex field=summary "(?<json_data>{.*)"
| spath input=json_data
| fields Version
| search Version < 30401942
| stats count by Version
| sort 0 -Version
The above recode should be significantly faster, based on this interpretation of your original code:
NOTE - You can also speed it up a bit more if you know the exact path to the version data you are looking for, rather than having spath extract every field from the JSON when you only need the version.
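As a sketch of that idea, assuming Version sits at the top level of the extracted JSON (the exact path depends on your actual data), spath can be limited to that single path instead of extracting everything:

    | spath input=json_data path=Version output=Version

With path= specified, spath pulls out only the named value, which avoids the cost of auto-extracting every key in the JSON blob.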
Edited to use sort 0 rather than sort, in case more than 100 Version values were returned. Also updated version to Version in the fields command.
I can offer some generic suggestions for improving performance.
Make your base search as specific as possible. Include everything you know about the events you want.
Narrow the time window as much as you can. Do you really need 3 days of data?
Consider moving the "search Version < 30401942" command to the left. This should reduce the number of events that are read/processed.
Examine the job inspector for insight into where time is being spent on your search.
Consider spreading your data across more indexers.
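Putting those suggestions together, a possible reordering (using the field names from the original search) filters the Version values before the stats command, so fewer rows reach the aggregation:

    index=foo earliest=-3d summary=*
    | rex field=summary "(?<json_data>{.*)"
    | spath input=json_data
    | where Version < 30401942
    | stats count by Version
    | sort 0 -Version

Note that where does a numeric comparison; if Version is extracted as a string, you may need where tonumber(Version) < 30401942 instead.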
Also, perhaps we can improve on the field extraction regex? Instead of using .* what exactly are you trying to extract? Word characters only? Digits?
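For instance, if all you ultimately need is the version number, you could (hypothetically, depending on how the JSON in summary is actually laid out, and assuming a key literally named "Version" with a numeric value) pull it straight out of the raw text and skip spath entirely:

    index=foo earliest=-3d summary=*
    | rex field=summary "\"Version\"\s*:\s*(?<Version>\d+)"
    | where Version < 30401942
    | stats count by Version
    | sort 0 -Version

A narrow pattern like \d+ also avoids the greedy .* capture, which copies the entire JSON blob into a new field for every event.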