I have an index which processes around 10 million events per day. I did a few field extractions which had lookaheads and lookbehinds. Will this hurt my search performance with such massive volumes?
IMHO, you should avoid them because they do have an impact and it does add up. On the other hand, sometimes they are unavoidable. I had a production source that generated CDRs with start and stop times. For durationful events, you should ALWAYS use the stop time. However, sometimes these records had NULL stop times and we needed to use the start time as a fallback timestamp. To do this we used a lookahead for TIME_PREFIX, which is a bad situation. Even though we had thousands of CDRs a second, we did not notice an impact when we deployed this change (and the cluster was not very large and had little extra horsepower). So in my limited experience, deploying one lookahead was unnoticeable, but I am sure deploying dozens of them would have been noticeable. Do what you have to do and keep an eye on your situation so you can stay ahead of the performance curve and scale up your cluster horsepower as you add things.
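To illustrate the fallback idea, here is a minimal Python sketch of a prefix regex that prefers the stop time and falls back to the start time when the stop time is NULL. The record layout (`start=...,stop=...`), the pattern, and the helper name `pick_timestamp` are all assumptions made up for the illustration; this is not the actual TIME_PREFIX we deployed, and Python's `re` is not the engine Splunk uses.

```python
import re

# Hypothetical prefix pattern, in the spirit of a TIME_PREFIX:
#   stop=(?!NULL)              -> use the stop time unless it is NULL
#   start=(?=[^,]*,stop=NULL)  -> otherwise use the start time, but only
#                                 when the following stop field is NULL
TIME_PREFIX = re.compile(r"stop=(?!NULL)|start=(?=[^,]*,stop=NULL)")

# Length of the assumed timestamp format "YYYY-MM-DD HH:MM:SS"
TS_LEN = len("2024-01-01 10:00:00")


def pick_timestamp(record):
    """Return the timestamp text that immediately follows the matched prefix."""
    m = TIME_PREFIX.search(record)
    if m is None:
        return None
    return record[m.end():m.end() + TS_LEN]


complete = "id=1,start=2024-01-01 10:00:00,stop=2024-01-01 10:05:00"
open_ended = "id=2,start=2024-01-01 11:00:00,stop=NULL"

print(pick_timestamp(complete))    # stop time is present, so it is used
print(pick_timestamp(open_ended))  # stop time is NULL, so start time is used
```

The lookaheads are what let the prefix "peek" at the stop field without consuming it, which is exactly the extra work the regex engine has to do on every event.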
Good explanation. I'm doing the extractions now, and we expect a large increase in events in the future. Even though it's not affecting performance much right now, I don't want to hurt myself later.
It is the same as subsearches and transaction. I bend over backwards to avoid using them (poor performance), but sometimes it is the only way to do it. You've gotta do what you've gotta do.
Possibly, depending on how many steps it takes to match. If it takes 50 steps, probably not, but if it takes 2,500 steps to match per event, then possibly. Just look at the difference when you run a search in fast mode vs. verbose/smart mode. The more regexes you apply, and the less efficient each regex is, the slower your searches.
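If you want a rough feel for the cost before deploying, you can time two equivalent extractions outside Splunk. The sketch below is only an approximation under stated assumptions: the sample event, the `user` field, and the helper names are hypothetical, and Python's `re` engine is not the one Splunk uses, so treat the numbers as indicative at best.

```python
import re
import timeit

# Two ways to pull the same value out of an event (both patterns are
# illustrative, not taken from any real extraction):
LOOKBEHIND = re.compile(r"(?<=user=)\w+")  # lookbehind, whole match is the value
CAPTURE = re.compile(r"user=(\w+)")        # plain capture group

# Hypothetical sample event
EVENT = "2024-01-01 10:00:00 action=login status=ok user=alice src=10.0.0.1"


def extract_lookbehind(event):
    m = LOOKBEHIND.search(event)
    return m.group(0) if m else None


def extract_capture(event):
    m = CAPTURE.search(event)
    return m.group(1) if m else None


def compare(repeats=100_000):
    """Time both extractions over the same event; returns seconds per variant."""
    return {
        "lookbehind": timeit.timeit(lambda: extract_lookbehind(EVENT), number=repeats),
        "capture": timeit.timeit(lambda: extract_capture(EVENT), number=repeats),
    }
```

Calling `compare()` and printing the result gives you two timings to compare. Run it against a representative sample of your own events, since the step count depends heavily on where in the event the match occurs and how often the pattern fails.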