Need help Optimizing Search in HUNK

Builder

We are currently using MapRFS and with our restrictions on directory structure, we are having a hard time getting optimized searches with Hunk.

Basically, the search will find all the events and then just keep searching through all files.

Our restrictions require us to have a folder called current that the current hour's logs go into. At the top of the hour, the log is rolled and we move the rolled file into subdirectories based on date/time.

Our current directory structure looks like:
/mapr/mapr.oly.cequintecid.com/user/mapr/data/(sourcetype)/(host)/current/(year)/(month)/(day)/(hour)

The current hour goes into a log file in /mapr/mapr.oly.cequintecid.com/user/mapr/data/(sourcetype)/(host)/current
and then is moved at the top of the hour to the corresponding
...(year)/(month)/(day)/(hour)
folder

We had search optimization before, when we were putting the current hour's log file directly into the hour subdirectory further down, but we cannot do this anymore due to internal restrictions.

Suggestions are welcome.

Here is our indexes.conf for the virtual index we are using:
[Screenshot of indexes.conf; image not preserved]

Re: Need help Optimizing Search in HUNK

Path Finder

Is the et/lt regex correct, and is markdown just mangling it so that it shows up as .? ?

Re: Need help Optimizing Search in HUNK

Builder

Thanks for pointing that out. It was a fluke of how HTML is handled in this interface. I changed it and uploaded an image of what my et/lt regex looks like with the asterisk included.

Re: Need help Optimizing Search in HUNK

Path Finder

Below you say it doesn't "always do it" - with the current configs that you have in place, it should always search the input files for which it cannot figure out the time range they belong to, i.e. the most recent hour of data will always be searched. Is that what you see?

Re: Need help Optimizing Search in HUNK

SplunkTrust

Hi. So when you say

Basically, the search will find all the events and then just keep searching through all files.

I assume you are watching with debug or something to see that Splunk keeps looking at the files to see if they match the regex?

That is the behavior I have observed too. How would it know not to keep looking?

Re: Need help Optimizing Search in HUNK

Builder

So it doesn't always do this.
Hunk can avoid this by using the regex you create to specify the time range, either in Splunk Web or in indexes.conf. My vix.input.1.et.regex (et stands for earliest time, lt for latest time) is:

/user/mapr/.*?/.*?/.*?/(\d+)/(\d+)/(\d+)/(\d+)/.*

and I have the format set to yyyyMMddHH.

So it will find the first (\d+) and identify it as the year.
The second one is the month.
The third is the day.
The fourth is the hour.
So now Hunk can search through ONLY the directories for the time range you specified. Say you only wanted to search the last 60 mins.
It will know the HOUR of your logs from the regex and ONLY search in the folders whose hours fall within the last 60 mins.
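
To make the extraction concrete, here is a minimal Python sketch of what the et regex and format are doing. It is not Hunk's actual implementation. It assumes the asterisks in the regex are restored, that the capture groups are concatenated and parsed with yyyyMMddHH (written as %Y%m%d%H in Python), that the directory names are zero-padded, and that the regex may match anywhere in the path. The sourcetype, host and file names in the sample paths are made up.

```python
import re
from datetime import datetime
from typing import Optional

# The et regex from this post, with the asterisks restored (assumption).
ET_REGEX = re.compile(r"/user/mapr/.*?/.*?/.*?/(\d+)/(\d+)/(\d+)/(\d+)/.*")
# Python equivalent of the yyyyMMddHH format.
ET_FORMAT = "%Y%m%d%H"


def earliest_time_from_path(path: str) -> Optional[datetime]:
    """Return the hour encoded in the path, or None if the regex does not match."""
    match = ET_REGEX.search(path)
    if match is None:
        # No time can be derived from the path alone.
        return None
    # Concatenate year, month, day and hour, e.g. "2015" "06" "01" "14".
    stamp = "".join(match.groups())
    return datetime.strptime(stamp, ET_FORMAT)


# Hypothetical paths for illustration only.
rolled = "/mapr/mapr.oly.cequintecid.com/user/mapr/data/syslog/host01/2015/06/01/14/app.log"
current = "/mapr/mapr.oly.cequintecid.com/user/mapr/data/syslog/host01/current/app.log"

print(earliest_time_from_path(rolled))
print(earliest_time_from_path(current))
```

In this sketch the rolled file yields a time, while the file sitting directly in current yields nothing, so a time-range filter based on the path has no way to rule it out.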

I've seen this work BEFORE I had a current folder in my path, and now it doesn't.

Re: Need help Optimizing Search in HUNK

Path Finder

Hunk will list all the files, then eliminate as many as it can by just looking at the path, e.g. filtering via fields extracted from the path, or by time range. If it cannot eliminate a file, then it will process its contents and apply the search to them. Just to be clear, there are two parts to a search: (a) list files and try to eliminate them, and (b) read the remaining files and process their content.
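
As a rough illustration of part (a), here is a small Python sketch of path-based elimination by time range. It is only a model of the behaviour described above, not Hunk code. The function name, the one-hour width of each directory, and the sample paths are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# (start, end) of the time a file covers, or None if the path gives no time.
TimeRange = Optional[Tuple[datetime, datetime]]


def must_read(file_range: TimeRange, earliest: datetime, latest: datetime) -> bool:
    """Return True if the file cannot be eliminated from its path alone."""
    if file_range is None:
        # Time range unknown: the file has to be read and searched.
        return True
    file_start, file_end = file_range
    # Keep the file only if [file_start, file_end) overlaps [earliest, latest).
    return file_start < latest and file_end > earliest


def hour_range(year: int, month: int, day: int, hour: int) -> Tuple[datetime, datetime]:
    """Time covered by one hourly directory (assumed to be exactly one hour)."""
    start = datetime(year, month, day, hour)
    return (start, start + timedelta(hours=1))


# Hypothetical listing: path -> time range derived from the path, if any.
files = {
    ".../2015/06/01/12/app.log": hour_range(2015, 6, 1, 12),
    ".../2015/06/01/14/app.log": hour_range(2015, 6, 1, 14),
    ".../current/app.log": None,
}

earliest = datetime(2015, 6, 1, 14, 0)
latest = datetime(2015, 6, 1, 15, 0)

to_read = [path for path, rng in files.items() if must_read(rng, earliest, latest)]
print(to_read)
```

Under these assumptions the 12:00 file is dropped without being opened, while the 14:00 file and the file in current both go on to part (b).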

EricLloyd79 - are you seeing files being processed that shouldn't be? If so, please provide an example.

Re: Need help Optimizing Search in HUNK

Builder

I am not sure how to tell which files it is searching through. All I can see is that it has found x files out of y files, and even though I see all the results for the time I specified in the search results, it continues to search through more files, up into the millions. I tried turning on debug mode for Splunk, running the search, and viewing the splunkd.log file to see where it is searching once all the events are found, but I could not find that information in the logs.

Re: Need help Optimizing Search in HUNK

Path Finder

Is x (in "x of y files") in the general vicinity of the number of files it needs to search, i.e. the ones that fall within the earliest/latest of the search, plus current?

Re: Need help Optimizing Search in HUNK

Builder

Yes, the events it finds are correct; that isn't the issue. The issue is that after it finds them, it continues searching through the rest of the files.

It's even more bizarre to me because I have a separate directory with a new virtual index that is nearly identical to the first one (the only difference I can find is the names), and when it searches, it finds the events in the time frame asked for and then wraps up the search easily.

I realize it is very difficult to troubleshoot this sort of thing without seeing it. Any suggestions are welcome, though, as I will continue trying to figure out what makes one virtual index run faster than the other.
