Hi there,
I have been testing Hunk and noticed that, because it does not pre-index data, it relies heavily on well-crafted regexes and other filters to speed up searches.
An example of this is the use of the vix.input.1.path, vix.input.1.et.*, and vix.input.1.lt.* settings, as illustrated below:
[hunktest]
vix.input.1.accept = \.gz$
vix.input.1.path = /test/logs/${environmentid}/...
vix.provider = test-hadoop-cluster
vix.input.1.et.format = yyyyMMddHHmmssSSSS
vix.input.1.et.offset = -3600
vix.input.1.et.regex = .*/logs/\d+/data\.(\d+).*
vix.input.1.et.timezone = GMT
vix.input.1.lt.format = yyyyMMddHHmmssSSSS
vix.input.1.lt.offset = 0
vix.input.1.lt.regex = .*/logs/\d+/data\.(\d+).*
vix.input.1.lt.timezone = GMT
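To illustrate how I understand those et/lt settings to work, here is a small sketch of the timestamp extraction Hunk presumably performs on each path before deciding whether to read it. The file name is made up, and the milliseconds portion (SSSS) is ignored for simplicity; the regex and the -3600 offset come straight from the stanza above.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical file path; 18 digits matching the yyyyMMddHHmmssSSSS format above.
path = "/test/logs/123/data.202401021730450000.gz"

# Mirror of vix.input.1.et.regex: the first capture group is the timestamp.
m = re.match(r".*/logs/\d+/data\.(\d+).*", path)
stamp = m.group(1)

# Parse the first 14 digits (yyyyMMddHHmmss) as GMT; the 4-digit SSSS
# fraction is dropped here for simplicity.
base = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

# vix.input.1.et.offset = -3600 widens the window by an hour on the early side,
# while the lt offset of 0 leaves the late bound at the parsed time.
earliest = base + timedelta(seconds=-3600)
print(earliest.isoformat())  # → 2024-01-02T16:30:45+00:00
```

Files whose derived [earliest, latest] window falls outside the search's time range can then be skipped without being read.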
While the above works great, I am facing a small complication: ${environmentid} is a numeric value that means very little to the people who will be using the search heads.
I know I can use a lookup and I have configured one:
[preprocess-gzip]
LOOKUP-env_to_ids = environment_name environmentid OUTPUTNEW environment_name
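For reference, the backing CSV is shaped like this (the values here are made up; the real file maps each environmentid subfolder to a human-readable name):

```csv
environmentid,environment_name
123,Test
```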
I also tested the lookup and it seems to be working: when I run a search like index=hunktest environmentid=123, I can step through the matches and see that the environment_name field has been created and matches the CSV contents. I can also see that only one subfolder (123) produced matches.
However, if I run index=hunktest environment_name=Test or index=hunktest environment_name="Test", then upon inspecting the search.log it appears that Hunk crawled the whole HDFS store instead of just /logs/123/.
Is it possible to define a lookup so that it acts as a filter at search time?
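To be concrete, the effect I am after is what I would get by reversing the lookup by hand with a subsearch, something like the sketch below (assuming the lookup table is also reachable via inputlookup under a name like env_to_ids, which is hypothetical here):

```
index=hunktest [| inputlookup env_to_ids | search environment_name=Test | fields environmentid]
```

That would hand Hunk a literal environmentid=123 term it could substitute into vix.input.1.path, but I would prefer the lookup itself to do this transparently for users.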