Running into an issue where a query against a virtual index errors out when it hits *.tmp files in the HDFS directory.
Is there a way to filter, or prevent the query from looking at *.tmp files as it's performing the query?
Cloudera said to filter the files in the target input directory to remove any .tmp files, as these are in-progress files from Flume and can get renamed during the job, causing this error.
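To make the filtering idea concrete, here is a minimal Python sketch of the kind of check involved. It only illustrates the regex logic and is not the provider's own mechanism; the file names are made up, and the `.tmp` suffix convention is taken from the description above.

```python
import re

# Assumed convention from the thread: Flume marks in-progress files with ".tmp".
IN_PROGRESS = re.compile(r"\.tmp$")

def completed_files(paths):
    """Return only the paths that do not end in .tmp."""
    return [p for p in paths if not IN_PROGRESS.search(p)]

# Hypothetical file names, for illustration only.
sample = [
    "/data/flume/events.1450051200000",
    "/data/flume/events.1450051260000.tmp",
]
print(completed_files(sample))
```

Anchoring the pattern with `$` matters here: a name that merely contains ".tmp" somewhere in the middle is kept, and only files that actually end in `.tmp` are skipped.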
I'm afraid I can't figure out what's going on from the info here. The only other difference I see is that the pattern that works starts with .*?, while the other two start with .?, but that really should not matter. I think you may need to contact support to have somebody go through this with you.
Just an update:
For the time capturing regex I had to set the 'Time Range' to 1 day, as we're saving logs to one folder per day (12/14, 12/15, 12/16, etc). By setting the 'Time Range' to 1 day, I can now search logs per day.
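As I understand it, the effect is roughly the following. This is a Python sketch of my assumption that each folder's extracted date is treated as the start of a one-day span; the year in the example and the function name `day_window` are hypothetical.

```python
from datetime import datetime, timedelta

def day_window(date_text, fmt="%Y%m%d"):
    """Map a folder date to an (earliest, latest) pair spanning one day."""
    earliest = datetime.strptime(date_text, fmt)
    latest = earliest + timedelta(days=1)
    return earliest, latest

# Hypothetical folder date; the thread only gives month/day values.
start, end = day_window("20151214")
print(start, end)
```

With a window like this, a time picker range that overlaps any part of the day would include that day's folder.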
Hope this helps
Thx for the link.
I had the whitelist regex as follows:
I then changed the regex to:
as the Cisco ISE logs either end with . when fully written, or .tmp while the file is still being written to.
I have a different regex problem (time capturing regex) which is driving me mad, if you don't mind taking a look at it.
We have three directories on HDFS:
• /LogCentral/WindowsEvent
I have the following regex applied to our Firewall virtual index and I can use the time picker no problem
However, applying the same format to the other two logs
I get no events at all no matter what dates I select in the time picker, yet I'm using the same format.
Tried the following regex as I got a match on regex101.com:
Yes, when I enter that and try to run a search, it errors out:
[cdhprovider] Error while running external process, return_code=255. See search.log for more info [cdhprovider] IOException - No input paths specified in job.
As copied here, your regexes have unbalanced parentheses. For example, ".?/ISE/(d+)-(d+)-(d+)/.?)" has a final ) char that is not matched on the left. Is that a copying artifact, or what you're really using? If the latter, try removing the final ).
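A quick way to check this outside the search is to try compiling the pattern. The sketch below uses Python's `re` module as a stand-in, so it is an approximation: the provider uses its own regex engine, and error details may differ. The pattern string is copied exactly as it appears in this thread.

```python
import re

def compiles(pattern):
    """Return True if the pattern is accepted by Python's regex compiler."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

as_posted = r".?/ISE/(d+)-(d+)-(d+)/.?)"  # exactly as copied in the thread
trimmed = as_posted[:-1]                  # same pattern without the final ")"

print(compiles(as_posted), compiles(trimmed))
```

If the pattern only compiles once the trailing parenthesis is dropped, that points to the extra ) rather than the rest of the expression.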
It's exact, and that's what's driving me crazy
and I have yyyyMMdd entered for Format
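For anyone comparing regex and format, here is a small Python sketch of how I assume the two fit together: the capture groups are concatenated in order and the result is parsed with the format. The pattern is my guess at the intended form (with `\d` and `.*` restored), the path is hypothetical, and `%Y%m%d` is used as the Python equivalent of yyyyMMdd.

```python
import re
from datetime import datetime

# Assumed intended pattern; the version pasted in the thread lost some characters.
PATH_DATE = re.compile(r".*?/ISE/(\d+)-(\d+)-(\d+)/.*")

def path_time(path):
    """Join the captured groups and parse them as yyyyMMdd, or return None."""
    m = PATH_DATE.match(path)
    if m is None:
        return None
    return datetime.strptime("".join(m.groups()), "%Y%m%d")

# Hypothetical path, for illustration only.
print(path_time("/LogCentral/ISE/2015-12-14/ise.log"))
```

Under this assumption the order of the groups has to match the format: folders named year-month-day line up with yyyyMMdd, while folders named month-day-year would need a different format or reordered groups.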