I'm running into an issue where a query against a virtual index errors out when it hits *.tmp files in the HDFS directory.
Is there a way to filter out *.tmp files, or otherwise prevent the query from reading them while it runs?
Cloudera said to filter the files in the target input directory to exclude any .tmp files, as these are in-progress files from Flume and can get renamed during the job, causing this error.
Thx
I'm afraid I can't figure out what's going on from the info here. The only other difference I see is that the pattern that works starts with .*?, while the other two start with .?, but that really should not matter. I think you may need to contact support to have somebody go through this with you.
Just an update:
For the time capturing regex I had to set the 'Time Range' to 1 day, as we're saving logs to one folder per day (12/14, 12/15, 12/16, etc). By setting the 'Time Range' to 1 day, I can now search logs per day.
Hope this helps
Thx
Thx for the link.
I had the whitelist regex as follows: ISE.*
I then changed the regex to: (ISE.*\.(\d+))
as the Cisco ISE log file names end either with a dot followed by digits once fully written, or with .tmp while the file is still being written to.
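For anyone checking a similar whitelist, here is a minimal Python sketch of how the two regexes behave. The file names are hypothetical (the real ISE file names are not shown in this thread), and whether the whitelist is applied as a full match or as a substring search is an assumption you would need to verify against your own setup. If it is a substring search, the pattern needs a trailing $ to keep .tmp files out.

```python
import re

# Hypothetical file names; the actual ISE file names are not given in the thread.
names = ["ISE.log.1418601600", "ISE.log.1418601600.tmp"]

# The whitelist from the post above, and a variant anchored at the end.
whitelist = re.compile(r"(ISE.*\.(\d+))")
anchored = re.compile(r"ISE.*\.\d+$")

def accepted(pattern, name, full):
    """Return True if the name passes the pattern under the chosen semantics."""
    m = pattern.fullmatch(name) if full else pattern.search(name)
    return m is not None

# Full-match semantics: the .tmp file is rejected.
full_results = [accepted(whitelist, n, True) for n in names]
# Search semantics: the unanchored pattern still finds a match inside the .tmp name.
search_results = [accepted(whitelist, n, False) for n in names]
# Search semantics with a trailing $: the .tmp file is rejected again.
anchored_results = [accepted(anchored, n, False) for n in names]

print(full_results, search_results, anchored_results)
```

If the .tmp files still show up after changing the whitelist, the anchored variant is the one to try.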
I have a different regex problem (time capturing regex) which is driving me mad, if you don't mind taking a look at it.
We have three directories on HDFS:
• /LogCentral/Firewall
• /LogCentral/ISE
• /LogCentral/WindowsEvent
I have the following regex applied to our Firewall virtual index and I can use the time picker no problem
.*?/Firewall/(\d+)-(\d+)-(\d+)/.*?)
However, applying the same format to the other two logs
.?/ISE/(\d+)-(\d+)-(\d+)/.*?)
.?/WindowsEvent/(\d+)-(\d+)-(\d+)/.*?)
I get no events at all no matter what dates I select in the time picker, yet I'm using the same format.
Tried the following regex as I got a match on regex101.com:
.+\ISE\/(\d+)-(\d+)-(\d+)
Yet when I enter that and try to run a search, it errors out:
[cdhprovider] Error while running external process, return_code=255. See search.log for more info
[cdhprovider] IOException - No input paths specified in job.
Thx
As copied here, your regexes have unbalanced parentheses. For example, ".?/ISE/(\d+)-(\d+)-(\d+)/.*?)" has a final ) char that is not matched on the left. Is that a copying artifact, or what you're really using? If the latter, try removing the final ).
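A quick way to rule out an unbalanced parenthesis is to compile the pattern before pasting it into the config. This is only a sketch using Python's re module, which is not the engine the provider uses, but it rejects an unmatched ) the same way most engines do. The helper name compiles is made up for illustration.

```python
import re

def compiles(pattern):
    """Return (True, None) if the pattern compiles, else (False, error message)."""
    try:
        re.compile(pattern)
        return True, None
    except re.error as exc:
        return False, str(exc)

# Pattern as pasted in the thread, with the stray trailing ")".
ok_pasted, err_pasted = compiles(r".?/ISE/(\d+)-(\d+)-(\d+)/.*?)")

# Same pattern with the trailing ")" removed.
ok_fixed, err_fixed = compiles(r".?/ISE/(\d+)-(\d+)-(\d+)/.*?")

print(ok_pasted, err_pasted)
print(ok_fixed, err_fixed)
```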
That is a copying artifact - should be:
.*?/ISE/(\d+)-(\d+)-(\d+)/.*?
OK, then that regex looks OK to me. Can you verify that the data format is the same for the ISE index as it is for the Firewall index?
It's exact, and that's what's driving me crazy
• /LogCentral/Firewall/yyyy-MM-dd
• /LogCentral/ISE/yyyy-MM-dd
• /LogCentral/WindowsEvent/yyyy-MM-dd
and I have yyyyMMdd entered for Format
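To sanity-check the regex and format together outside the search, here is a small Python sketch. It assumes the three capture groups are concatenated into one string before the yyyyMMdd format is applied, which is my understanding of how the time extraction works but should be checked against the docs. The path below is a hypothetical example following the directory layout listed above, and %Y%m%d is the Python equivalent of yyyyMMdd.

```python
import re
from datetime import datetime

# Corrected time capturing regex from this thread.
path_regex = re.compile(r".*?/ISE/(\d+)-(\d+)-(\d+)/.*?")

# Hypothetical file path following the /LogCentral/ISE/yyyy-MM-dd layout.
path = "/LogCentral/ISE/2014-12-15/ise.log"

m = path_regex.match(path)

# Assumption: the captured groups are joined with no separator, giving "yyyyMMdd".
stamp = "".join(m.groups())

# Parse the joined string with the Python equivalent of yyyyMMdd.
day = datetime.strptime(stamp, "%Y%m%d")

print(stamp, day.date())
```

If the regex matches and the joined string parses here but the search still returns nothing, the problem is more likely in the index configuration than in the regex itself.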