Hi,
I have a log line like the one below:
"12","select a.a,b.b,c from a,b where a.a = b.a group by xxxx","impala",2017-06-30T00:59:40
I am trying to parse this to extract the WHERE condition for some performance analysis.
I tried this in regex101 and came up with the regex below:
\"\d+\",\"select\s+(.*)\s+from\s+(.*)\s*where\s(.*)\s+group by\s+(.*)\s*\",(.*)\s*,(.*)
Can you help me implement it in SPL?
I think you have it worked out for the most part. Without seeing more of your actual events there is a lot of room for error, but here is what I think you are after:
| rex "\"\d+\",\"select\s+(.*)\s+from\s+(.*)\s*where\s(?<WHERECAPTURE>.*)\s+group by\s+(.*)\s*\",(.*)\s*,(.*)"
Let us know if you need more.
And if you only need to capture the WHERE clause:
| rex "where\s(?<WHERECAPTURE>.*)\s"
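One caveat with the short pattern: `.*` is greedy, so `.*\s` runs to the last whitespace in the event rather than stopping at the end of the WHERE clause. Below is a rough Python sketch for checking this offline against the sample event from the question. The lazy pattern with an explicit terminator is my own suggestion, not something tested on your data, and the second event (no GROUP BY) is a hypothetical example I made up. Note that Python writes named groups as `(?P<name>...)`, while rex uses `(?<name>...)`.

```python
import re

# Sample event copied from the question.
event = '"12","select a.a,b.b,c from a,b where a.a = b.a group by xxxx","impala",2017-06-30T00:59:40'

# Hypothetical event without a GROUP BY clause (made up for illustration).
event_no_group = '"13","select a.a from a where a.a = 1","impala",2017-06-30T01:00:00'

# Greedy: ".*" followed by "\s" extends to the LAST whitespace in the event.
greedy = re.compile(r'where\s(?P<WHERECAPTURE>.*)\s')

# Lazy with an explicit terminator (assumption): stop at "group by",
# or at the closing double quote when there is no GROUP BY.
lazy = re.compile(r'where\s+(?P<WHERECAPTURE>.*?)\s*(?:group by|")', re.IGNORECASE)

greedy_where = greedy.search(event).group('WHERECAPTURE')
lazy_where = lazy.search(event).group('WHERECAPTURE')
lazy_where_no_group = lazy.search(event_no_group).group('WHERECAPTURE')

print(greedy_where)         # a.a = b.a group by
print(lazy_where)           # a.a = b.a
print(lazy_where_no_group)  # a.a = 1
```

The lazy pattern should carry over to rex if you write the group as `(?<WHERECAPTURE>...)`, escape the double quote, and add `(?i)` at the front for case-insensitivity, but please verify it against your real events first.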
Thanks. As you mentioned, there are some mismatches against the actual logs, since the queries were written by humans and don't follow any consistent pattern. I have also extended my analysis to extract JOINs.