Hello guys, sup?
We've got this piece of log which is a MySql log and we should not change the layout, but need to extract START_DATE
, END_DATE
and NUM_ROWS
.
As follows a piece of data:
******************************************************
***************** EXECUTION HYW *****************
***************** TRTD08IIL.SQL *****************
******************************************************
*************** 26/02/2015-07:00:06 ***************
Conectting to MYSQL
Execution Data
Process Code : TRTD08IIL
Program : TRTD08IIL.SQL
Descripton : Table
Critic : E
Reference Date : 20140103
*************** 26/02/2015-07:00:06 ***************
Temp Execution Path For File: /C:/TEMP/TRTD08IIL.SQL.TRTD08IIL
*************** 26/02/2015-07:00:06 ***************
Temp Execution Path For File: /C:/TEMP/TRTD08IIL.SQL.TRTD08IIL
Table tb.User stats: [num_partitions: 0, num_files: 0, num_rows: 65931447, total_size: 0, raw_data_size: 15793929087]
65931447 Rows loaded to tab_clie
MapReduce Jobs Launched:
Job 0: Map: 18 Reduce: 4 Cumulative CPU: 608.98 sec HDFS Read: 1037601634 HDFS Write: 604987045 SUCCESS
Job 1: Map: 59 Reduce: 58 Cumulative CPU: 4337.55 sec HDFS Read: 14707612009 HDFS Write: 29413721 SUCCESS
Total MapReduce CPU Time Spent: 0 days 7 hours 59 minutes 42 seconds 920 msec
OK
Time taken: 2256.431 seconds
*************** 26/02/2015-07:37:47 ***************
Temp Execution Path For File: /C:/TEMP/TRTD08IIL.SQL.TRTD08IIL
*************** 26/02/2015-07:37:47 ***************
Temp Execution Path For File: /C:/TEMP/TRTD08IIL.SQL.TRTD08IIL
This works on regex101.com:
"(?s)\*{15}\s+(?P<START_DATE>\d\d\/\d\d\/\d{4}-\d\d:\d\d:\d\d)\s+\*{15}.*\s(?P<NUM_ROWS>\d+)\sRows loaded.*Time taken: .*?seconds.*?\*{15}\s+(?<END_DATE>\d\d\/\d\d\/\d{4}-\d\d:\d\d:\d\d)"
This works on regex101.com:
"(?s)\*{15}\s+(?P<START_DATE>\d\d\/\d\d\/\d{4}-\d\d:\d\d:\d\d)\s+\*{15}.*\s(?P<NUM_ROWS>\d+)\sRows loaded.*Time taken: .*?seconds.*?\*{15}\s+(?<END_DATE>\d\d\/\d\d\/\d{4}-\d\d:\d\d:\d\d)"
Thanks @richgalloway! Worked here!
Is data already ingested and all these lines appear as one event?
Yeah, max_event was applied with 10.000 in index preview...
rex field=_raw "*\s+(?P<Start_date>\S+)\s+"
should give you the time value.
rex field=_raw "(?P<ROLLS>num_rows:\s+\d+)"
This will get the ROLLS.
But you would want num_rows = 65931447
right? for that..
rex field=_raw "num_rows:\s+(?P<num_rows>\d+)\,"
Didn't test these. Should be good though.
Sorry about that. Been a while since I posted here. Looks like formatting got messed up.
I need to extract by regex
should be:
START_DATE = *************** 26/02/2015-07:00:06 ***************
END_DATE = *************** 26/02/2015-07:37:47 ***************
ROLLS = num_rows: 65931447