 
					
				
		
Hello guys, sup?
We've got this piece of log which is a MySql log and we should not change the layout, but need to extract START_DATE, END_DATE and NUM_ROWS.
As follows a piece of data:
******************************************************
*****************  EXECUTION HYW   *****************
*****************    TRTD08IIL.SQL    *****************
******************************************************
***************   26/02/2015-07:00:06  ***************
Conectting to MYSQL
Execution Data
Process Code : TRTD08IIL
Program           : TRTD08IIL.SQL
Descripton          : Table
Critic      : E
Reference Date : 20140103
***************   26/02/2015-07:00:06  ***************
Temp Execution Path For File: /C:/TEMP/TRTD08IIL.SQL.TRTD08IIL
***************   26/02/2015-07:00:06  ***************
Temp Execution Path For File: /C:/TEMP/TRTD08IIL.SQL.TRTD08IIL
Table tb.User stats: [num_partitions: 0, num_files: 0, num_rows: 65931447, total_size: 0, raw_data_size: 15793929087]
65931447 Rows loaded to tab_clie
MapReduce Jobs Launched:
Job 0: Map: 18  Reduce: 4   Cumulative CPU: 608.98 sec   HDFS Read: 1037601634 HDFS Write: 604987045 SUCCESS
Job 1: Map: 59  Reduce: 58   Cumulative CPU: 4337.55 sec   HDFS Read: 14707612009 HDFS Write: 29413721 SUCCESS
Total MapReduce CPU Time Spent: 0 days 7 hours 59 minutes 42 seconds 920 msec
OK
Time taken: 2256.431 seconds
***************   26/02/2015-07:37:47  ***************
Temp Execution Path For File: /C:/TEMP/TRTD08IIL.SQL.TRTD08IIL
***************   26/02/2015-07:37:47  ***************
Temp Execution Path For File: /C:/TEMP/TRTD08IIL.SQL.TRTD08IIL
 
					
				
		
 
		
		
		
		
		
	
			
		
		
			
					
		This works on regex101.com:
"(?s)\*{15}\s+(?P<START_DATE>\d\d\/\d\d\/\d{4}-\d\d:\d\d:\d\d)\s+\*{15}.*\s(?P<NUM_ROWS>\d+)\sRows loaded.*Time taken: .*?seconds.*?\*{15}\s+(?<END_DATE>\d\d\/\d\d\/\d{4}-\d\d:\d\d:\d\d)"
 
					
				
		
 
		
		
		
		
		
	
			
		
		
			
					
		This works on regex101.com:
"(?s)\*{15}\s+(?P<START_DATE>\d\d\/\d\d\/\d{4}-\d\d:\d\d:\d\d)\s+\*{15}.*\s(?P<NUM_ROWS>\d+)\sRows loaded.*Time taken: .*?seconds.*?\*{15}\s+(?<END_DATE>\d\d\/\d\d\/\d{4}-\d\d:\d\d:\d\d)"
 
					
				
		
Thanks @richgalloway! Worked here!
 
					
				
		
Is data already ingested and all these lines appear as one event?
 
					
				
		
Yeah, max_event was applied with 10.000 in index preview...
 
					
				
		
rex field=_raw "*\s+(?P<Start_date>\S+)\s+"
should give you the time value.
rex field=_raw "(?P<ROLLS>num_rows:\s+\d+)"
This will get the ROLLS.
But you would want num_rows = 65931447 right? for that.. 
rex field=_raw "num_rows:\s+(?P<num_rows>\d+)\,"
Didn't test these. Should be good though.
 
					
				
		
Sorry about that. Been a while since I posted here. Looks like formatting got messed up.
 
					
				
		
I need to extract by regex 
should be:
START_DATE =  ***************   26/02/2015-07:00:06  ***************
END_DATE =  ***************   26/02/2015-07:37:47  ***************
ROLLS = num_rows: 65931447
