Above is the event; I am not sure why it is showing up as two different events. Anyway, I have written a Splunk query for my requirements, but the output is not good. I want to get rid of "Service" and "Maintenance Start Time in MST".

Let me summarize the use case. You have ONE single log:
| Mon Oct 16 07:29:46 MST 2023 | MIME-Version: 1.0 Content-Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR><TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br> Script Path:/amex/ansible/maintenance_mode_service</font> <font size=3 color=black></br></br>Thank you,</br>BDP Spark Support Team</font> | 
But the Splunk indexer gives you TWO events (with different time values):
| Mon Oct 16 07:31:53 MST 2023 | MIME-Version: 1.0 Content-Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR> | 
| Mon Oct 16 07:29:46 MST 2023 | <TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br> Script Path:/amex/ansible/maintenance_mode_service</font> <font size=3 color=black></br></br>Thank you,</br>BDP Spark Support Team</font> | 
You want to use search commands to combine the data in these two events into one table row. Is this correct?
Most importantly, you have a line-break problem at ingestion. This is what you really need to fix. By default, Splunk hunts for timestamps and uses them as a clue that a new event begins. This is why the "second" event has the time Mon Oct 16 07:29:46 MST 2023, which is actually the maintenance start time, not the time of the log, which should be later, namely Mon Oct 16 07:31:53 MST 2023. If you do not fix the line-break problem, there will be no end of trouble down the road, no matter how many clever workarounds you devise.
That said, it is possible to work around this particular log by restoring the complete log using transaction. (Warning: The workaround may break other things.)
Second, try not to capture everything by counting word breaks or even HTML tags. HTML is really the worst enemy of Splunk because HTML's semantics are totally separate from the semantics of the content. Always try to anchor regex on 1) content semantics, then 2) HTML semantics. Here is a proposal:
| transaction startswith="Script Path" endswith="MIME-Version"
| eval _time = _time + duration ``` restore actual event time; this may not be of interest ```
| rex "Cluster Name:\s*(?<ClusterName>[^<]+)"
| rex "<TR[^>]*><TH[^>]*>(?<Service>[^<]+)<\/TH><TH[^>]*>(?<MaintenanceStartTime>[^<]+)"
| table ClusterName Service MaintenanceStartTime
The two events should give you
| ClusterName | Service | MaintenanceStartTime | 
| AtWork-CIW-E1 | oozie | Mon Oct 16 07:29:46 MST 2023 | 
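The header row ("Service" / "Maintenance Start Time in MST") has the same TR/TH shape as the data row, so a pattern anchored only on tags can pick it up, as in the original poster's output. One way to keep it out is to anchor the second cell on what the content looks like, namely a timestamp. The sketch below tries that idea outside Splunk, in Python, on an abbreviated copy of the sample event. The function name `extract_rows` and the timestamp pattern are assumptions made for illustration. Python writes named groups as `(?P<name>...)`.

```python
import re

# Abbreviated copy of the sample event from this thread (font tags and
# greeting text dropped), treated as ONE unbroken event.
EVENT = (
    "MIME-Version: 1.0 Content-Disposition: inline "
    "Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 "
    "Content-Type: text/html <table border=2> "
    "<TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> "
    "<TR bgcolor=#D6EAF8><TH colspan=1>Service</TH>"
    "<TH colspan=1>Maintenance Start Time in MST</TH></TR>"
    "<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH>"
    "<TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table>"
)

CLUSTER_RE = re.compile(r"Cluster Name:\s*(?P<ClusterName>[^<]+)")

# Content anchor: the second cell must look like "Mon Oct 16 07:29:46 MST 2023".
# The header row fails this test, so it is never captured.
ROW_RE = re.compile(
    r"<TR[^>]*><TH[^>]*>(?P<Service>[^<]+)</TH>"
    r"<TH[^>]*>(?P<MaintenanceStartTime>"
    r"\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ \d{4})</TH>"
)


def extract_rows(raw):
    """Return (ClusterName, Service, MaintenanceStartTime) for each data row."""
    cluster = CLUSTER_RE.search(raw)
    name = cluster.group("ClusterName").strip() if cluster else None
    return [
        (name, m.group("Service"), m.group("MaintenanceStartTime"))
        for m in ROW_RE.finditer(raw)
    ]


print(extract_rows(EVENT))
```

On this sample the only row returned is the oozie row. The same second-cell pattern could be carried into a rex expression, but that has not been tested against real data here.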
Here is the emulation that you can play with and compare with real data
| makeresults
| eval data=split("MIME-Version: 1.0 Content-Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR>
<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br> Script Path:/amex/ansible/maintenance_mode_service</font> <font size=3 color=black></br></br>Thank you,</br>BDP Spark Support Team</font>", "
")
| mvexpand data
| eval _time = if(match(data, "Mon Oct 16 07:29:46 MST 2023"), strptime("Mon Oct 16 07:29:46 MST 2023", "%a %b %d %H:%M:%S %Z %Y"), strptime("Mon Oct 16 07:31:53 MST 2023", "%a %b %d %H:%M:%S %Z %Y"))
| rename data AS _raw
``` data emulation above ```
Do not forget: Your most important task is to fix line breaks. (There are many guides in the Splunk documentation, and various answers in this forum.)
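To picture why the log was split, here is a toy model in Python of line merging. It is only a sketch: it assumes the raw log has a newline before the oozie row (as in the emulation above) and that every message starts with "MIME-Version:", and it does not reproduce Splunk's actual algorithm. The first rule starts a new event at any line containing a timestamp, which is roughly what the props.conf setting BREAK_ONLY_BEFORE_DATE does when SHOULD_LINEMERGE is on. The second rule starts a new event only at the assumed message header.

```python
import re

# Shape of the timestamps seen in this thread, e.g. "Mon Oct 16 07:29:46 MST 2023".
TIMESTAMP_RE = re.compile(r"\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ \d{4}")

# Two physical lines of one message, abbreviated from the sample in this thread.
LINES = [
    "MIME-Version: 1.0 Subject: INFO - Services are in Maintenance Mode "
    "over 2 hours -- AtWork-CIW-E1 <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH>"
    "<TH colspan=1>Maintenance Start Time in MST</TH></TR>",
    "<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH>"
    "<TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table>",
]


def merge_lines(lines, starts_new_event):
    """Group lines into events; starts_new_event(line) decides where to break."""
    events = []
    for line in lines:
        if events and not starts_new_event(line):
            events[-1] += "\n" + line  # continuation of the current event
        else:
            events.append(line)  # first line, or a line that opens a new event
    return events


# Rule 1: any line that contains a timestamp opens a new event.
by_date = merge_lines(LINES, lambda l: TIMESTAMP_RE.search(l) is not None)

# Rule 2 (assumed fix): only a line starting with the message header opens one.
by_header = merge_lines(LINES, lambda l: l.startswith("MIME-Version:"))

print(len(by_date), len(by_header))
```

Under rule 1 the message falls apart into two events, matching what was indexed; under rule 2 it stays whole. In Splunk the equivalent change belongs in props.conf for the sourcetype, and the right break pattern depends on how the real log starts each message.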
Here is my Splunk query; the output is not good:
rex max_match=0 ^\w+:\s+\w+\.\w+@\w+\.\w+\s+\w+:\s+\w+\-\w+\-\w+@\w+\.\w+\s+\w+\-\w+:\s+\d+\.\d+\s+\w+\-\w+:\s+\w+\s+\w+:\s+\w+\s+\-\s+(?P<Info>\w+\s+\w+\s+\w+\s+\w+\s+\w+\s+\w+\s+\d+\s+\w+)\s+\-\-\s+(?P<ClusterName>\w+\-\w+\-\w+) |rex "(?ms)^(?:[^>\\n]*>){2}(?P<Svc>\\w+)[^=\\n]*=\\d+>(?P<Maint>[^<]+)" | table Info ClusterName Svc Maint
| Info | ClusterName | Svc | Maint | 
| Services are in Maintenance Mode over 2 hours | AtWork-CIW-E1 | Service | Maintenance Start Time in MST | 
| | | oozie | Mon Oct 16 07:29:46 MST 2023 | 
In the above output, the field extractions are also capturing the header cells "Service" and "Maintenance Start Time in MST".

No

Hi @bmanikya,
Could you share more sample logs?
As you can see on regex101.com, my regex works on the shared sample.
Ciao.
Giuseppe
I have already shared a sample before; the events are in HTML.
Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR><TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br>
Please check the characters in bold. I want this in table format.

Hi @bmanikya,
to help you with a regex extraction, you should share your events in text mode (possibly using the Insert/Edit Code Sample button), highlighting the parts to extract.
Ciao.
Giuseppe
| Cluster Name: AtWork-CIW-E1 | |
| Service | Maintenance Start Time in MST | 
| oozie | Mon Oct 16 07:29:46 MST 2023 | 

Hi @bmanikya,
using your one sample, I can propose this regex to you:
(?ms)\s-\s(?<Service>[^-]*).*oozie(\<[^\>]*\>){2}(?<oozie>[^\<]*)
You can test it at https://regex101.com/r/tzacfN/1
If you could share more samples (always in text mode), I could verify the above regex.
Ciao.
Giuseppe
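For anyone who wants to check this regex away from regex101, here is a small Python sketch run against an abbreviated copy of the shared sample. Python needs `(?P<name>...)` for named groups, and the escaped angle brackets are written unescaped; otherwise the pattern is unchanged. The service name "hive" in the last step is hypothetical and is only there to show that the pattern, as written, is tied to the literal word oozie.

```python
import re

# Same pattern as above, with Python-style named groups.
PATTERN = re.compile(
    r"(?ms)\s-\s(?P<Service>[^-]*).*oozie(<[^>]*>){2}(?P<oozie>[^<]*)"
)

# Abbreviated copy of the sample shared in this thread.
SAMPLE = (
    "Disposition: inline Subject: INFO - Services are in Maintenance Mode "
    "over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <table border=2> "
    "<TR bgcolor=#D6EAF8><TH colspan=1>Service</TH>"
    "<TH colspan=1>Maintenance Start Time in MST</TH></TR>"
    "<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH>"
    "<TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table>"
)

match = PATTERN.search(SAMPLE)
service_text = match.group("Service")  # subject text, with a trailing space
start_time = match.group("oozie")      # text of the cell after "oozie"

# Hypothetical second service: the literal "oozie" in the pattern no longer matches.
other = PATTERN.search(SAMPLE.replace("oozie", "hive"))

print(repr(service_text), repr(start_time), other)
```

The Service capture keeps a trailing space because `[^-]*` runs up to the "--" separator, so a trim is likely needed before using it in a table.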
