Need help in regex

bmanikya
Loves-to-Learn Lots

bmanikya_1-1697543560778.png

 

bmanikya_2-1697543592146.png

The event is shown above; I am not sure why it shows up as two different events. Anyway, I have written a Splunk query for my requirements, but the output is not good: I want to get rid of "Service" and "Maintenance Start Time in MST".

bmanikya_0-1697543531428.png

0 Karma

yuanliu
SplunkTrust

The event is shown above; I am not sure why it shows up as two different events. Anyway, I have written a Splunk query for my requirements, but the output is not good: I want to get rid of "Service" and "Maintenance Start Time in MST".


Let me summarize the use case: You have ONE single log,

Mon Oct 16 07:29:46 MST 2023
MIME-Version: 1.0 Content-Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR>
<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br> Script Path:/amex/ansible/maintenance_mode_service</font> <font size=3 color=black></br></br>Thank you,</br>BDP Spark Support Team</font>

But the Splunk indexer gives you TWO events (with different time values):

Mon Oct 16 07:31:53 MST 2023
MIME-Version: 1.0 Content-Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR>

Mon Oct 16 07:29:46 MST 2023
<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br> Script Path:/amex/ansible/maintenance_mode_service</font> <font size=3 color=black></br></br>Thank you,</br>BDP Spark Support Team</font>

You want to use a search command to combine the data from these two events into one table row.  Is this correct?

Most importantly, you have a line-breaking problem at ingestion, and this is what you really need to fix.  By default, Splunk hunts for timestamps and treats each one as a clue that a new event begins.  This is why the "second" event has the time Mon Oct 16 07:29:46 MST 2023, which is actually the maintenance start time, not the time of the log, which should be later, namely Mon Oct 16 07:31:53 MST 2023.  If you do not fix the line-breaking problem, there will be no end of trouble down the road, no matter how many clever workarounds you devise.
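
The effect described above can be imitated outside Splunk. The following Python sketch is only a rough emulation, not Splunk's actual line-merging logic: the `merge_lines` helper, the `TIMESTAMP` pattern, and the shortened sample lines are all made up for illustration. It contrasts starting a new event at any line that contains a timestamp with starting one only at a line that begins with `MIME-Version:`.

```python
import re

# Hypothetical pattern for timestamps such as "Mon Oct 16 07:29:46 MST 2023".
TIMESTAMP = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} [A-Z]{3} \d{4}"
)

def merge_lines(lines, starts_new_event):
    """Group lines into events; a line opens a new event when the predicate says so."""
    events = []
    for line in lines:
        if events and not starts_new_event(line):
            events[-1] += "\n" + line  # continuation of the current event
        else:
            events.append(line)        # first line, or a line that opens an event
    return events

# Shortened stand-in for the two physical lines of the single log.
log_lines = [
    "MIME-Version: 1.0 Content-Type: text/html <TR><TH>Service</TH>"
    "<TH>Maintenance Start Time in MST</TH></TR>",
    "<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH>"
    "<TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table>",
]

# Date-driven breaking: the second line contains a timestamp, so it is split off.
by_date = merge_lines(log_lines, lambda l: TIMESTAMP.search(l) is not None)

# Content-driven breaking: only a line starting with "MIME-Version:" opens an event.
by_header = merge_lines(log_lines, lambda l: l.startswith("MIME-Version:"))

print(len(by_date), len(by_header))
```

With date-driven breaking the log falls apart into two events; with a content-driven rule it stays whole. In Splunk itself the corresponding fix belongs in the event-breaking settings for the sourcetype, not in the search.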

That said, it is possible to work around this particular log by restoring the complete log with the transaction command. (Warning: The workaround may break other things.)

Second, try not to capture everything by counting word breaks or even HTML tags.  HTML is really the worst enemy of Splunk because HTML's semantics are totally separate from the semantics of the content.  Always try to anchor a regex first on content semantics, then on HTML semantics.  Here is a proposal:

 

| transaction startswith="Script Path" endswith="MIME-Version"
| eval _time = _time + duration ``` restore actual event time; this may not be of interest ```
| rex "Cluster Name:\s*(?<ClusterName>[^<]+)"
| rex "<TR[^>]*><TH[^>]*>(?<Service>[^<]+)<\/TH><TH[^>]*>(?<MaintenanceStartTime>[^<]+)"
| table ClusterName Service MaintenanceStartTime

 

The two events should give you

ClusterName    Service  MaintenanceStartTime
AtWork-CIW-E1  oozie    Mon Oct 16 07:29:46 MST 2023
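
To try the two rex patterns outside Splunk, here is a minimal Python sketch. It is a translation under assumptions, not part of the search above: Python spells named groups `(?P<name>...)`, the `RAW` string is a shortened copy of the sample event in its original, unbroken order, and the header-row filter is added purely for illustration. In that order the header row (`Service` / `Maintenance Start Time in MST`) has the same two-cell shape as a data row, so the sketch collects every matching row and drops the header by its content.

```python
import re

# Shortened copy of the sample event, in its original (unbroken) order.
RAW = (
    "<table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> "
    "<TR bgcolor=#D6EAF8><TH colspan=1>Service</TH>"
    "<TH colspan=1>Maintenance Start Time in MST</TH></TR>\n"
    "<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH>"
    "<TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table>"
)

CLUSTER = re.compile(r"Cluster Name:\s*(?P<ClusterName>[^<]+)")
ROW = re.compile(
    r"<TR[^>]*><TH[^>]*>(?P<Service>[^<]+)</TH><TH[^>]*>(?P<MaintenanceStartTime>[^<]+)"
)

cluster_name = CLUSTER.search(RAW).group("ClusterName")

# Every two-cell row matches, including the header row, so filter it by content.
rows = [
    m.groupdict()
    for m in ROW.finditer(RAW)
    if m.group("Service") != "Service"
]

for row in rows:
    print(cluster_name, row["Service"], row["MaintenanceStartTime"])
```

The single-cell "Cluster Name" row never matches the row pattern because its first cell is followed by `</TR>` rather than a second `<TH>`.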

Here is an emulation that you can play with and compare with real data:

 

| makeresults
| eval data=split("MIME-Version: 1.0 Content-Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR>
<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br> Script Path:/amex/ansible/maintenance_mode_service</font> <font size=3 color=black></br></br>Thank you,</br>BDP Spark Support Team</font>", "
")
| mvexpand data
| eval _time = if(match(data, "Mon Oct 16 07:29:46 MST 2023"), strptime("Mon Oct 16 07:29:46 MST 2023", "%a %b %d %H:%M:%S %Z %Y"), strptime("Mon Oct 16 07:31:53 MST 2023", "%a %b %d %H:%M:%S %Z %Y"))
| rename data AS _raw
``` data emulation above ```

 

Do not forget: Your most important task is to fix line breaks. (There are many guides in the Splunk documentation, and various answers in this forum.)


bmanikya
Loves-to-Learn Lots

Here is my Splunk query. The output is not good:

rex max_match=0 ^\w+:\s+\w+\.\w+@\w+\.\w+\s+\w+:\s+\w+\-\w+\-\w+@\w+\.\w+\s+\w+\-\w+:\s+\d+\.\d+\s+\w+\-\w+:\s+\w+\s+\w+:\s+\w+\s+\-\s+(?P<Info>\w+\s+\w+\s+\w+\s+\w+\s+\w+\s+\w+\s+\d+\s+\w+)\s+\-\-\s+(?P<ClusterName>\w+\-\w+\-\w+) |rex "(?ms)^(?:[^>\\n]*>){2}(?P<Svc>\\w+)[^=\\n]*=\\d+>(?P<Maint>[^<]+)" | table Info ClusterName Svc Maint

 

Info                                           ClusterName    Svc      Maint
Services are in Maintenance Mode over 2 hours  AtWork-CIW-E1  Service  Maintenance Start Time in MST
                                                              oozie    Mon Oct 16 07:29:46 MST 2023

 

In the above output, the field extractions are also capturing the header cells "Service" and "Maintenance Start Time in MST", which I do not want.

0 Karma

gcusello
SplunkTrust

Hi @bmanikya,

does my regex work?

Ciao.

Giuseppe

0 Karma

bmanikya
Loves-to-Learn Lots

No

0 Karma

gcusello
SplunkTrust

Hi @bmanikya,

could you share more sample logs?

because, as you can see in regex101.com, my regex works on the shared sample.

Ciao.

Giuseppe

 

0 Karma

bmanikya
Loves-to-Learn Lots

I have already shared them; the events are in HTML.

 

Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR><TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br

Please check the characters in bold. I want these in table format.

 

0 Karma

gcusello
SplunkTrust

Hi @bmanikya,

to help you with a regex extraction, you should share your events as text (if possible using the Insert/Edit Code Sample button), highlighting the parts to extract.

Ciao.

Giuseppe

0 Karma

bmanikya
Loves-to-Learn Lots
MIME-Version: 1.0 Content-Disposition: inline Subject: INFO - Services are in Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html <font size=3 color=black>Hi Team,</br></br>Please find below servers which are in maintenance mode for more than 2 hours; </br></br></font> <table border=2> <TR bgcolor=#D6EAF8><TH colspan=2>Cluster Name: AtWork-CIW-E1</TH></TR> <TR bgcolor=#D6EAF8><TH colspan=1>Service</TH><TH colspan=1>Maintenance Start Time in MST</TH></TR>
<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH><TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table> <font size=3 color=black></br> Script Path:/amex/ansible/maintenance_mode_service</font> <font size=3 color=black></br></br>Thank you,</br>BDP Spark Support Team</font>
 
Need field extractions of the following.
 

Cluster Name: AtWork-CIW-E1

Service

Maintenance Start Time in MST

oozie

Mon Oct 16 07:29:46 MST 2023

 
0 Karma

gcusello
SplunkTrust

Hi @bmanikya,

using your one sample, I can propose this regex:

(?ms)\s-\s(?<Service>[^-]*).*oozie(\<[^\>]*\>){2}(?<oozie>[^\<]*)

that you can test at https://regex101.com/r/tzacfN/1
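
For a check that does not depend on regex101, here is a small Python sketch of the same pattern. Assumptions: Python needs `(?P<name>...)` for named groups, the angle brackets are left unescaped (they are not special in a regex), and `SAMPLE` is a shortened copy of the event shared above. Note that the pattern contains the literal `oozie`, so as written it only matches events for that service, and the group named `Service` captures the subject text rather than the service name.

```python
import re

# Shortened copy of the shared event; the variable name is illustrative.
SAMPLE = (
    "MIME-Version: 1.0 Content-Disposition: inline Subject: INFO - Services are in "
    "Maintenance Mode over 2 hours -- AtWork-CIW-E1 Content-Type: text/html "
    "<TR bgcolor=#D6EAF8><TH colspan=1>Service</TH>"
    "<TH colspan=1>Maintenance Start Time in MST</TH></TR>\n"
    "<TR bgcolor=#FFB6C1><TH colspan=1>oozie</TH>"
    "<TH colspan=1>Mon Oct 16 07:29:46 MST 2023</TH></TR> </table>"
)

# Same pattern as above, with Python-style named groups.
PATTERN = re.compile(
    r"(?ms)\s-\s(?P<Service>[^-]*).*oozie(<[^>]*>){2}(?P<oozie>[^<]*)"
)

match = PATTERN.search(SAMPLE)
subject_text = match.group("Service").strip()  # the capture ends with a space
start_time = match.group("oozie")

print(subject_text)
print(start_time)
```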

If you could share more samples (always in text mode) I could verify the above regex.

Ciao.

Giuseppe

0 Karma