Regex to match part of a multiline string delimited by timestamps

anelson1
New Member

I'm searching through several long blocks of free text (from a CSV file uploaded into Splunk). Each block contains one or more timestamped entries, and I'm interested in the last entry in each block. My search expression currently uses this code:

rex max_match=0 field=Paragraph "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+?)(?=\d{2}|$)"
| eval lastTimestamp=mvindex(timestamp,-1), lastMsg=mvindex(msg,-1)

With that code, however, the field 'lastMsg' does not contain the full text of the last entry. It stops at the closing parenthesis ")" that ends the username in every entry (see the format and example below).

The format of each entry is:

dd.mm.yyyy hh:mm:ss UTC <first name of mechanic> <surname of mechanic> (<username of the mechanic: a run of letters followed by 1 or 2 digits>) <any number of characters with no limit, including new lines, bullet characters copied from Word and multiple spaces; it's a free-text field, so it can be anything>

Each block of text I'm searching through can contain 1 or more of those entries. Note that the symbols "<" and ">" are not part of the format; I only used them to mark the different sections of each entry.

Example of a long block of text I'm sifting through:

25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled? 

  I ask as there's no additional information found 
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today

this can be cancelled

So using the search expression above with that example I get lastMsg =

26.12.2019 05:55:51 UTC Andrew Nelson (anelson1)

Where I need it to be

26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today

this can be cancelled

Hope that makes some sense...can someone please help?


richgalloway
SplunkTrust

The rex command applies only to the current event so there's no need to check for the start of the next event using (?=\d{2}). Conversely, we can assume data ends with the event. Therefore, this regex should work for your example events.

(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<user>.*?\))\s(?<msg>.+)
---
If this reply helps you, Karma would be appreciated.


anelson1
New Member

Unfortunately, this did not work.
It returned the whole block of text after the first timestamp, whereas we need the text after the last timestamp.

Regarding (?=\d{2}): I think this was meant to stop the match when the next timestamp was reached. Sadly, it also stops at any other two digits in the text between timestamps.

To add extra clarity, each event consists of multiple lines of text with 1 or more timestamps.

The example I gave above:

25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled? 

   I ask as there's no additional information found 
 26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
 this can be cancelled

is 1 event.
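
To see why the original rex truncates entries, the pattern can be reproduced outside Splunk. The following is a minimal Python sketch, not Splunk itself: Python's re module spells named groups (?P<name>...) where Splunk's PCRE accepts (?<name>...), and the paragraph string is a shortened copy of the example above. It assumes both engines treat the lookahead and the multiline $ the same way for this input.

```python
import re

# Original rex pattern, with named groups in Python's (?P<name>...) spelling.
ORIGINAL = re.compile(
    r"(?ms)(?P<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?P<msg>.+?)(?=\d{2}|$)"
)

paragraph = (
    "25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry.\n"
    "25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled?\n"
    "\n"
    "  I ask as there's no additional information found\n"
    "26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today\n"
    "\n"
    "this can be cancelled"
)

# finditer plays the role of max_match=0: collect every match in the event.
msgs = [m.group("msg") for m in ORIGINAL.finditer(paragraph)]
for msg in msgs:
    print(repr(msg))
```

In this sketch the lazy .+? stops at the first position followed by two digits (the 78 in anelson78) or at the first line end, whichever comes first. No captured message can therefore span lines or contain a two-digit number, so the last message loses its "this can be cancelled" line.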

anelson1
New Member

Another event example:

08.01.2019 17:01:59 UTC Simon Bolivar (SBOLIV8)
  Define work requirements; including known materials, specialist labour,
  special tools that are required ...
  GPS number 2 showing offline, please investigate and rectify.
  ----------------------------------------------------------------
  Define any equipment constraints ...

  ----------------------------------------------------------------
  Define any other information to support work requirements ...

  ----------------------------------------------------------------
  09.01.2019 01:24:20 UTC Shayne Warne (Warnes9) Phone +9126
  Night shift technicians attended machin3e - reflashed GPS App files.
  Functional.
  2 x technicians, 2 x hours labour."

In this event, I need lastMsg to be

Shayne Warne (Warnes9) Phone +9126
      Night shift technicians attended machin3e - reflashed GPS App files.
      Functional.
      2 x technicians, 2 x hours labour.

and so on

richgalloway
SplunkTrust

I misunderstood the part about all lines being a single event. Try looking at it from another angle. If we can't extract the last part, let's delete everything except the last part.

See this run-anywhere example:

| makeresults | eval paragraph="25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled? 

  I ask as there's no additional information found 
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
this can be cancelled"
| eval lastMsg=paragraph
| rex field=lastMsg mode=sed "s/(?ms)\d{2}\.\d{2}\.\d{4}\s\S+\sUTC\s.+?(?=\d{2}\.)//g"
| rex field=lastMsg "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+)"
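
The delete-then-extract idea can be tried outside Splunk as well. This is a rough Python equivalent, assuming re.sub behaves like rex mode=sed with the g flag for this pattern. The names STRIP_EARLIER and SPLIT_LAST are illustrative, and named groups use Python's (?P<name>...) spelling instead of (?<name>...).

```python
import re

TS = r"\d{2}\.\d{2}\.\d{4}\s\S+\sUTC"

# Step 1: delete every entry that is followed by "two digits and a period".
STRIP_EARLIER = re.compile(r"(?ms)" + TS + r"\s.+?(?=\d{2}\.)")
# Step 2: split what is left into timestamp and message.
SPLIT_LAST = re.compile(r"(?ms)(?P<timestamp>" + TS + r")\s(?P<msg>.+)")

paragraph = (
    "25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. "
    "Please look at this machine asap.\n"
    "25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled?\n"
    "\n"
    "  I ask as there's no additional information found\n"
    "26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today\n"
    "this can be cancelled"
)

last_msg = STRIP_EARLIER.sub("", paragraph)
match = SPLIT_LAST.search(last_msg)
print(match.group("timestamp"))
print(match.group("msg"))
```

On this sample the last entry has nothing after it that looks like "two digits and a period", so the substitution leaves it alone and only the earlier entries are removed.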

anelson1
New Member

This is really close! Almost a 100% success rate, but I found a few exceptions.

For this event:

07.12.2019 02:01:17 UTC Ricky Martin (MARTR9) Phone +61265
  XX5041 Intermittant GPS Faults
  reterminate TNC connectors
  07.12.2019 16:21:03 UTC Tyson Mike (TYSONM1) Phone +6145
   Called in truck this is the truck from last swing with GPS issues.
  Didn’t reterminate TNC as all looked fine and was only done last week.
  Pulled power from screen. Found network cable removed from switch. Can
   call this truck in and change out the screen
  08.12.2019 04:06:41 UTC Michael Bouble (BOUBLE19)
  Went through the notifications and this is the machine that has had all
  connections reterminated and the GPS receiver and antenna replaced.
  425168181 is raised for a replacement as communications from the
  components were degraded.
  Job planned for Wk02."

lastMsg =

02.

Which is incorrect, it should've been:

08.12.2019 04:06:41 UTC Michael Bouble (BOUBLE19)
      Went through the notifications and this is the machine that has had all
      connections reterminated and the GPS receiver and antenna replaced.
      425168181 is raised for a replacement as communications from the
      components were degraded.
      Job planned for Wk02.

Another event which didn't parse correctly looks like this:

"24.10.2019 00:29:01 UTC Sam Jackson (JACKSJ0) Phone +6184
  Brief description of what occurred (e.g. basic sequence of events and
  impacts, avoiding the use of personnel names) ...
  XX6583 is experiencing loss of GPS to XX407. Network comms are also
  dropping intermittently.

  ----------------------------------------------------------------
  Immediate action taken ...
  Please inspect onboard cabling/hardware for GPS faults and comms issues.
  ----------------------------------------------------------------
  Define work requirements; including known materials, specialist labour,
  special tools that are required ...

  ----------------------------------------------------------------
  Define any equipment constraints ...

  ----------------------------------------------------------------
  Define any other information to support work requirements ...
 XXX0601833
  ----------------------------------------------------------------
  24.10.2019 08:55:07 UTC Myles Robins (ROBIM9)
  Noti# - 424898647 - XX4183 - Loss of GPS - Completed - Tested GPS cable
  going into the antenna and under stress the cable snapped away from the
  connector. Re-terminated the connector, tested for voltage and got 5.5V
  at the end of the connector. Before handing over to driver it was found
  that the 2-way antenna was severely damaged, replaced the antenna and
  re-terminated the cable into the antenna, tested all good afterwards.
  Handed back to Control.
  Please order # - 10885952 - HBC2 tube
  Please order # - 10836784 - Antenna W/Coax
  Please order # - 10844533 - Antenna Bracket
  Please order # - 10552699 - Ethernet Connector"

For that event lastMsg was:

07. Network comms are also
      dropping intermittently.

      ----------------------------------------------------------------
      Immediate action taken ...
      Please inspect onboard cabling/hardware for GPS faults and comms issues.
      ----------------------------------------------------------------
      Define work requirements; including known materials, specialist labour,
      special tools that are required ...

      ----------------------------------------------------------------
      Define any equipment constraints ...

      ----------------------------------------------------------------
      Define any other information to support work requirements ...
     XXX0601833
      ----------------------------------------------------------------
      24.10.2019 08:55:07 UTC Myles Robins (ROBIM9)
      Noti# - 424898647 - XX4183 - Loss of GPS - Completed - Tested GPS cable
      going into the antenna and under stress the cable snapped away from the
      connector. Re-terminated the connector, tested for voltage and got 5.5V
      at the end of the connector. Before handing over to driver it was found
      that the 2-way antenna was severely damaged, replaced the antenna and
      re-terminated the cable into the antenna, tested all good afterwards.
      Handed back to Control.
      Please order # - 10885952 - HBC2 tube
      Please order # - 10836784 - Antenna W/Coax
      Please order # - 10844533 - Antenna Bracket
      Please order # - 10552699 - Ethernet Connector"

It seems the regex is picking up the next occurrence of 2 digits followed by a period :( Is there any way around this?
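
The first failure can be reproduced with a shortened copy of that event. This is a Python sketch of the sed expression, assuming re.sub matches the way rex mode=sed does here. The lookahead (?=\d{2}\.) is satisfied by any two digits followed by a period, so the "02." in "Wk02." looks like the start of another timestamp and the last entry is deleted up to that point.

```python
import re

# Port of the sed expression from the run-anywhere example.
STRIP_EARLIER = re.compile(r"(?ms)\d{2}\.\d{2}\.\d{4}\s\S+\sUTC\s.+?(?=\d{2}\.)")

paragraph = (
    "07.12.2019 02:01:17 UTC Ricky Martin (MARTR9) Phone +61265\n"
    "  reterminate TNC connectors\n"
    "  08.12.2019 04:06:41 UTC Michael Bouble (BOUBLE19)\n"
    "  425168181 is raised for a replacement.\n"
    "  Job planned for Wk02."
)

last_msg = STRIP_EARLIER.sub("", paragraph)
print(repr(last_msg))  # '02.'
```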

richgalloway
SplunkTrust

Perhaps this:

| makeresults | eval paragraph="25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
 25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled? 

   I ask as there's no additional information found 
 26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
 this can be cancelled"
| eval lastMsg=paragraph
| rex field=lastMsg mode=sed "s/(?ms)\d{2}\.\d{2}\.\d{4}\s\S+\sUTC\s.+?(?=\s\d{2}\.)//g"
| rex field=lastMsg "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+)"

richgalloway
SplunkTrust

The forum stripped my formatting. Please try the revised comment.


anelson1
New Member

Even better now. I could only find one failed extraction, for the following event:

"19.07.2019 10:32:47 UTC Brendan Nelson (NELSB9)
  Brief description of what occurred (e.g. basic sequence of events and
  impacts, avoiding the use of personnel names):

  -Driver has reported during their prestart that the GPS signal
  constantly drops out.

  ----------------------------------------------------------------
  Immediate action taken:


  Escalated with the supervisor
  ----------------------------------------------------------------
  Define work requirements; including known materials, specialist labour,
  special tools that are required:



  1 Fitter @2hrs to investigate and report
  ----------------------------------------------------------------
  Define any equipment constraints:


  ----------------------------------------------------------------
  Define any other information to support work requirements:



  ---------------------------------------------------------------
  20.07.2019 14:23:10 UTC Myles Truman (TRUMM3)
  XCT8067 - Monitored the comms of the machine through the night via network
  connection and Diagnostics tool, see attached images that
  prove machine maintained comms of around 98% when checked at 08:44pm
  20.07.2019 21:09:37 UTC Myles Truman (TRUMM3)
  XCT8067 - Monitored the comms of the machine through the night via network
  connection and Diagnostics tool, see attached images that
  prove machine maintained comms of around 98% when checked at 05:00am.
  20.07.2019 21:10:45 UTC Myles Hium (HIUMM3)
  XCT8067 - Machine has been inside BH7027 AMA all shift and when the
  machine went down they left it parked inside the AMA so we could not
  actually get on the machine but we monitored it remotely all night and
  apart from a drop in comms every now and then, there were no major
  issues found.
  23.07.2019 07:08:25 UTC Dallas Hyuth (HYUTD9)
  Monitored again throughout the night and again did not hear anything
  about any Comms/GPS issues with this machine. Pretty much maintained
  Precision 96% of the time in for the night. Lowest I saw
  it drop was to 82.6% Fixed and 84.9% precision."

For that event lastMsg was:

82.6% Fixed and 84.9% precision.

I'll keep scanning the other events, but so far this is almost perfect!

richgalloway
SplunkTrust

I may have to start charging an hourly rate. 🙂

| makeresults | eval paragraph="25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
  25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled? 

    I ask as there's no additional information found 
  26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
  this can be cancelled"
 | eval lastMsg=paragraph
 | rex field=lastMsg mode=sed "s/(?ms)\d{2}\.\d{2}\.\d{4}\s\S+\sUTC\s.+?(?=^\s+\d{2}\.)//g"
 | rex field=lastMsg "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+)"
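
The lookahead above still keys on "two digits and a period" at the start of an indented line, so a message line that happens to begin with something like 96.5% could presumably still trip it. One possible alternative, sketched here in Python and not taken from the thread, is to use the full timestamp as the boundary and let a greedy prefix find the last one. The helper last_entry and the constant names are hypothetical. The sketch assumes the time is always hh:mm:ss and that message bodies never quote a full timestamp themselves.

```python
import re

# Full timestamp as the entry boundary: dd.mm.yyyy hh:mm:ss UTC
TS = r"\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}\sUTC"

# The greedy .* runs to the end of the text and backtracks, so the
# timestamp group lands on the last full timestamp in the paragraph.
LAST_ENTRY = re.compile(r"(?s).*(?P<timestamp>" + TS + r")\s(?P<msg>.*)")


def last_entry(paragraph):
    """Return (timestamp, msg) for the last entry, or None if none is found."""
    m = LAST_ENTRY.search(paragraph)
    if m is None:
        return None
    return m.group("timestamp"), m.group("msg").strip()


paragraph = (
    "20.07.2019 21:10:45 UTC Myles Hium (HIUMM3)\n"
    "  machine maintained comms of around 98% when checked at 05:00am.\n"
    "  23.07.2019 07:08:25 UTC Dallas Hyuth (HYUTD9)\n"
    "  Precision 96% of the time in for the night. Lowest I saw\n"
    "  it drop was to 82.6% Fixed and 84.9% precision."
)

timestamp, msg = last_entry(paragraph)
print(timestamp)
print(msg)
```

The same pattern would presumably carry over to a single rex with the groups written as (?<timestamp>...) and (?<msg>...), but that port is untested here.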

anelson1
New Member

Perfect, thank you very much

anelson1
New Member

Unfortunately this didn't work. Splunk came back with the following error:

"Error in 'rex' command: Encountered the following error while compiling the regex '(?ms)(?\d{2}.\d{2}.\d{4}\s\S+\sUTC)\s(?.+)': Regex: unrecognized character after (? or (?-"

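The pattern quoted in that error has lost its group names: (?<timestamp> and (?<msg> appear as bare (? and the escaped dots appear as plain dots. That is most likely the forum stripping the angle-bracket text as if it were markup. A quick Python check illustrates the difference. The helper compiles is hypothetical, and the restored pattern uses Python's (?P<name>...) spelling where Splunk would take (?<name>...).

```python
import re

# Pattern as it appears in the error message, with the group names missing.
stripped = r"(?ms)(?\d{2}.\d{2}.\d{4}\s\S+\sUTC)\s(?.+)"
# Same pattern with the names and escaped dots put back.
restored = r"(?ms)(?P<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?P<msg>.+)"


def compiles(pattern):
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(stripped))
print(compiles(restored))
```

Retyping the group names by hand, instead of copying the stripped text from the page, should avoid the compile error.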