I'm searching through several long blocks of free text (from a CSV file uploaded into Splunk), and I'm interested in the last entry in each block (each entry is timestamped). At the moment my search expression uses this code:
rex max_match=0 field=Paragraph "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+?)(?=\d{2}|$)"
| eval lastTimestamp=mvindex(timestamp,-1), lastMsg=mvindex(msg,-1)
However, using that code, I'm finding that the field 'lastMsg' does not contain the full text of the last entry; it stops after the closing parenthesis ")" present in every entry (the one at the end of the username; see the format and example below).
The format of each entry is:
dd.mm.yyyy hh:mm:ss UTC <first name of mechanic> <surname of mechanic> (<username of the mechanic, which is a bunch of letters followed by 1 or 2 digits>) <any amount of characters with no limit, including new lines, bullet chars copied from Word, multiple spaces; it's a free text field so it can be anything>
There can be one or more of those in each block of text I'm searching through. Note that the symbols "<" and ">" are not part of the format; I just used them to mark the different sections of each entry.
Example of a long block of text I'm sifting through:
25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled?
I ask as there's no additional information found
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
this can be cancelled
So using the search expression above with that example I get lastMsg =
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1)
Where I need it to be
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
this can be cancelled
Hope that makes some sense...can someone please help?
The rex command applies only to the current event, so there's no need to check for the start of the next event using (?=\d{2}). Conversely, we can assume data ends with the event. Therefore, this regex should work for your example events.
(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<user>.*?\))\s(?<msg>.+)
This did not work unfortunately.
It returned the whole block of text after the first timestamp where we need the text after the last timestamp.
Regarding (?=\d{2}), I think this was to stop searching when the next timestamp was reached; sadly, this doesn't allow other digits to exist in the text between timestamps.
To add extra clarity, each event consists of multiple lines of text with 1 or more timestamps.
The example I gave above:
25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled?
I ask as there's no additional information found
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
this can be cancelled
is 1 event.
Another event example:
08.01.2019 17:01:59 UTC Simon Bolivar (SBOLIV8)
Define work requirements; including known materials, specialist labour,
special tools that are required ...
GPS number 2 showing offline, please investigate and rectify.
----------------------------------------------------------------
Define any equipment constraints ...
----------------------------------------------------------------
Define any other information to support work requirements ...
----------------------------------------------------------------
09.01.2019 01:24:20 UTC Shayne Warne (Warnes9) Phone +9126
Night shift technicians attended machin3e - reflashed GPS App files.
Functional.
2 x technicians, 2 x hours labour."
In this event, I need lastMsg to be
Shayne Warne (Warnes9) Phone +9126
Night shift technicians attended machin3e - reflashed GPS App files.
Functional.
2 x technicians, 2 x hours labour.
and so on
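The truncation described above can be reproduced outside Splunk. The following is a minimal sketch in Python rather than SPL, so the named groups are spelled `(?P<name>...)`; the helper `last_msg` is a hypothetical stand-in for `rex max_match=0` followed by `mvindex(msg,-1)`. It suggests two reasons the original pattern cuts the message short: with `(?m)` the `$` alternative matches at the end of every line, and `(?=\d{2})` stops the lazy `.+?` at the first pair of digits anywhere in the text.

```python
import re

# The rex pattern from the question. Python's `re` needs (?P<name>...),
# whereas Splunk's PCRE also accepts (?<name>...).
ORIGINAL = re.compile(
    r"(?ms)(?P<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?P<msg>.+?)(?=\d{2}|$)"
)


def last_msg(paragraph):
    """Hypothetical helper: all matches, then the last msg (like mvindex(msg,-1))."""
    msgs = [m.group("msg") for m in ORIGINAL.finditer(paragraph)]
    return msgs[-1] if msgs else None


# Case 1: a two-line entry. With (?m), `$` matches at the end of the first line.
multi_line = (
    "26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today\n"
    "this can be cancelled"
)

# Case 2: digits inside the free text. (?=\d{2}) matches in front of "91".
with_digits = (
    "09.01.2019 01:24:20 UTC Shayne Warne (Warnes9) Phone +9126\n"
    "Night shift technicians attended"
)

print(repr(last_msg(multi_line)))   # 'Andrew Nelson (anelson1) No issues from this machine today'
print(repr(last_msg(with_digits)))  # 'Shayne Warne (Warnes9) Phone +'
```

In both cases the rest of the entry is lost, which matches the symptom reported in the question.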
I misunderstood the part about all lines being a single event. Try looking at it from another angle. If we can't extract the last part, let's delete everything except the last part.
See this run-anywhere example:
| makeresults | eval paragraph="25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled?
I ask as there's no additional information found
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
this can be cancelled"
| eval lastMsg=paragraph
| rex field=lastMsg mode=sed "s/(?ms)\d{2}\.\d{2}\.\d{4}\s\S+\sUTC\s.+?(?=\d{2}\.)//g"
| rex field=lastMsg "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+)"
This is really close! Almost a 100% success rate, but I found a few exceptions.
For this event:
07.12.2019 02:01:17 UTC Ricky Martin (MARTR9) Phone +61265
XX5041 Intermittant GPS Faults
reterminate TNC connectors
07.12.2019 16:21:03 UTC Tyson Mike (TYSONM1) Phone +6145
Called in truck this is the truck from last swing with GPS issues.
Didn’t reterminate TNC as all looked fine and was only done last week.
Pulled power from screen. Found network cable removed from switch. Can
call this truck in and change out the screen
08.12.2019 04:06:41 UTC Michael Bouble (BOUBLE19)
Went through the notifications and this is the machine that has had all
connections reterminated and the GPS receiver and antenna replaced.
425168181 is raised for a replacement as communications from the
components were degraded.
Job planned for Wk02."
lastMsg =
02.
Which is incorrect, it should've been:
08.12.2019 04:06:41 UTC Michael Bouble (BOUBLE19)
Went through the notifications and this is the machine that has had all
connections reterminated and the GPS receiver and antenna replaced.
425168181 is raised for a replacement as communications from the
components were degraded.
Job planned for Wk02.
Another event which didn't parse correctly looks like this:
"24.10.2019 00:29:01 UTC Sam Jackson (JACKSJ0) Phone +6184
Brief description of what occurred (e.g. basic sequence of events and
impacts, avoiding the use of personnel names) ...
XX6583 is experiencing loss of GPS to XX407. Network comms are also
dropping intermittently.
----------------------------------------------------------------
Immediate action taken ...
Please inspect onboard cabling/hardware for GPS faults and comms issues.
----------------------------------------------------------------
Define work requirements; including known materials, specialist labour,
special tools that are required ...
----------------------------------------------------------------
Define any equipment constraints ...
----------------------------------------------------------------
Define any other information to support work requirements ...
XXX0601833
----------------------------------------------------------------
24.10.2019 08:55:07 UTC Myles Robins (ROBIM9)
Noti# - 424898647 - XX4183 - Loss of GPS - Completed - Tested GPS cable
going into the antenna and under stress the cable snapped away from the
connector. Re-terminated the connector, tested for voltage and got 5.5V
at the end of the connector. Before handing over to driver it was found
that the 2-way antenna was severely damaged, replaced the antenna and
re-terminated the cable into the antenna, tested all good afterwards.
Handed back to Control.
Please order # - 10885952 - HBC2 tube
Please order # - 10836784 - Antenna W/Coax
Please order # - 10844533 - Antenna Bracket
Please order # - 10552699 - Ethernet Connector"
For that event lastMsg was:
07. Network comms are also
dropping intermittently.
----------------------------------------------------------------
Immediate action taken ...
Please inspect onboard cabling/hardware for GPS faults and comms issues.
----------------------------------------------------------------
Define work requirements; including known materials, specialist labour,
special tools that are required ...
----------------------------------------------------------------
Define any equipment constraints ...
----------------------------------------------------------------
Define any other information to support work requirements ...
XXX0601833
----------------------------------------------------------------
24.10.2019 08:55:07 UTC Myles Robins (ROBIM9)
Noti# - 424898647 - XX4183 - Loss of GPS - Completed - Tested GPS cable
going into the antenna and under stress the cable snapped away from the
connector. Re-terminated the connector, tested for voltage and got 5.5V
at the end of the connector. Before handing over to driver it was found
that the 2-way antenna was severely damaged, replaced the antenna and
re-terminated the cable into the antenna, tested all good afterwards.
Handed back to Control.
Please order # - 10885952 - HBC2 tube
Please order # - 10836784 - Antenna W/Coax
Please order # - 10844533 - Antenna Bracket
Please order # - 10552699 - Ethernet Connector"
It seems the regex is picking up the next occurrence of 2 digits :( Is there any way around this?
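To illustrate the failure, here is a small Python sketch (not SPL) that applies the same substitution to a shortened version of the first failing event. `LOOSE` mirrors the sed expression from the thread. `STRICT` is an assumed alternative, not something proposed in the thread: its lookahead only accepts a complete `dd.mm.yyyy hh:mm:ss UTC` timestamp at the start of a line, optionally preceded by whitespace.

```python
import re

TS = r"\d{2}\.\d{2}\.\d{4}\s\S+\sUTC"

# Lookahead from the sed expression in the thread: any two digits plus a dot.
LOOSE = re.compile(r"(?ms)" + TS + r"\s.+?(?=\d{2}\.)")

# Assumed alternative: only a full timestamp at the start of a line
# counts as the beginning of the next entry.
STRICT = re.compile(
    r"(?ms)" + TS + r"\s.+?(?=^\s*\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}\sUTC)"
)

# Shortened version of the failing event; the last line ends in "Wk02."
paragraph = (
    "07.12.2019 16:21:03 UTC Tyson Mike (TYSONM1) Phone +6145\n"
    "Called in truck.\n"
    "08.12.2019 04:06:41 UTC Michael Bouble (BOUBLE19)\n"
    "Job planned for Wk02."
)

# Equivalent of `rex mode=sed "s/<pattern>//g"`: delete every match.
loose_result = LOOSE.sub("", paragraph)
strict_result = STRICT.sub("", paragraph)

print(repr(loose_result))   # '02.'
print(repr(strict_result))  # the whole last entry, starting at '08.12.2019'
```

With the loose lookahead, the last entry itself is deleted up to "Wk" because "02." looks like the start of a date. With the stricter lookahead nothing follows the last entry that resembles a full timestamp, so it is left in place.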
Perhaps this:
| makeresults | eval paragraph="25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled?
I ask as there's no additional information found
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
this can be cancelled"
| eval lastMsg=paragraph
| rex field=lastMsg mode=sed "s/(?ms)\d{2}\.\d{2}\.\d{4}\s\S+\sUTC\s.+?(?=\s\d{2}\.)//g"
| rex field=lastMsg "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+)"
The forum stripped my formatting. Please try the revised comment.
Even better now. I could only find one instance of a failed extraction, for the following event:
"19.07.2019 10:32:47 UTC Brendan Nelson (NELSB9)
Brief description of what occurred (e.g. basic sequence of events and
impacts, avoiding the use of personnel names):
-Driver has reported during their prestart that the GPS signal
constantly drops out.
----------------------------------------------------------------
Immediate action taken:
Escalated with the supervisor
----------------------------------------------------------------
Define work requirements; including known materials, specialist labour,
special tools that are required:
1 Fitter @2hrs to investigate and report
----------------------------------------------------------------
Define any equipment constraints:
----------------------------------------------------------------
Define any other information to support work requirements:
---------------------------------------------------------------
20.07.2019 14:23:10 UTC Myles Truman (TRUMM3)
XCT8067 - Monitored the comms of the machine through the night via network
connection and Diagnostics tool, see attached images that
prove machine maintained comms of around 98% when checked at 08:44pm
20.07.2019 21:09:37 UTC Myles Truman (TRUMM3)
XCT8067 - Monitored the comms of the machine through the night via network
connection and Diagnostics tool, see attached images that
prove machine maintained comms of around 98% when checked at 05:00am.
20.07.2019 21:10:45 UTC Myles Hium (HIUMM3)
XCT8067 - Machine has been inside BH7027 AMA all shift and when the
machine went down they left it parked inside the AMA so we could not
actually get on the machine but we monitored it remotely all night and
apart from a drop in comms every now and then, there were no major
issues found.
23.07.2019 07:08:25 UTC Dallas Hyuth (HYUTD9)
Monitored again throughout the night and again did not hear anything
about any Comms/GPS issues with this machine. Pretty much maintained
Precision 96% of the time in for the night. Lowest I saw
it drop was to 82.6% Fixed and 84.9% precision."
For that event lastMsg was:
82.6% Fixed and 84.9% precision.
I'll keep scanning the other events, but so far this is almost perfect!
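A different angle that avoids the lookahead entirely is to locate every line that begins with a full timestamp and keep everything after the last one. This is only a sketch in Python, not SPL; `ENTRY_START` and `last_entry` are hypothetical names, and it assumes a new entry always starts on its own line (optionally after whitespace), as in the examples posted so far.

```python
import re

# A full "dd.mm.yyyy hh:mm:ss UTC" timestamp at the start of a line,
# optionally preceded by whitespace (assumption based on the posted examples).
ENTRY_START = re.compile(
    r"(?m)^\s*(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}\sUTC)\s"
)


def last_entry(paragraph):
    """Return (timestamp, message) for the last entry, or (None, None)."""
    starts = list(ENTRY_START.finditer(paragraph))
    if not starts:
        return None, None
    last = starts[-1]
    # Everything after the last timestamp belongs to the last message.
    return last.group(1), paragraph[last.end():].strip()


# Shortened version of the failing event, including "82.6%" and "84.9%".
paragraph = (
    "20.07.2019 21:10:45 UTC Myles Hium (HIUMM3)\n"
    "apart from a drop in comms every now and then, there were no major\n"
    "issues found.\n"
    "23.07.2019 07:08:25 UTC Dallas Hyuth (HYUTD9)\n"
    "Monitored again throughout the night. Lowest I saw\n"
    "it drop was to 82.6% Fixed and 84.9% precision."
)

timestamp, msg = last_entry(paragraph)
print(timestamp)
print(msg)
```

Because only a complete timestamp at the start of a line counts as an entry boundary, values such as "82.6%" or "08:44pm" inside the free text are not mistaken for the next entry.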
I may have to start charging an hourly rate. 🙂
| makeresults | eval paragraph="25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry. Please look at this machine asap.
25.12.2019 09:50:52 UTC Amanda Nelson (anelson78) Should this be cancelled?
I ask as there's no additional information found
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today
this can be cancelled"
| eval lastMsg=paragraph
| rex field=lastMsg mode=sed "s/(?ms)\d{2}\.\d{2}\.\d{4}\s\S+\sUTC\s.+?(?=^\s+\d{2}\.)//g"
| rex field=lastMsg "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+)"
Perfect, thank you very much
This didn't work, unfortunately. Splunk came back with the following error:
"Error in 'rex' command: Encountered the following error while compiling the regex '(?ms)(?\d{2}.\d{2}.\d{4}\s\S+\sUTC)\s(?.+)': Regex: unrecognized character after (? or (?-"