I'm trying to search for specific words inside the last entry added to a paragraph, where each entry/addition to the paragraph is time & date stamped.
For example:
Paragraph = "25.12.2019 07:24:06 UTC Initial text entry 25.12.2019 09:50:52 UTC Should this be cancelled? No additional information found 26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
I want to catalogue paragraphs that have the term 'cancelled' in them, but only if the term is in the last entry in the paragraph. As you can see, the word 'cancelled' is in the middle of the paragraph following the entry on 25.12.2019 09:50:52 UTC and also in the last entry, so this would be catalogued as a "cancelled" paragraph in what I'm trying to do. There are several paragraphs I have to search in this way, and I plan to search for other terms aside from 'cancelled' once I figure out how to search only the last entry in the paragraph rather than the whole paragraph.
| makeresults
| eval Paragraph="25.12.2019 07:24:06 UTC Initial text entry
25.12.2019 09:50:52 UTC Should this be cancelled?
No additional information found
26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
| appendpipe [ eval Paragraph= "25.12.2019 07:24:06 UTC Initial text entry
25.12.2019 09:50:52 UTC Should this be cancelled?
No additional information found
26.12.2019 05:55:51 UTC No issues from this machine today, this should be denied"]
| rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"
| eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2)
| where match(lastMsg,"cancelled")
Use rex and where.
Sadly, this doesn't do what I need. This will find the word cancelled ANYWHERE in the paragraph; I need to find only paragraphs where the word I'm looking for is in the last timestamped entry of the paragraph.
For example:
Paragraph example 1
"25.12.2019 07:24:06 UTC Initial text entry
25.12.2019 09:50:52 UTC Should this be cancelled?
No additional information found
26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
Paragraph example 2
"25.12.2019 07:24:06 UTC Initial text entry
25.12.2019 09:50:52 UTC Should this be cancelled?
No additional information found
26.12.2019 05:55:51 UTC No issues from this machine today, this should be denied"
Example 2 has the word cancelled in it but not in the last timestamped entry, so this paragraph should NOT be counted.
Example 1 has the word cancelled in the last timestamped entry, so this paragraph SHOULD be counted.
I see, my answer is updated.
Thank you, that looks very promising and I'm eager to see if it works. However, I'm not sure how to implement your answer in my current search, as I'm new to Splunk and not that bright. Could you please advise how I can implement your answer if my search text looks like this:
source="maint_log.csv" host="Maint log data" sourcetype="csv" | eval Cancelled=if(match(Paragraph, "cancelled|Cancelled"), 1, 0) | table Cancelled "maintenance ID" Paragraph
source="maint_log.csv" host="Maint log data" sourcetype="csv"
| rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"
| eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2)
| eval Cancelled=if(match(lastMsg, "(?i)cancelled"), 1, 0)
| table Cancelled "maintenance ID" Paragraph
Thank you, still not quite there, here's a screenshot:
In that screenshot, the 2nd and 1st events are not flagged as "Cancelled" but should be.
Change the mvindex arguments in
| eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2)
appropriately.
Reference: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
Unfortunately, there is no limit to the number of timestamps in the paragraph entries, and different paragraphs can have different numbers of timestamps. So I can't always predict how many are going to be present and set the args of mvindex accordingly. Is there no way to achieve this?
| eval lastTimestamp=mvindex(timestamp,-1), lastMsg=mvindex(msg,-1)
How is this?
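As a minimal sketch of why this helps: mvindex counts from the end of a multivalue field when the index is negative, so -1 is the last value however many values there are. The three sample values below are made up purely for illustration.

```spl
| makeresults
| eval msg=split("first entry|second entry|third entry", "|")
| eval lastMsg=mvindex(msg, -1)
| table msg lastMsg
```

Here lastMsg should come back as "third entry", and it would still be the final value if more entries were added to the list.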
Getting closer! That worked against the data in the screenshots; however, it seems the formatting of the real data is causing issues with the rex command.
After each timestamp in the paragraph, there is the name of the person that input the entry and their username in brackets ().
EG:
25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry
25.12.2019 09:50:52 UTC Andrew Nelson (anelson1) Should this be cancelled?
No additional information found
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today, this should be denied
It seems like the parentheses () are causing the regex used in the rex command to return just the name and username of the last message rather than the full last message in the paragraph.
So, using
| rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"
msg will be
Andrew Nelson (anelson1)
Andrew Nelson (anelson1)
Andrew Nelson (anelson1)
for the example paragraph above in this comment.
Looks like I'd need to change
(?<msg>.+)
to keep looking for text after the user name in parentheses, but I don't know how...any ideas?
| rex max_match=0 field=Paragraph "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+?)(?=\d{2}|$)"
No luck, same result.
So using:
| rex max_match=0 field=Paragraph "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+?)(?=\d{2}|$)"
| eval lastTimestamp=mvindex(timestamp,-1), lastMsg=mvindex(msg,-1)
lastMsg is still just the name and username of the last entry, without the text after the ).
lastTimestamp is the timestamp of the last message.
So it seems to be retrieving the last entry in the paragraph correctly, but only returning the name and username instead of the whole text.
The formatting of the paragraph entries is always:
dd.mm.yyyy hh:mm:ss UTC <any number of characters> (<any number of characters>) <any number of characters>
I need the last section, after the first ')', so that I can search it for keywords.
I see, your sample is wrong: there is an extra \n.
The Paragraph in your picture is a single line of text, but your sample contains \n.
I can't write a query from a wrong sample and wrong information.
Good luck.
Yes, the pic looks like single text because that's how Splunk displays it in the search results. I don't know why the search results in Splunk don't show the new lines.
The formatting of the paragraph entries is always:
dd.mm.yyyy hh:mm:ss UTC <letters A-Z> (<letters A-Z and Numbers 0-9>) <any characters with no limit>
\n can be one of those characters following the first closing parentheses )
Additionally the text after the username in parentheses () can be anything, it can include any number of characters and spaces
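Given that format, here is a sketch of one possible approach; it has not been run against the real data. Two things in the earlier pattern may explain the truncated msg: with (?m), $ can match at the end of every line, and the lookahead \d{2} matches any two digits, so the lazy .+? can stop well before the next timestamp. The sketch below avoids both by letting a greedy .* skip to the last timestamp, stepping over the name and the (username), and capturing everything after that up to the end of the field. It assumes that neither the name nor the username contains parentheses, and it uses literal line breaks inside the quoted string in the same way as the earlier makeresults example.

```spl
| makeresults
| eval Paragraph="25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry
25.12.2019 09:50:52 UTC Andrew Nelson (anelson1) Should this be cancelled?
No additional information found
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today,
this should be cancelled"
| rex field=Paragraph "(?s).*\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}\sUTC\s[^()]*\([^()]*\)\s*(?<lastMsg>.*)$"
| eval Cancelled=if(match(lastMsg, "(?i)cancelled"), 1, 0)
| table Cancelled lastMsg
```

If this behaves as intended, lastMsg should hold only the text after the final (anelson1), including the line break, and Cancelled should be 1. Changing the last word to "denied" should give 0, even though "cancelled" still appears in the middle entry. Because the pattern matches once per paragraph, max_match=0 and mvindex are not needed here.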
Still looking for help on this if anyone has any ideas, please.
More information needed.
Can the term appear anywhere in the last entry of the paragraph?
What constitutes a paragraph (is it the entire event), and what constitutes an entry (I assume it is a part of the event, but what delineates one entry from another)?
"Can the term appear anywhere in the last entry of the paragraph?"
Yes
"What constitutes a paragraph (is it the entire event), and what constitutes an entry (I assume it is a part of the event, but what delineates one entry from another)?"
I'm new to Splunk so I'm not sure what an "event" is....
Each entry is delineated by another field called maintenance ID.
The data comes from a csv file which, when opened with Excel, has two columns: "maintenance ID" and "Paragraph".
An event is any data item that comes back from a Splunk search. In most cases that would be equivalent to a log entry, whether it is a single line or multiple lines that are part of a single log message.
What does the "entry" (or event) look like when you search the data? Can you provide a "sanitized" event that shows what your full event looks like and whether there are fields that are extracted from that event?
Ok thank you.
The data in splunk after this search:
source="maint_log.csv" host="Maint log data" sourcetype="csv" | eval Cancelled=if(match(Paragraph, "cancelled|Cancelled"), 1, 0) | table Cancelled "maintenance ID" Paragraph
returns 3 columns
First one is Cancelled, with just 1's and 0's,
Second one is maintenance ID, the values being a single number (8 chars long)
Third one is the Paragraph, a long block of text usually with multiple date/timestamps in it
So in this case there are 133 rows returned from that search, which is also how many lines the csv contains.
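Since each row is one event with its own Paragraph field, one way this could be wired into that search is sketched below. It is untested against the real file and rests on the stated format: every entry is a timestamp, "UTC", a name, a (username) and then free text. The single rex call is meant to capture only the text after the last timestamp's (username) into a new field, here called lastMsg; the Denied column is a hypothetical second term added just to show how other keywords could be checked the same way.

```spl
source="maint_log.csv" host="Maint log data" sourcetype="csv"
| rex field=Paragraph "(?s).*\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}\sUTC\s[^()]*\([^()]*\)\s*(?<lastMsg>.*)$"
| eval Cancelled=if(match(lastMsg, "(?i)cancelled"), 1, 0)
| eval Denied=if(match(lastMsg, "(?i)denied"), 1, 0)
| table Cancelled Denied "maintenance ID" Paragraph lastMsg
```

Keeping lastMsg in the table makes it easy to check on a few rows that the capture really is the final entry before trusting the 1s and 0s.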