How do you search part of a text field (delimited by date)?

anelson1 · ‎04-18-2020

I'm trying to search for specific words inside the last entry added to a paragraph, where each entry/addition to the paragraph is time & date stamped.

For example:
Paragraph = "25.12.2019 07:24:06 UTC Initial text entry 25.12.2019 09:50:52 UTC Should this be cancelled? No additional information found 26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"

I want to catalogue paragraphs that have the term 'cancelled' in them, but only if the term is in the last entry in the paragraph. As you can see, the word 'cancelled' appears in the middle of the paragraph, in the entry on 25.12.2019 09:50:52 UTC, and also in the last entry, so this would be catalogued as a "cancelled" paragraph. There are several paragraphs I have to search in this way, and I plan to search for terms other than 'cancelled' once I figure out how to search only the last entry rather than the whole paragraph.

to4kawa · ‎04-25-2020

| makeresults
| eval Paragraph="25.12.2019 07:24:06 UTC Initial text entry
25.12.2019 09:50:52 UTC Should this be cancelled?     
No additional information found 
26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
| appendpipe [ eval Paragraph= "25.12.2019 07:24:06 UTC Initial text entry 
25.12.2019 09:50:52 UTC Should this be cancelled? 

No additional information found 
26.12.2019 05:55:51 UTC No issues from this machine today, this should be denied"]
| rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"
| eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2)
| where match(lastMsg,"cancelled")

Use rex and where.

anelson1 · ‎04-26-2020

Sadly, this doesn't do what I need. This will find the word cancelled ANYWHERE in the paragraph; I need to find only paragraphs where the word I'm looking for is in the last timestamped entry of the paragraph.

For example:
Paragraph example 1

    "25.12.2019 07:24:06 UTC Initial text entry 
    25.12.2019 09:50:52 UTC Should this be cancelled?     
    No additional information found 
    26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"

Paragraph example 2

"25.12.2019 07:24:06 UTC Initial text entry 
25.12.2019 09:50:52 UTC Should this be cancelled? 

No additional information found 
26.12.2019 05:55:51 UTC No issues from this machine today, this should be denied"

Example 2 has the word cancelled in it but not in the last timestamped entry, so this paragraph should NOT be counted.
Example 1 has the word cancelled in the last timestamped entry, so this paragraph SHOULD be counted.

to4kawa · ‎04-26-2020

I see, my answer is updated.

anelson1 · ‎04-26-2020

Thank you, that looks very promising and I'm eager to see if it works. However, I'm not sure how to implement your answer in my current search, as I'm new to Splunk and not that bright. Could you please advise how I can implement your answer if my search looks like this:

source="maint_log.csv" host="Maint log data" sourcetype="csv" | eval Cancelled=if(match(Paragraph, "cancelled|Cancelled"), 1, 0) | table Cancelled "maintenance ID" Paragraph

to4kawa · ‎04-26-2020

  source="maint_log.csv" host="Maint log data" sourcetype="csv" 
| rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"
| eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2)
| eval Cancelled=if(match(lastMsg, "(?i)cancelled"), 1, 0) 
| table Cancelled "maintenance ID" Paragraph

anelson1 · ‎04-26-2020

Thank you, still not quite there, here's a screenshot:

In that screenshot, the 2nd and 1st events are not flagged as "Cancelled" but should be.

to4kawa · ‎04-26-2020

Change the index arguments of mvindex in | eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2) to suit your data.

reference: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval

anelson1 · ‎04-27-2020

Unfortunately, there is no limit to the number of timestamps in the paragraph entries, and different paragraphs can have different numbers of timestamps. So I can't always predict how many will be present and set the args of mvindex accordingly. Is there no way to achieve this?

to4kawa · ‎04-27-2020

| eval lastTimestamp=mvindex(timestamp,-1), lastMsg=mvindex(msg,-1)
How is this?
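
For context, mvindex counts negative indexes from the end of a multivalue field, so -1 selects the last value however many timestamps the rex extracted. A minimal sketch with made-up values (entries, lastEntry and entryCount are illustrative names, not fields from the real data):

| makeresults
| eval entries=split("first entry;second entry;third entry", ";")
| eval lastEntry=mvindex(entries, -1), entryCount=mvcount(entries)

Here lastEntry should hold "third entry" and entryCount should be 3. The same -1 index keeps working if the list grows or shrinks.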

anelson1 · ‎04-29-2020

Getting closer! That worked against the data in the screenshots; however, it seems the formatting of the real data is causing issues with the rex command.

After each timestamp in the paragraph, there is the name of the person who input the entry and their username in parentheses ().
E.g.:

 25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry 
 25.12.2019 09:50:52 UTC Andrew Nelson (anelson1) Should this be cancelled? 

 No additional information found 
 26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today, this should be denied

It seems like the parentheses () are causing the regex in the rex command to return just the name and username of the last message rather than the full last message in the paragraph.
So, using

 | rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"

msg will be

Andrew Nelson (anelson1)
Andrew Nelson (anelson1)
Andrew Nelson (anelson1)

for the example paragraph above in this comment.

Looks like I'd need to change

(?<msg>.+)

to keep looking for text after the user name in parentheses, but I don't know how...any ideas?
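
One possible explanation, assuming the real data has a line break right after the username: by default . in a regex does not match a newline, so .+ stops at the end of the first line, whereas the (?s) flag lets . cross line breaks. A small sketch to compare the two on a made-up single entry (dotOnly and dotAll are illustrative field names; the line break inside the quoted string is intentional):

| makeresults
| eval Paragraph="25.12.2019 07:24:06 UTC Andrew Nelson (anelson1)
Initial text entry"
| rex field=Paragraph "UTC\s(?<dotOnly>.+)"
| rex field=Paragraph "(?s)UTC\s(?<dotAll>.+)"
| table dotOnly dotAll

If that assumption holds, dotOnly should contain only the name and username, while dotAll should also contain the text on the following line.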

to4kawa · ‎04-29-2020

| rex max_match=0 field=Paragraph "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+?)(?=\d{2}|$)"

anelson1 · ‎04-29-2020

No luck, same result.

So using:

| rex max_match=0 field=Paragraph "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+?)(?=\d{2}|$)"
 | eval lastTimestamp=mvindex(timestamp,-1), lastMsg=mvindex(msg,-1)

lastMsg is still just the name and username of the last entry, without the text after the ).
lastTimestamp is the timestamp of the last message.
So it seems to be retrieving the last entry in the paragraph correctly, but returning only the name and username instead of the whole text.
The formatting of the paragraph entries is always:

dd.mm.yyyy hh:mm:ss UTC  <any number of characters> (<any number of characters>) <any number of characters>

I need to search for keywords in the last section, after the first ')'.

to4kawa · ‎04-29-2020

I see, your sample is wrong: it contains extra \n characters.
The Paragraph in your pic is a single line of text, but your sample has line breaks.

I can't write a query from a wrong sample and wrong information.

Good luck.

anelson1 · ‎04-29-2020

Yes, the pic looks like single text because that's how Splunk displays it in the search results. I don't know why the search results in Splunk don't show the new lines.

The formatting of the paragraph entries is always:

 dd.mm.yyyy hh:mm:ss UTC  <letters A-Z> (<letters A-Z and Numbers 0-9>) <any characters with no limit>

\n can be one of the characters following the first closing parenthesis ).

anelson1 · ‎04-29-2020

Additionally, the text after the username in parentheses () can be anything: it can include any number of characters and spaces.
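
A possible way to handle the format described above, offered as an untested sketch rather than a confirmed fix. It assumes every entry is "dd.mm.yyyy hh:mm:ss UTC name (username) text", that the text may contain line breaks, and that the text itself never contains a string shaped like a timestamp. A likely reason the earlier (?ms) pattern still stopped at the username is that (?m) lets $ match at the end of every line, so the lazy .+? ends at the first line break. The sketch drops (?m), keeps (?s), and uses a greedy leading .* so that the match lands on the last timestamp in the paragraph:

source="maint_log.csv" host="Maint log data" sourcetype="csv"
| rex field=Paragraph "(?s)^.*(?<lastTimestamp>\d{2}\.\d{2}\.\d{4}\s+\d{2}:\d{2}:\d{2}\s+UTC)\s+[^(]*\([^)]*\)\s*(?<lastMsg>.*)$"
| eval Cancelled=if(match(lastMsg, "(?i)cancelled"), 1, 0)
| table Cancelled "maintenance ID" Paragraph lastTimestamp lastMsg

Because the greedy .* backtracks from the end of the field, no max_match or mvindex is needed; lastMsg should be everything after the first ) of the final entry. Other keywords can be searched by changing the pattern passed to match.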

anelson1 · ‎04-20-2020

Still looking for help on this if anyone has any ideas, please.

cpetterborg · ‎04-19-2020

More information needed.

Can the term appear anywhere in the last entry of the paragraph?

What constitutes a paragraph (is it the entire event), and what constitutes an entry (I assume it is a part of the event, but what delineates one entry from another)?

anelson1 · ‎04-19-2020

"Can the term appear anywhere in the last entry of the paragraph?"
Yes

"What constitutes a paragraph (is it the entire event), and what constitutes an entry (I assume it is a part of the event, but what delineates one entry from another)?"
I'm new to Splunk so I'm not sure what an "event" is....
Each entry is delineated by another field called maintenance ID.
The data comes from a CSV file which, when opened with Excel, has two columns: "maintenance ID" and "Paragraph".

cpetterborg · ‎04-19-2020

An event is any data item that comes back from a Splunk search. In most cases that would be equivalent to a log entry, whether it is a single line or multiple lines that are part of a single log message.

What does the "entry" (or event) look like when you search the data? Can you provide a "sanitized" event that shows what your full event looks like and whether any fields are extracted from it?

anelson1 · ‎04-19-2020

Ok thank you.

The data in Splunk after this search:
source="maint_log.csv" host="Maint log data" sourcetype="csv" | eval Cancelled=if(match(Paragraph, "cancelled|Cancelled"), 1, 0) | table Cancelled "maintenance ID" Paragraph

returns 3 columns

First one is Cancelled, with just 1's and 0's,
Second one is maintenance ID, the values being a single number (8 chars long)
Third one is the Paragraph, a long block of text usually with multiple date/timestamps in it

So in this case 133 rows are returned from that search, which is also how many lines the CSV contains.
