Splunk Search

How do you search part of a text field (delimited by date)?

anelson1
New Member

I'm trying to search for specific words inside the last entry added to a paragraph, where each entry added to the paragraph is date- and time-stamped.

For example:
Paragraph = "25.12.2019 07:24:06 UTC Initial text entry 25.12.2019 09:50:52 UTC Should this be cancelled? No additional information found 26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"

I want to catalogue paragraphs that contain the term 'cancelled', but only if the term is in the last entry of the paragraph. In the example above, 'cancelled' appears in the middle of the paragraph (in the entry from 25.12.2019 09:50:52 UTC) and also in the last entry, so this paragraph would be catalogued as "cancelled". I have several paragraphs to search this way, and once I figure out how to search only the last entry rather than the whole paragraph, I plan to search for other terms besides 'cancelled'.
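
The core idea (find the last timestamp and look only at the text after it) can be prototyped outside Splunk before writing the search. A minimal Python sketch, assuming every entry begins with a dd.mm.yyyy hh:mm:ss UTC timestamp as in the example; last_entry and last_entry_contains are hypothetical helper names, not Splunk features:

```python
import re

# Timestamp format taken from the question: dd.mm.yyyy hh:mm:ss UTC
TIMESTAMP = re.compile(r"\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2} UTC")


def last_entry(paragraph: str) -> str:
    """Return the text after the final timestamp (the whole paragraph if none)."""
    matches = list(TIMESTAMP.finditer(paragraph))
    if not matches:
        return paragraph
    return paragraph[matches[-1].end():].strip()


def last_entry_contains(paragraph: str, term: str) -> bool:
    """True if `term` (case-insensitive) occurs in the last entry."""
    return term.lower() in last_entry(paragraph).lower()


paragraph = (
    "25.12.2019 07:24:06 UTC Initial text entry "
    "25.12.2019 09:50:52 UTC Should this be cancelled? No additional information found "
    "26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
)
print(last_entry(paragraph))
print(last_entry_contains(paragraph, "cancelled"))
```

Occurrences of the term in earlier entries are ignored because only the slice after the final timestamp is searched.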

to4kawa
Ultra Champion
| makeresults
| eval Paragraph="25.12.2019 07:24:06 UTC Initial text entry
25.12.2019 09:50:52 UTC Should this be cancelled?     
No additional information found 
26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
| appendpipe [ eval Paragraph= "25.12.2019 07:24:06 UTC Initial text entry 
25.12.2019 09:50:52 UTC Should this be cancelled? 

No additional information found 
26.12.2019 05:55:51 UTC No issues from this machine today, this should be denied"]
| rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"
| eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2)
| where match(lastMsg,"cancelled")

Use rex to split the paragraph into timestamp/message pairs, then filter on the last message with where.
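
To see why this search works on the sample, here is a rough Python emulation of the rex step. It assumes rex max_match=0 behaves like collecting every match into multivalue fields, and that mvindex is 0-based, so mvindex(msg,2) is the third value. Python spells named groups (?P<name>...) where Splunk's PCRE uses (?<name>...). Note that "." does not match a line break, so each msg ends at the end of its line; the approach therefore depends on the line breaks in the sample.

```python
import re

# Same pattern as the rex above, with Python's (?P<name>...) group syntax.
# Without a DOTALL flag, "." stops at a line break, so each msg ends with its line.
PATTERN = re.compile(r"(?P<timestamp>\S+\s\S+\sUTC)\s(?P<msg>.+)")

paragraph = (
    "25.12.2019 07:24:06 UTC Initial text entry\n"
    "25.12.2019 09:50:52 UTC Should this be cancelled?\n"
    "No additional information found\n"
    "26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
)

# Collect every match, roughly what max_match=0 does for multivalue fields.
matches = list(PATTERN.finditer(paragraph))
timestamps = [m.group("timestamp") for m in matches]
msgs = [m.group("msg") for m in matches]

# mvindex(x, 2) is the third value (indexes start at 0).
last_timestamp, last_msg = timestamps[2], msgs[2]
print(last_timestamp)  # 26.12.2019 05:55:51 UTC
print(bool(re.search("cancelled", last_msg)))  # True
```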

anelson1
New Member

Sadly, this doesn't do what I need. It finds the word cancelled ANYWHERE in the paragraph, but I need to find only the paragraphs where the word I'm looking for is in the last timestamped entry of the paragraph.

For example:
Paragraph example 1

"25.12.2019 07:24:06 UTC Initial text entry
25.12.2019 09:50:52 UTC Should this be cancelled?
No additional information found
26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"

Paragraph example 2

"25.12.2019 07:24:06 UTC Initial text entry 
25.12.2019 09:50:52 UTC Should this be cancelled? 

No additional information found 
26.12.2019 05:55:51 UTC No issues from this machine today, this should be denied"

Example 2 has the word cancelled in it but not in the last timestamped entry, so this paragraph should NOT be counted.
Example 1 has the word cancelled in the last timestamped entry, so this paragraph SHOULD be counted.

to4kawa
Ultra Champion

I see. I've updated my answer.

anelson1
New Member

Thank you, that looks very promising and I'm eager to see if it works. However, I'm not sure how to implement your answer in my current search, as I'm new to Splunk and not that bright. Could you please advise how I can implement your answer if my search looks like this:

source="maint_log.csv" host="Maint log data" sourcetype="csv" | eval Cancelled=if(match(Paragraph, "cancelled|Cancelled"), 1, 0) | table Cancelled "maintenance ID" Paragraph

to4kawa
Ultra Champion
source="maint_log.csv" host="Maint log data" sourcetype="csv" 
| rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"
| eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2)
| eval Cancelled=if(match(lastMsg, "(?i)cancelled"), 1, 0) 
| table Cancelled "maintenance ID" Paragraph
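
The eval above also replaces the original "cancelled|Cancelled" alternation with the case-insensitive (?i) flag, which additionally matches spellings such as "CANCELLED". eval's match() succeeds if the regex matches anywhere in the value, which Python's re.search mirrors. A small sketch; both function names are hypothetical:

```python
import re


def original_flag(text: str) -> int:
    """Like if(match(Paragraph, "cancelled|Cancelled"), 1, 0)."""
    return 1 if re.search(r"cancelled|Cancelled", text) else 0


def case_insensitive_flag(text: str) -> int:
    """Like if(match(lastMsg, "(?i)cancelled"), 1, 0)."""
    return 1 if re.search(r"(?i)cancelled", text) else 0


for text in ["this should be cancelled", "CANCELLED by operator", "this should be denied"]:
    print(original_flag(text), case_insensitive_flag(text), text)
```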

anelson1
New Member

Thank you, but it's still not quite there. Here's a screenshot:

In that screenshot, the 1st and 2nd events are not flagged as "Cancelled", but they should be.

to4kawa
Ultra Champion

Change the mvindex arguments in | eval lastTimestamp=mvindex(timestamp,2), lastMsg=mvindex(msg,2) to suit your data.

Reference: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval

anelson1
New Member

Unfortunately, there is no limit to the number of timestamps in the paragraph entries, and different paragraphs can have different numbers of timestamps. So I can't always predict how many will be present and set the args of mvindex accordingly. Is there any way to achieve this?

to4kawa
Ultra Champion

| eval lastTimestamp=mvindex(timestamp,-1), lastMsg=mvindex(msg,-1)
How is this?
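
mvindex accepts negative indexes that count back from the end of a multivalue field, so -1 selects the last value however many timestamps a paragraph has. Python's negative list indexing behaves the same way, as this sketch shows; last_value is a hypothetical helper:

```python
from typing import List, Optional


def last_value(values: List[str]) -> Optional[str]:
    """Mirror of mvindex(values, -1); this helper returns None for an empty list."""
    return values[-1] if values else None


three_entries = ["Initial text entry", "Should this be cancelled?", "this should be cancelled"]
five_entries = three_entries + ["Checked again", "this should be denied"]

print(last_value(three_entries))  # this should be cancelled
print(last_value(five_entries))   # this should be denied
```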

anelson1
New Member

Getting closer! That worked against the data in the screenshots. However, it seems the formatting of the real data is causing issues with the rex command.

After each timestamp in the paragraph, there is the name of the person who entered the text, followed by their username in parentheses ().
For example:

 25.12.2019 07:24:06 UTC Andrew Nelson (anelson1) Initial text entry 
 25.12.2019 09:50:52 UTC Andrew Nelson (anelson1) Should this be cancelled? 

 No additional information found 
 26.12.2019 05:55:51 UTC Andrew Nelson (anelson1) No issues from this machine today, this should be denied

It seems like the parentheses () are causing the regex in the rex command to return only the name and username of the last message rather than the full message.
So, using

 | rex field=Paragraph max_match=0 "(?<timestamp>\S+\s\S+\sUTC)\s(?<msg>.+)"

msg will be

Andrew Nelson (anelson1)
Andrew Nelson (anelson1)
Andrew Nelson (anelson1)

for the example paragraph above.

It looks like I'd need to change

(?<msg>.+)

so that it keeps matching the text after the username in parentheses, but I don't know how. Any ideas?
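
One explanation that fits this output, assuming the real Paragraph contains a line break right after each "(username)": the parentheses are literal text in the data and do not affect the pattern, but "." in (?<msg>.+) does not match a line break, so each msg stops at the end of the name line. A Python reproduction under that assumption (the sample text is illustrative):

```python
import re

PATTERN = re.compile(r"(?P<timestamp>\S+\s\S+\sUTC)\s(?P<msg>.+)")

# Assumed real-data layout: a line break right after "(username)".
paragraph = (
    "25.12.2019 07:24:06 UTC Andrew Nelson (anelson1)\nInitial text entry\n"
    "25.12.2019 09:50:52 UTC Andrew Nelson (anelson1)\nShould this be cancelled?\n\n"
    "No additional information found\n"
    "26.12.2019 05:55:51 UTC Andrew Nelson (anelson1)\nNo issues from this machine today, this should be denied"
)

# "." stops at "\n", so every msg is only the name line.
msgs = [m.group("msg") for m in PATTERN.finditer(paragraph)]
print(msgs)
```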

to4kawa
Ultra Champion

| rex max_match=0 field=Paragraph "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+?)(?=\d{2}|$)"

anelson1
New Member

No luck, same result.

So using:

| rex max_match=0 field=Paragraph "(?ms)(?<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?<msg>.+?)(?=\d{2}|$)"
 | eval lastTimestamp=mvindex(timestamp,-1), lastMsg=mvindex(msg,-1)

lastMsg is still just the name and username of the last entry, without the text after the ')'.
lastTimestamp is the timestamp of the last message.
So it seems to retrieve the last entry of the paragraph correctly, but it returns only the name and username instead of the whole text.
The formatting of the paragraph entries is always:

dd.mm.yyyy hh:mm:ss UTC  <any number of characters> (<any number of characters>) <any number of characters>

I need to search for keywords in the last section, after the first ')'.
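
A likely reason this version returns the same result, under the same line-break assumption: with the m (multiline) flag, $ matches at the end of every line, and the lazy .+? stops at the first position where the lookahead (?=\d{2}|$) succeeds, which is the end of the name line. Keeping only s lets the last entry run to the end of the text, although (?=\d{2}) would still cut an entry at any two-digit number inside it. A Python check with an illustrative sample:

```python
import re

paragraph = (
    "25.12.2019 07:24:06 UTC Andrew Nelson (anelson1)\nInitial text entry\n"
    "26.12.2019 05:55:51 UTC Andrew Nelson (anelson1)\nNo issues from this machine today, this should be denied"
)

BODY = r"(?P<timestamp>\d{2}\.\d{2}\.\d{4}\s\S+\sUTC)\s(?P<msg>.+?)(?=\d{2}|$)"

# With (?ms), $ also matches before every newline, so the lazy msg stops there.
with_m = [m.group("msg") for m in re.finditer("(?ms)" + BODY, paragraph)]
# With (?s) only, $ matches at the end of the text, so the last msg runs to the end.
s_only = [m.group("msg") for m in re.finditer("(?s)" + BODY, paragraph)]

print(with_m[-1])
print(s_only[-1])
```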

to4kawa
Ultra Champion

I see, your sample is wrong: it has extra \n (newline) characters.
The Paragraph in your picture is a single line of text, but your sample contains \n.

I can't write a query from a wrong sample and wrong information.

Good luck.

anelson1
New Member

Yes, the picture looks like a single line of text because that's how Splunk displays it in the search results. I don't know why the search results in Splunk don't show the new lines.

The formatting of the paragraph entries is always:

 dd.mm.yyyy hh:mm:ss UTC  <letters A-Z> (<letters A-Z and Numbers 0-9>) <any characters with no limit>

\n can be one of the characters following the first closing parenthesis ).
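
Given that format, one way to isolate the keyword text is to let the message span line breaks (the s flag) and end it only at the next full timestamp or the end of the text, then keep what follows the first ')' of the last entry. This is a Python sketch of that idea, not a tested Splunk search; the sample paragraph and the function name last_entry_text are illustrative assumptions:

```python
import re

# Full timestamp as described in the thread: dd.mm.yyyy hh:mm:ss UTC
TS = r"\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2} UTC"

# (?s) lets "." cross line breaks; the lazy msg ends just before the next full
# timestamp (allowing whitespace in between) or at the end of the text (\Z).
ENTRY = re.compile(rf"(?s)(?P<timestamp>{TS})\s+(?P<msg>.*?)(?=\s*{TS}|\Z)")


def last_entry_text(paragraph: str) -> str:
    """Text of the last entry after its first ')', i.e. after '(username)'."""
    msgs = [m.group("msg") for m in ENTRY.finditer(paragraph)]
    if not msgs:
        return ""
    _, sep, rest = msgs[-1].partition(")")
    return (rest if sep else msgs[-1]).strip()


denied = (
    "25.12.2019 07:24:06 UTC Andrew Nelson (anelson1)\nInitial text entry\n"
    "25.12.2019 09:50:52 UTC Andrew Nelson (anelson1)\nShould this be cancelled?\n\n"
    "No additional information found\n"
    "26.12.2019 05:55:51 UTC Andrew Nelson (anelson1)\nNo issues from this machine today, this should be denied"
)
cancelled = denied.replace("should be denied", "should be cancelled")

for paragraph in (denied, cancelled):
    text = last_entry_text(paragraph)
    print("cancelled" in text.lower(), "|", text)
```

If this behaves as intended on real paragraphs, the same idea might be expressed in SPL roughly as follows (untested; $ stands in for Python's \Z, and replace() strips everything up to the first ')'):

| rex field=Paragraph max_match=0 "(?s)(?<timestamp>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2} UTC)\s+(?<msg>.*?)(?=\s*\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2} UTC|$)"
| eval lastMsg=replace(mvindex(msg,-1), "^[^)]*[)]\s*", "")
| eval Cancelled=if(match(lastMsg, "(?i)cancelled"), 1, 0)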

anelson1
New Member

Additionally, the text after the username in parentheses () can be anything; it can include any number of characters and spaces.

anelson1
New Member

I'm still looking for help on this, if anyone has any ideas.

cpetterborg
SplunkTrust

More information needed.

Can the term appear anywhere in the last entry of the paragraph?

What constitutes a paragraph (is it the entire event), and what constitutes an entry (I assume it is a part of the event, but what delineates one entry from another)?

anelson1
New Member

"Can the term appear anywhere in the last entry of the paragraph?"
Yes

"What constitutes a paragraph (is it the entire event), and what constitutes an entry (I assume it is a part of the event, but what delineates one entry from another)?"
I'm new to Splunk, so I'm not sure what an "event" is.
Each entry is delineated by another field called maintenance ID.
The data comes from a csv file which when opened with excel has two columns "maintenance ID" and "Paragraph".

cpetterborg
SplunkTrust

An event is any data item that comes back from a Splunk search. In most cases, that is equivalent to a log entry, whether it is a single line or multiple lines that are part of a single log message.

What does the "entry" (or event) look like when you search the data? Can you provide a "sanitized" event that shows what your full event looks like, and whether any fields are extracted from that event?

anelson1
New Member

OK, thank you.

Here is the data in Splunk after this search:
source="maint_log.csv" host="Maint log data" sourcetype="csv" | eval Cancelled=if(match(Paragraph, "cancelled|Cancelled"), 1, 0) | table Cancelled "maintenance ID" Paragraph

It returns 3 columns:

The first is Cancelled, with just 1s and 0s.
The second is maintenance ID, each value being a single number (8 characters long).
The third is Paragraph, a long block of text, usually with multiple date/timestamps in it.

In this case, the search returns 133 rows, which is also how many lines the CSV contains.
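
Since the data comes from a CSV with one Paragraph per row, note that a quoted CSV cell can itself contain line breaks, so one row can hold a multi-line Paragraph. This Python sketch parses a small made-up stand-in for maint_log.csv (the IDs and text are invented) and builds the same Cancelled and maintenance ID pairs by checking only the last entry:

```python
import csv
import io
import re

TS = r"\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2} UTC"
# Group 1 is an entry's text; it may span lines and ends before the next timestamp.
ENTRY = re.compile(rf"(?s){TS}\s+(.*?)(?=\s*{TS}|\Z)")

# Made-up stand-in for maint_log.csv; the quoted cells contain real line breaks.
CSV_TEXT = """maintenance ID,Paragraph
10000001,"25.12.2019 07:24:06 UTC Andrew Nelson (anelson1)
Initial text entry
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1)
this should be cancelled"
10000002,"25.12.2019 07:24:06 UTC Andrew Nelson (anelson1)
Should this be cancelled?
26.12.2019 05:55:51 UTC Andrew Nelson (anelson1)
this should be denied"
"""


def cancelled(paragraph: str) -> int:
    """1 if 'cancelled' (any case) is in the last entry after its first ')'
    (or anywhere in that entry if it has no ')'), else 0."""
    entries = ENTRY.findall(paragraph)
    if not entries:
        return 0
    _, sep, rest = entries[-1].partition(")")
    return 1 if re.search(r"(?i)cancelled", rest if sep else entries[-1]) else 0


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
table = [(cancelled(row["Paragraph"]), row["maintenance ID"]) for row in rows]
print(len(rows), table)
```

The second row mentions "cancelled" only in an earlier entry, so it is flagged 0.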
