I'm trying to search for specific words inside the last entry added to a paragraph, where each entry added to the paragraph is date- and time-stamped.
For example:
Paragraph = "25.12.2019 07:24:06 UTC Initial text entry 25.12.2019 09:50:52 UTC Should this be cancelled? No additional information found 26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
I want to catalogue paragraphs that contain the term 'cancelled', but only if the term is in the last entry of the paragraph. In the example, 'cancelled' appears in the middle of the paragraph (in the entry from 25.12.2019 09:50:52 UTC) and also in the last entry, so this paragraph would be catalogued as "cancelled". I have several paragraphs to search this way, and once I figure out how to search only the last entry rather than the whole paragraph, I plan to search for other terms besides 'cancelled'.
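A minimal sketch of the idea in Python (not Splunk), assuming every entry starts with a timestamp in the DD.MM.YYYY HH:MM:SS UTC form seen in the example: a greedy regex runs to the last timestamp and keeps only the text after it, and the keyword search then runs on that text alone. The names last_entry and mentions are made up for illustration.

```python
import re

# Assumed timestamp layout, inferred from the example: "DD.MM.YYYY HH:MM:SS UTC".
TIMESTAMP = r"\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2} UTC"

# The greedy ".*" runs to the end and backtracks to the LAST timestamp;
# re.DOTALL lets it cross line breaks if a log ever contains them.
LAST_ENTRY = re.compile(r".*" + TIMESTAMP + r"\s*(.*)", re.DOTALL)


def last_entry(paragraph: str) -> str:
    """Return the free text after the final timestamp (the whole text if there is none)."""
    m = LAST_ENTRY.match(paragraph)
    return m.group(1).strip() if m else paragraph.strip()


def mentions(term: str, paragraph: str) -> bool:
    """Case-insensitive whole-word search, restricted to the last entry."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, last_entry(paragraph), re.IGNORECASE) is not None


paragraph = (
    "25.12.2019 07:24:06 UTC Initial text entry "
    "25.12.2019 09:50:52 UTC Should this be cancelled? No additional information found "
    "26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
)

print(last_entry(paragraph))  # No issues from this machine today, this should be cancelled
print(mentions("cancelled", paragraph))  # True
```

In Splunk, the same idea would presumably be a rex with a named group, e.g. rex field=Paragraph "(?s).*\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2} UTC\s*(?<last_entry>.*)" followed by eval Cancelled=if(match(last_entry, "(?i)cancelled"), 1, 0); that SPL is untested.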
Are you able to help any further, cpetterborg?
Your reply still lacks sufficient information. Please use the text formatting button that looks like 101010 in two rows, next to the double-quote (") button, to provide preformatted text that clarifies your answers and example data.
So, please provide some (sanitized) example data that shows the original data AND the various parts of the data (as you state "paragraphs") so that it can be determined exactly what you are trying to explain. It is still unclear what constitutes a "paragraph" and you talk about a first and second paragraph. So what part is the first paragraph? What part is the second paragraph? Are there always two paragraphs, or can there be just one? Are there ever three or more paragraphs?
Without an understanding of your data it is impossible to help you. As you can see from the lack of responses, no one is able to help because there isn't enough information. This is likely not a hard problem to solve; there just isn't enough information to come up with an answer.
Sorry to be blunt. I would truly like to help.
Thank you for replying again; I appreciate it.
"So, please provide some (sanitized) example data that shows the original data AND the various parts of the data (as you state 'paragraphs') so that it can be determined exactly what you are trying to explain."
OK, let me try to clarify it further, then.
ORIGINAL DATA (screenshot of the CSV opened in Excel)
"It is still unclear what constitutes a 'paragraph' and you talk about a first and second paragraph. So what part is the first paragraph? What part is the second paragraph? Are there always two paragraphs, or can there be just one? Are there ever three or more paragraphs?"
I read through my responses here and can't find where I mentioned first/second/third paragraphs, sorry. I hope the screenshot above clarifies things enough.
Which time do you want for each maintenanceID?
None. I'm not worried about when the entries were made; I just need to be able to search for keywords in the last entry of each paragraph, regardless of whether they also appear in the rest of the paragraph.
I see. Check my answer.
Hi anelson1,
This question needs more background information and detail to get a correct answer. For example, when you say "but only if the term is in the last entry in the paragraph", do you want to match the timestamp before it as well or not? There are many other questions to be answered before this can be answered correctly, if that makes sense.
As an example, if your _raw contains a message like this:
Paragraph = "25.12.2019 07:24:06 UTC Initial text entry 25.12.2019 09:50:52 UTC Should this be cancelled? No additional information found 26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"
you could use a field extraction like this:
| rex "\s(?<last>\w+)\"" | search last="cancelled" | ...
But whether this will yield the results you expect is hard to tell without more context.
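To see what that rex extracts, here is the same pattern run with Python's re module (an illustration, not Splunk itself). It captures only the single word right before the closing double quote, so search last="cancelled" holds for the example, but it would miss a last entry where the word appears anywhere other than at the very end. The second sample string is hypothetical.

```python
import re

# The SPL pattern \s(?<last>\w+)\" in Python syntax: Python writes named groups as
# (?P<name>...), and the \" escape is only needed inside the quoted SPL string.
LAST_WORD = re.compile(r'\s(?P<last>\w+)"')

# Assumes _raw looks exactly like the example line, closing double quote included.
raw = (
    'Paragraph = "25.12.2019 07:24:06 UTC Initial text entry '
    "25.12.2019 09:50:52 UTC Should this be cancelled? No additional information found "
    '26.12.2019 05:55:51 UTC No issues from this machine today, this should be cancelled"'
)
print(LAST_WORD.search(raw).group("last"))  # cancelled

# Hypothetical last entry where "cancelled" is present but is not the final word.
raw2 = 'Paragraph = "26.12.2019 05:55:51 UTC Job cancelled by operator"'
print(LAST_WORD.search(raw2).group("last"))  # operator
```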
cheers, MuS
PS: the questions and SPL posted here are from a chat with cp-regex-guru and martin_mueller.
"when you say but only if the term is in the last entry in the paragraph do you want to match the time stamp before as well or not?"
I don't think so. I'm not sure what you mean by "match the timestamp before", as I don't want to match any timestamps.
Additional background information:
These paragraphs come from a CSV file extracted from a maintenance tracking system. Each record contains a maintenance entry ID and a maintenance log (the paragraph). The system allows technicians to add information, as free text, to each maintenance record; it then appends a timestamp and the entered text to the existing text/paragraph.
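Because the system only ever appends a timestamp plus free text, each log can be treated as an ordered list of entries, and the most recent one is simply the last item. A minimal Python sketch of categorizing on that last entry (compared with matching the whole log), assuming the timestamp format from the example is the only thing that starts an entry; a technician typing such a timestamp into the free text would break it. The "Resolved" category and the sample log are hypothetical, added only to show how further terms could sit alongside 'cancelled'.

```python
import re

# Assumed timestamp layout, inferred from the example: "DD.MM.YYYY HH:MM:SS UTC".
# The capturing group makes re.split keep each timestamp in its output.
TIMESTAMP = re.compile(r"(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2} UTC)")

# Hypothetical category map: only "Cancelled" comes from the question.
CATEGORIES = {
    "Cancelled": re.compile(r"\bcancelled\b", re.IGNORECASE),
    "Resolved": re.compile(r"\b(fixed|resolved)\b", re.IGNORECASE),
}


def split_entries(log: str) -> list[tuple[str, str]]:
    """Split an appended log into (timestamp, free text) pairs, oldest first."""
    # re.split yields [text before first timestamp, ts1, text1, ts2, text2, ...];
    # any text before the first timestamp is dropped here.
    parts = TIMESTAMP.split(log)
    return [(ts, text.strip()) for ts, text in zip(parts[1::2], parts[2::2])]


def categorize(log: str, last_only: bool = True) -> dict[str, int]:
    """Return a 1/0 flag per category, mirroring eval X=if(match(...), 1, 0)."""
    entries = split_entries(log)
    # Fall back to the whole text when no timestamp is found.
    text = entries[-1][1] if (last_only and entries) else log
    return {name: int(bool(rx.search(text))) for name, rx in CATEGORIES.items()}


# Hypothetical log where "cancelled" appears only in an earlier entry.
log = (
    "25.12.2019 09:50:52 UTC Should this be cancelled? "
    "26.12.2019 05:55:51 UTC Part replaced, issue resolved"
)

print(split_entries(log))
print(categorize(log, last_only=False))  # {'Cancelled': 1, 'Resolved': 1}
print(categorize(log))                   # {'Cancelled': 0, 'Resolved': 1}
```

Matching the whole log flags this record as Cancelled even though the latest entry says the issue was resolved; restricting the match to the last entry avoids that.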
The overall aim is to sort the log entries into categories to get a picture of our most common issues.
At first I was doing a simple
| eval Cancelled=if(match(Paragraph, "cancelled|Cancelled"), 1, 0)
but then found a few entries incorrectly tagged as Cancelled, because a technician had used the word 'cancelled' in a previous entry but not in the last one. I also found that when the word appears in the last entry, the record was correctly categorized 100% of the time.
Are you able to help any further, MuS?