Solved: Re: What would be the strategy to extract relevant...

nqjpm · ‎04-20-2018

The description field I'm parsing contains survey data that I'd like to ignore and NOT count. The survey data starts at the marker ##Survey##, and everything after that marker can be ignored, but I haven't found a good way to do this. Can it be done without regex?

Example of what the user input field looks like (since it's free-text input, the length and wording vary):

description="My outlook wont opena as a i keep getting an error message. ##Survey## Which of the following best describes your needs?: My Outlook is slow or not launching - Please clarify your issue further:: Outlook is slow or unable to launch

OR

description="I tried to login via * a few days ago and I kept getting the message that my password and/or token information was incorrect. I reset my token PIN and it still wouldn't work (PIN first, token key second). Does this happen often? I need to login this weekend and would like to have the issue resolved as soon as possible. Thank you! ##Survey## Please choose the option which best describes your problem.: ASSISTANCE WITH * TOKEN - Do you need assistance with your * token?: yes - Which best describes your request?: Other

My search counts words in the description field to see what issues may be trending. However, the words in the survey are skewing my data.

index=foo
| fields description
| makemv delim=" " description
| mvexpand description
| eval LowerCase=lower(description)
| eval length=len(LowerCase)
| search length > 2
| top limit=20 LowerCase

somesoni2 · ‎04-20-2018

Try this (line 3 keeps only the part of description that comes before ##Survey##):

index=foo
| fields description
| eval description=mvindex(split(description,"##Survey##"),0)
| makemv delim=" " description
| mvexpand description
| eval LowerCase=lower(description)
| eval length=len(LowerCase)
| search length > 2
| top limit=20 LowerCase
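
To check the split step on its own, here is a minimal sketch that runs without any indexed data. The sample description is a shortened, made-up value generated with makeresults, and the trim() call is an optional addition (not part of the search above) that drops the space left in front of the marker.

```spl
| makeresults
``` hypothetical sample value standing in for a real event ```
| eval description="My outlook wont open. ##Survey## Which of the following best describes your needs?: My Outlook is slow"
``` split on the literal marker and keep the first piece (index 0) ```
| eval description=trim(mvindex(split(description,"##Survey##"),0))
| table description
```

If a description does not contain ##Survey## at all, split() returns the whole string as a single value, so index 0 is the original text and nothing is lost.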


elliotproebstel · ‎04-20-2018

This will remove "##Survey##" and everything following it from the field description:

|rex mode=sed field=description "s/##Survey##.*//"

So I'd arrange your search commands like this:

index=foo
| fields description
| rex mode=sed field=description "s/##Survey##.*//"
| makemv delim=" " description
| mvexpand description
| eval LowerCase=lower(description)
| eval length=len(LowerCase) 
| search length > 2
| top limit=20 LowerCase
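
A self-contained way to try the whole pipeline, as a sketch only: the two descriptions below are invented sample values created with makeresults, and where len(...) is used in place of the separate eval and search steps.

```spl
| makeresults count=2
| streamstats count AS n
``` two hypothetical descriptions, each with survey text after the marker ```
| eval description=if(n=1, "Outlook error message ##Survey## Outlook is slow", "Token login error ##Survey## Other")
``` delete the marker and everything after it ```
| rex mode=sed field=description "s/##Survey##.*//"
| makemv delim=" " description
| mvexpand description
| eval LowerCase=lower(description)
| where len(LowerCase) > 2
| top limit=20 LowerCase
```

With these sample values, "error" should come out on top with a count of 2, and the survey-only words "slow" and "other" should not appear in the results.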

nqjpm · ‎04-20-2018

This also works. Two great working answers in less than an hour. I love this community!

nqjpm · ‎04-20-2018

That works great! Thanks!

What would be the strategy to extract relevant data from field with unnecessary data?
