Solved: What would be the strategy to extract relevant data from field with unnecessary data?

nqjpm · 04-20-2018

The description field I'm parsing has some unnecessary survey data that I would like to ignore and NOT count. That data is marked by ##Survey##, and everything after that marker can be ignored, but I have not been able to find a good way to do this. Can this be done without regex?

Here are examples of what the user input field looks like (since it's free-text input, the length and wording vary):

description="My outlook wont opena as a i keep getting an error message. ##Survey## Which of the following best describes your needs?: My Outlook is slow or not launching - Please clarify your issue further:: Outlook is slow or unable to launch

OR

description="I tried to login via * a few days ago and I kept getting the message that my password and/or token information was incorrect. I reset my token PIN and it still wouldn't work (PIN first, token key second). Does this happen often? I need to login this weekend and would like to have the issue resolved as soon as possible. Thank you! ##Survey## Please choose the option which best describes your problem.: ASSISTANCE WITH * TOKEN - Do you need assistance with your * token?: yes - Which best describes your request?: Other

My search counts words in the description field to see what issues may be trending. However, the words in the survey are skewing my data.

index=foo
| fields description
| makemv delim=" " description
| mvexpand description
| eval LowerCase=lower(description)
| eval length=len(LowerCase)
| search length > 2
| top limit=20 LowerCase
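For readers less familiar with SPL, here is a rough Python analogue of what the search above does. It is an illustration only, not how Splunk executes the search, and `top_words` plus the sample text are hypothetical. It shows why the survey inflates the counts: everything after the marker, including the marker itself, is tokenized and counted.

```python
from collections import Counter


def top_words(descriptions, limit=20):
    """Rough local analogue of the original search (hypothetical helper)."""
    counts = Counter()
    for description in descriptions:
        # makemv delim=" " + mvexpand: one token per space-separated word
        for token in description.split(" "):
            lower_case = token.lower()   # eval LowerCase=lower(description)
            if len(lower_case) > 2:      # search length > 2
                counts[lower_case] += 1
    # top limit=20 LowerCase (tie order may differ from Splunk's)
    return counts.most_common(limit)


# Hypothetical sample: "outlook" is counted twice because of the survey text.
sample = ["Outlook will not launch ##Survey## My Outlook is slow"]
print(top_words(sample, limit=3))
```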

somesoni2 · 04-20-2018

Try something like this (line 3 keeps only the part of description that comes before ##Survey##):

index=foo
| fields description
| eval description=mvindex(split(description,"##Survey##"),0)
| makemv delim=" " description
| mvexpand description
| eval LowerCase=lower(description)
| eval length=len(LowerCase)
| search length > 2
| top limit=20 LowerCase
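A minimal Python sketch of the same idea, assuming plain Python strings stand in for the Splunk field: split on the marker and keep element 0, analogous to `mvindex(split(...), 0)`. Like the SPL answer, it uses no regex. `strip_survey`, `top_words`, and the sample text are hypothetical, and this sketch matches the marker case-sensitively.

```python
from collections import Counter

MARKER = "##Survey##"


def strip_survey(description):
    # Analogue of eval description=mvindex(split(description,"##Survey##"),0):
    # split on the marker and keep the first piece. With no marker present,
    # str.split returns a one-element list, so the full text is kept.
    return description.split(MARKER)[0]


def top_words(descriptions, limit=20):
    counts = Counter()
    for description in descriptions:
        # Same tokenize / lowercase / length > 2 steps as the search above
        for token in strip_survey(description).split(" "):
            lower_case = token.lower()
            if len(lower_case) > 2:
                counts[lower_case] += 1
    return counts.most_common(limit)


sample = [
    "Outlook will not launch ##Survey## My Outlook is slow",
    "Token reset did not help",
]
print(top_words(sample))
```

Here the survey words ("slow", the second "outlook") no longer contribute to the counts.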


elliotproebstel · 04-20-2018

This will remove "##Survey##" and everything following it from the field description:

|rex mode=sed field=description "s/##Survey##.*//"

So I'd arrange your search commands like this:

index=foo
| fields description
| rex mode=sed field=description "s/##Survey##.*//"
| makemv delim=" " description
| mvexpand description
| eval LowerCase=lower(description)
| eval length=len(LowerCase) 
| search length > 2
| top limit=20 LowerCase
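The sed-style substitution can be sketched in Python with `re.sub`, again as an illustration outside Splunk; `strip_survey`, `SURVEY_TAIL`, and the sample text are hypothetical. The `re.DOTALL` flag is a choice made in this sketch so that `.` also matches newlines; it makes no claim about how Splunk's `rex` treats line breaks inside a field.

```python
import re

# Python's "." does not match newlines by default; re.DOTALL is used here so
# the whole tail is removed even if the survey text spans several lines.
SURVEY_TAIL = re.compile(r"##Survey##.*", re.DOTALL)


def strip_survey(description):
    # Analogue of: rex mode=sed field=description "s/##Survey##.*//"
    # Removes the first marker and everything after it.
    return SURVEY_TAIL.sub("", description)


text = "I tried to login ##Survey## Please choose an option.\nOther"
print(repr(strip_survey(text)))
```

Either approach gives the same result here: the text before the first ##Survey## is kept and the rest is dropped.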

nqjpm · 04-20-2018

This also works. Two great working answers in less than an hour. I love this community!

nqjpm · 04-20-2018

That works great! Thanks!
