NLP Text Analytics: Can text be cleaned using sentences instead of stop words, driven by a lookup?

edoardo_vicendo
Contributor

Hi All,
I am trying to clean the support ticket descriptions so that the cluster command groups them better.
I found the "Splunk NLP Text Analytics app", which provides the cleantext command, but it only lets you define single words to remove, because the analyzed text is tokenized.

Is there a more efficient way than the one shown below to remove defined sentences, rather than single words, from a text?

index="myindex" host="myhost" sourcetype="mysourcetype" source="mysource"
| eval description=lower(description)
| eval description=replace(description, "the user reports", "")
| eval description=replace(description, "the user complains that", "")
| ....

I would like to create a lookup table of sentences and remove each of them from the description field.
Do you have any suggestions or a better approach?

Thanks a lot,
Edoardo

1 Solution

skalliger
SplunkTrust

Hi,

I can think of several solutions, but only one that is extensible.

Basically, build a data model that does this cleanup using calculated fields. Otherwise, if you do it while running your search, searches may get slow, depending on the overall load and the amount of data.

This way, you offload the logic to the data model (and can accelerate it to be even faster) and build your search on top of the data model's calculated fields.
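
As a rough sketch of the kind of eval expression such a calculated field could hold (the field name description_clean and the two phrases are only illustrative, taken from the question's example, and this has not been tested against real data):

```spl
index="myindex" sourcetype="mysourcetype"
| eval description_clean=replace(lower(description), "(the user complains that|the user reports)", "")
| eval description_clean=trim(replace(description_clean, "\s+", " "))
| cluster field=description_clean showcount=t
```

The right-hand side of the first eval is what would go into the data model field's eval expression. The remaining lines only show how a search could then tidy up whitespace and cluster on the cleaned field.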

Skalli

worshamn
Contributor

The cleantext command does offer an option to remove custom stopwords using custom_stopwords=<comma-separated list>, so in your example you could add custom_stopwords="user,report,complain", since the other words are standard stopwords. Alternatively, you could remove the particular phrases before running the command. Not sure if that is what you are after.
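
If the phrases need to live in a lookup, removing them before running the command might look roughly like the sketch below. It assumes a hypothetical lookup file cleanup_phrases.csv with a single column named phrase, and it relies on the subsearch being expanded into a quoted regular expression before eval runs. It is untested, and phrases containing double quotes or regex metacharacters would need escaping first.

```spl
index="myindex" host="myhost" sourcetype="mysourcetype" source="mysource"
| eval description=lower(description)
| eval description=replace(description,
    [| inputlookup cleanup_phrases.csv
     | eval phrase=lower(phrase)
     | stats values(phrase) as phrases
     | eval pattern="\"(" . mvjoin(phrases, "|") . ")\""
     | return $pattern],
    "")
| eval description=trim(replace(description, "\s+", " "))
```

Because regex alternation takes the first alternative that matches, a phrase that is a prefix of a longer one (for example "the user complains" and "the user complains that") would leave the tail of the longer phrase behind, so the lookup should avoid such overlaps.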

edoardo_vicendo
Contributor

Thanks for your feedback. Yes, I was using this option, but you can't join over a lookup this way: you can only define a list of words (not sentences) to be excluded.

edoardo_vicendo
Contributor

I believe this is a reliable solution. Thanks a lot!

skalliger
SplunkTrust

Glad I could be of help. 🙂
