Text Clustering in Splunk

lavanya_gurrapu
New Member

Hi,

Here is my requirement:
I have a file with a column 'Description'. I need to find the most common word patterns in that column. Example:

Repetitive Pattern   Count   Percentage   Examples
Job                  80      15%          Job Related with Ticket number
Access               130     20%          Access issues

Any occurrence of "Job" or "Jobs" should be categorized as Job.
I have installed the Machine Learning Toolkit and tried to apply TFIDF and KMeans, but I am unable to proceed as I am new to Splunk.
Can anyone help me do the clustering with KMeans on data like the above and get the required output?

Please help.


woodcock
Esteemed Legend

You can use the kmeans command for this:
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Kmeans
Or you can have even more control by building a model in the Machine Learning Toolkit (MLTK). Once that is done, you can inspect the KMeans model you built with fit using the summary command:

| summary <your_model_name>

When assigning new points to the appropriate cluster, you can simply apply your model like this:

<new_points> | apply <your_model_name>
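
For reference, a minimal sketch of what the fit step for text might look like in MLTK. Everything specific here is an assumption for illustration: the index name sample, the field name Description, the model names description_tfidf and description_clusters, and the values max_features=100 and k=5. It also assumes that TFIDF writes its output to fields prefixed with Description_tfidf_, which is how I understand MLTK to name them; check the field list after the first fit before relying on the wildcard.

```spl
index=sample
| fit TFIDF Description max_features=100 stop_words=english into description_tfidf
| fit KMeans Description_tfidf_* k=5 into description_clusters
| stats count BY cluster
| eventstats sum(count) AS total
| eval pct = round(100 * count / total, 1)
```

The stop_words=english option drops common English stop words before the vectors are built, and the last three lines turn the cluster assignments into a count and percentage per cluster. To score new events later, both models would have to be applied in the same order (apply description_tfidf, then apply description_clusters). Note that KMeans returns numbered clusters, not labels such as Job or Access, so the clusters still have to be inspected and named by hand.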

lavanya_gurrapu
New Member

Hi,

I have tried the search below to exclude stop words:
index=sample| makemv Summary | mvexpand Summary|fields Summary| search Summary NOT [|inputlookup words.csv|rename word as summary1]|top summary1

No results are returned. Please help me find where I am making a mistake.
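
A likely cause, going only by the search as posted: the subsearch renames the lookup column to summary1, but the events only have a field called Summary, so the NOT clause never filters anything and top summary1 has no field to count. The bare word Summary after search is also treated as a text term to match, not as a field. A minimal sketch of a corrected version follows, assuming words.csv has a column named word (as in the original search) and that splitting on single spaces is good enough as tokenization:

```spl
index=sample
| eval Summary = lower(Summary)
| makemv delim=" " Summary
| mvexpand Summary
| search NOT [| inputlookup words.csv | rename word AS Summary | fields Summary]
| top limit=20 Summary
```

Here the lookup column is renamed to Summary so that the subsearch expands to Summary="..." conditions that match the event field, and top then returns a count and percent for each remaining word. Folding "Jobs" into "Job" would need an extra normalization step before top, which this sketch does not attempt.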


woodcock
Esteemed Legend

If you don't have a list of keywords, you can try the cluster command:
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Cluster

But it sounds like you have a limited set so you can do something like this:

Your Search Here
| eval cluster_keyword = case(
   match(_raw, "(?i)job"), "job",
   match(_raw, "(?i)access"), "access",
   match(_raw, "(?i)ticket"), "ticket",
   true(), "other")
| stats first(_raw) last(_raw) count BY cluster_keyword
| eventstats sum(count) AS total
| eval pct = 100 * count / total

lavanya_gurrapu
New Member

Hi,

Thank you for the quick reply.

First, I want to remove the stop words and group similar words into one category. Next, the most frequent words should be displayed with their counts.

How can I implement this logic in Splunk? I need to use the KMeans algorithm.


jaime_ramirez
Communicator

For string matching you could check this post:
https://www.splunk.com/en_us/blog/tips-and-tricks/you-can-t-hyde-from-dr-levenshtein-when-you-use-ur...

Intelligent text pattern matching might be a little hard to implement. I will investigate further.

Hope it helps!!!
