Hello guys,
I'm new to Splunk and wanted to ask for some advice :). I currently have 11,000 tickets and I'm trying to find the most common events/issues/words in them. So far I have been trying cluster, regex and lookup.
What do you think is the best approach for this?
Thank you in advance everyone. 🙂
Hi Skalli. You might want to have a play around with these two apps.
NLP Text Analytics - https://splunkbase.splunk.com/app/4066/ - A collection of bits and pieces to do text analysis based around NLTK3.3 and Splunk's MLTK.
NLP Natural Language Toolkit - NLTK wrapper - https://splunkbase.splunk.com/app/4057/ - Another wrapper for some of the same Python libraries for Natural Language Processing.
They should be able to get the job done. I'm not sure how well they work at large scale, but 11k records is not much.
I wasn't the one asking, but this is actually a great answer. I've linked the NLP app once myself but didn't even think of it in this case. 🙂
Skalli
There really is no good way to do this in Splunk that will scale to any degree. You should consider another Big Data tool that would be more appropriate.
Hey and welcome to the Splunk community. 🙂
First of all, the answer to your question is "it depends". If your data has a structure that is easy to onboard, you might want to start by reading and working through the docs on getting data in. Once the data is onboarded correctly, the next step is to build field extractions based on the events. For this, you can use the field extractor. After you have built your fields, you can easily filter on them with something simple like index=yourIndex sourcetype=yourSourcetype | top your_desired_field1, field2 ...
Skalli
Hi Skalli! Thank you for your answer. Unfortunately, it is not that simple. 😞 Here is some sample data:
Sample Data
Can you please reset my password?
Password Reset request
Unable to open my account
Please help! Can't access my account.
Can't connect to Wifi
Reset my Password
... and so on.
I want to automatically find the most frequent words across the 11,000 tickets. Thanks 🙂
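For illustration only, here is a minimal Python sketch of the word-frequency idea, run outside Splunk on the six sample lines above. It is not how either of the linked apps works. The stop-word list and the `word_counts` helper are hypothetical names made up for this example, and real ticket data would need a longer stop-word list (or a proper one from a library such as NLTK).

```python
import re
from collections import Counter

# Sample ticket texts copied from the post above.
tickets = [
    "Can you please reset my password?",
    "Password Reset request",
    "Unable to open my account",
    "Please help! Can't access my account.",
    "Can't connect to Wifi",
    "Reset my Password",
]

# Hypothetical stop-word list (an assumption for this sketch);
# extend it to suit the real data.
STOP_WORDS = {"can", "you", "please", "my", "to", "can't", "help"}


def word_counts(lines, stop_words=STOP_WORDS):
    """Count lower-cased words across all lines, skipping stop words."""
    counts = Counter()
    for line in lines:
        # Keep runs of letters and apostrophes so "can't" stays one token.
        words = re.findall(r"[a-z']+", line.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


counts = word_counts(tickets)

# Print the five most frequent remaining words with their counts.
for word, n in counts.most_common(5):
    print(word, n)
```

Lower-casing before counting is what lets "Password" and "password" fall into the same bucket. Without a stop-word filter, words like "my" would dominate the list.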