How to use different languages in the NLP Text Analytics app

celianouguier
Explorer

Hi,

I am using the NLP Text Analytics app with texts in English.

I have two questions about this app:

  1. Is there a way to use the command " | vader [...] " with other languages (French, for example)?
  2. Does the command " | TruncatedSVD [...] " take the language of the texts into consideration?

worshamn
Contributor
  1. Currently the answer is no. The lexicon that vader uses ($SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/sentiment/vader_lexicon/vader_lexicon.txt) is strictly English. A later version of vaderSentiment (one that is not packaged with NLTK) appears to offer a translation service that I would need to look into (https://stackoverflow.com/a/45490928), but that may still not be ideal. On a related note, I originally intended the app to ship with English support only, though of course this could change based on need or requests. Additional language data can be downloaded from http://www.nltk.org/nltk_data/ and placed in the appropriate folders under $SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/ (working out which folder is sometimes difficult). However, other commands, cleantext for example, are also set to English, and I will need to provide options to adjust that in the future.
  2. Yes, TruncatedSVD works entirely on context (and normally requires a large corpus to be effective). It expects the text to have first been converted into a term-document matrix using the TF-IDF algorithm. Here is a rough visualization of how the math works: http://matpalm.com/lsa_via_svd/intro.html.
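
To make the first point concrete, here is a rough sketch of why a lexicon-based scorer tied to an English word list has nothing to say about French text. This is not code from the app or from NLTK: the function name `toy_lexicon_score`, the word list, and the scores are all made up for illustration, and the real vader also applies rules (negation, intensifiers, punctuation) that this sketch leaves out.

```python
# Toy illustration only: the words and scores below are invented,
# not taken from the real vader_lexicon.txt.
TOY_ENGLISH_LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0}


def toy_lexicon_score(text, lexicon=TOY_ENGLISH_LEXICON):
    """Average the scores of the words found in the lexicon (0.0 if none match)."""
    # Split on whitespace, trim common punctuation, and lowercase each word.
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


# English words are found in the lexicon, so the text gets a score.
english = toy_lexicon_score("The service was great and the food was good.")

# No French word appears in the English lexicon, so the score stays neutral.
french = toy_lexicon_score("Le service était génial, vraiment très bon !")

print(english, french)
```

In this sketch the French sentence scores as neutral even though it is clearly positive, which is the kind of silent failure to expect when English-only lexicon data is applied to another language.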

celianouguier
Explorer

Thank you so much for your answer, @worshamn!
Do you know when a multilingual version of vader might be available?
I don't know how to do text analysis in French in Splunk, or whether there is an effective and easy workaround at my level of competence.
Thank you for answering my questions anyway!
