How to use different languages in the NLP Text Analytics app

celianouguier
Explorer

Hi,

I am using the NLP Text Analytics app with texts in English.

I have two questions about this app:

  1. Is there a way to use the command " | vader [...] " with other languages (French, for example)?
  2. Does the command " | TruncatedSVD [...] " take the language of the texts into consideration?
worshamn
Contributor
  1. Currently the answer is no. The lexicon for vader ($SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/sentiment/vader_lexicon/vader_lexicon.txt) is strictly English. It would seem that a later version of vaderSentiment (one that is not packaged with NLTK) may have a translation service available (https://stackoverflow.com/a/45490928); I would need to look into it, and it may still not be ideal. Relatedly, I originally intended the app to be packaged with English support only, but of course this could change based on need and requests. Any language additions can be downloaded from http://www.nltk.org/nltk_data/ and placed in the appropriate folders under $SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/ (working out which folder is sometimes difficult). However, the cleantext command, for example, is also set to English, and I will need to provide options to adjust that in the future.
  2. Yes, TruncatedSVD works completely on context (and normally requires a large corpus to be effective). It expects that the text has first been converted into a term-document matrix using the TF-IDF algorithm. Here is a rough visualization of how the math works: http://matpalm.com/lsa_via_svd/intro.html.
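
To illustrate the first point, here is a minimal Python sketch of why an English-only lexicon gives no signal on French text. It is not the app's code and not VADER itself: `TOY_LEXICON`, its words and scores, and the helper names `tokenize`, `lexicon_score`, and `lexicon_coverage` are all made up for illustration. Real VADER also applies rules for negation, intensifiers, and punctuation, which this sketch leaves out.

```python
import re

# Toy stand-in for an English-only sentiment lexicon. The words and scores
# are invented for illustration and are NOT taken from vader_lexicon.txt.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the scores of tokens found in the lexicon; unknown tokens add 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokenize(text))


def lexicon_coverage(text, lexicon=TOY_LEXICON):
    """Fraction of tokens that appear in the lexicon (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


english = "The service was great but the food was bad"
french = "Le service était excellent mais la nourriture était mauvaise"

# The English sentence hits the lexicon; the French one does not.
print(lexicon_score(english), lexicon_coverage(english))
print(lexicon_score(french), lexicon_coverage(french))
```

A coverage check like this is a quick way to see whether a lexicon matches your data before trusting the scores: if almost no tokens are found, a neutral result only means the lexicon does not cover the language.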

celianouguier
Explorer

Thank you so much for your answer, @worshamn!
Do you know when a multilingual version of vader might be available?
I don't know how to do text analysis in French in Splunk, or whether there is an effective and easy workaround suited to my level of expertise.
Thank you for answering my questions anyway!
