
Why does NLP Text Analytics on Splunk 8.0.5 (Python 3.7.4) freeze searches when encountering accented characters?

andrew_f_trobec

Hello,

I've noticed that since upgrading to Splunk Enterprise 8.0.5, NLP Text Analytics searches freeze when they encounter accented characters, as well as some other characters such as:

  • à
  • – (en dash)

I am certain there are more, but I just want to know how to make them work with NLP Text Analytics searches. I did not have this problem with Splunk 7.3.2 running Python 2.7.x.

I am using lookups to load my data, but the same thing happens when the data comes from an index. I tried creating and recreating the lookup in various ways to make sure it is UTF-8 encoded, but that did not resolve it. If I put one of the characters above into the pride_prejudice sample CSV file, it breaks that as well (to try it, use the field "sentence" and the search "| inputlookup pride_prejudice.csv | head 1" on the Counts dashboard).
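One way to double-check the encoding outside Splunk is a small Python 3 script that decodes the lookup file strictly as UTF-8 and lists each non-ASCII character with its code point. This is a minimal sketch: `find_non_ascii` is a hypothetical helper, not part of NLP Text Analytics or Splunk, and the lookup path in the final comment is a placeholder. For reference, à is U+00E0 and the dash in the list above is U+2013 (EN DASH).

```python
import unicodedata
from typing import List, Tuple


def find_non_ascii(data: bytes) -> List[Tuple[int, str, str]]:
    """Decode bytes strictly as UTF-8 and report every non-ASCII character.

    Returns (line number, code point, Unicode name) tuples. Raises
    UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    text = data.decode("utf-8")  # strict mode: invalid byte sequences raise
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 0x7F:
                hits.append((lineno, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


# Sample data containing the two characters from the list above (en dash and à).
sample = "sentence\nIt is a truth universally acknowledged \u2013 \u00e0\n".encode("utf-8")
for hit in find_non_ascii(sample):
    print(hit)

# To check a real lookup, read it as bytes; this path is a placeholder:
# find_non_ascii(open("lookups/pride_prejudice.csv", "rb").read())
```

If a file decodes without an error, it is valid UTF-8, which suggests the freeze is likely not caused by the encoding of the data itself.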

I have the following components installed:

  • nlp-text-analytics - 1.1.0
  • Splunk_SA_Scientific_Python_linux_x86_64 - 2.0.2
  • Splunk_ML_Toolkit - 5.2.0

Does anybody know how to solve this?

Thanks!

Andrew


andrew_f_trobec

I figured it out: the issue was with the cleantext custom command packaged with NLP Text Analytics, which seems to work only with Python 2. I suspected this, so I set python.version = python2 in the cleantext stanza of local/commands.conf. After restarting, nothing changed.

After further investigation, it seems Splunk 8.0.x ships with both Python 3 and Python 2, and python.version in default/server.conf is set to python2. In my case, local/server.conf had python.version set to force_python3, which means any python.version set elsewhere (such as in local/commands.conf for the cleantext command) is ignored. I changed that value to python3 and restarted, which let the python2 setting for cleantext take effect, and everything started working.

So I think NLP Text Analytics assumes that users leave python.version at python2, as set in default/server.conf. In my case that value had been overridden in local/server.conf, which broke everything. This might be written in the documentation somewhere, but I'm not going to lie: I didn't even check it...
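The precedence described above can be modeled in a few lines of Python. This is a simplified sketch, not Splunk's actual resolution logic: `effective_python` is a hypothetical helper, the sample server.conf and commands.conf contents are illustrative, and it only covers the python2, python3, and force_python3 values discussed in this thread (with server.conf falling back to python2, as described for 8.0.x).

```python
import configparser

# Illustrative file contents mirroring the setup described in this thread.
SERVER_CONF = """
[general]
python.version = force_python3
"""

COMMANDS_CONF = """
[cleantext]
python.version = python2
"""


def effective_python(server_conf: str, commands_conf: str, command: str) -> str:
    """Simplified model of which interpreter a custom search command gets.

    Assumptions: server.conf [general] python.version falls back to python2,
    force_python3 overrides everything, and otherwise a per-command
    python.version in commands.conf wins over the global value.
    """
    server = configparser.ConfigParser()
    server.read_string(server_conf)
    global_version = server.get("general", "python.version", fallback="python2")

    # force_python3 ignores any python.version set elsewhere.
    if global_version == "force_python3":
        return "python3"

    commands = configparser.ConfigParser()
    commands.read_string(commands_conf)
    command_version = commands.get(command, "python.version", fallback="default")

    # An explicit per-command setting wins; otherwise use the global value.
    if command_version in ("python2", "python3"):
        return command_version
    return global_version


fixed_server_conf = SERVER_CONF.replace("force_python3", "python3")
print(effective_python(SERVER_CONF, COMMANDS_CONF, "cleantext"))        # python3
print(effective_python(fixed_server_conf, COMMANDS_CONF, "cleantext"))  # python2
```

Under these assumptions, force_python3 makes the cleantext override irrelevant, while a plain python3 global value leaves room for the per-command python2 setting to apply.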

I hope this may clarify some things for others!
