Unable to find double-byte characters in Okta Identity Cloud add-on for Splunk

shomatsuo · ‎01-19-2021

When the Okta Identity Cloud Add-on for Splunk ingests Okta log data into Splunk, Japanese characters are stored as Unicode escape sequences and are never unescaped.

Example:
The characters "田中" in the original log are stored in Splunk in escaped form, that is, as "\u7530\u4e2d".
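
For readers wondering where such escapes can come from: one common source in Python code is `json.dumps`, which escapes non-ASCII characters by default. Whether the add-on serializes events this way is only an assumption on my part, and the `displayName` field below is a made-up example, but the sketch shows the same output as above.

```python
import json

# Hypothetical event; the field name "displayName" is illustrative only.
event = {"displayName": "田中"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are
# written as \uXXXX escape sequences.
escaped = json.dumps(event)

# With ensure_ascii=False the original characters are kept as-is.
readable = json.dumps(event, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings are valid JSON and parse back to the same value. They differ only in their raw text, which is what a full-text search matches against.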

As a result, we cannot find the logs we want by searching with Japanese characters.
Example:
I expect the search below to return logs containing "田中", but nothing is found:

index="okta_logs" 田中

I think fixing this requires a change to the source code of the Okta Identity Cloud add-on.
I would like to request that the Okta Identity Cloud Add-on for Splunk unescape Unicode-escaped multi-byte characters before the events are indexed.
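
To make the request concrete, here is a minimal sketch of the kind of unescaping I mean. It is not the add-on's code; the function name `unescape_unicode` is hypothetical, and it assumes the raw event text contains literal `\uXXXX` sequences as in the example above.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(raw: str) -> str:
    """Replace literal \\uXXXX sequences with the characters they encode."""
    decoded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)
    # Characters outside the Basic Multilingual Plane arrive as two
    # escapes (a UTF-16 surrogate pair); a round trip through UTF-16
    # joins each pair into a single character.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16")


print(unescape_unicode(r'{"displayName": "\u7530\u4e2d"}'))
```

This simple version has limits: it would also rewrite an escaped backslash followed by `uXXXX`, and a lone surrogate raises an error. If the events are built with a JSON serializer, not escaping in the first place is probably cleaner than unescaping afterwards.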

mbegan · ‎02-09-2021

Hello Shomatsuo,

I've created an issue on the GitHub repo for this add-on:

https://github.com/mbegan/Okta-Identity-Cloud-for-Splunk/issues/28

shomatsuo · ‎01-19-2021

Values in field definitions are UTF-8 encoded by Splunk, so those are fine.
However, field definitions do not solve my problem, because I want to perform a full-text search.

Re-indexing the events into a summary index would also store them as UTF-8. However, that wastes storage space and degrades real-time performance, so it is not a good solution either.
