Hi,
I have the following example record:
30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; "msisdn":"xxxxxxxxx","Type":"\u0006","APN":"aaa","imsi":"xxxxxxxx","imei":"xxxxxxxxx","SGSN":null,"Remote IP Address":"xx.xx.xx.xx","TotalTimeInMS":0}
I cannot search by Type because it is a Unicode value and Splunk does not parse it correctly.
There are 2 possible Type values: 1. "\u0006" 2. "\u0003".
I am using the following Splunk search:
mysearch | spath input=anyparams | search Type="\u0006" 
The problem is that I receive no results.
How should I write the search when the field contains a Unicode value?
Thanks in advance,
Yossi
@yyossef, if you are searching for Unicode escapes stored as text, you need to escape the backslash by prefixing another backslash, i.e. "\\u0006" or "\\u0003", in your SPL.
The following is an example that uses this in a search filter or an eval function:
 <yourCurrentSearch>
| eval TypeDescription=case(Type=="\\u0006","ACKNOWLEDGE",Type=="\\u0003","END OF TEXT",true(),"Others")
| search Type="\\u0006" OR TypeDescription="ACKNOWLEDGE"
The following is a run-anywhere search based on the sample data provided:
| makeresults
| eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"
| extract pairdelim="," kvdelim=":"
| eval TypeDescription=case(Type=="\\u0006","ACKNOWLEDGE",Type=="\\u0003","END OF TEXT",true(),"Others")
| search Type="\\u0006" OR TypeDescription="ACKNOWLEDGE"
Just curious: why are Unicode values not being cleansed or translated before the information gets sent to Stripe? As far as I know, data like this very rarely makes its way into Splunk, and much of what passes as weird UTF-8 codes does not make it into Splunk at all.
Ian Quick shared this example code with us that shows how to test for UTF-8 characters and strip them out: https://github.com/Shopify/shopify-tracing/commit/816ba2aef3c6ee8a232766028181b7b1ca03a2b1
I'd highly recommend cleansing your data before emitting to Stripe. Once the data is in Splunk, 99.9% of the UTF code will be lost and Splunk will not help you debug that issue. Cleansing your output before it hits Stripe is probably the best course of action.
Looking at Unicode Character 'ACKNOWLEDGE' (U+0006):
It tells us that \u0006 is not a Unicode/UTF-8 character representation; it is the escape notation that several programming languages chose to represent the character.
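To make that distinction concrete, here is a minimal sketch in Python (not SPL), using only the standard json module. It assumes nothing about how Splunk stores the field; it only shows that the six-character text \u0006 and the single control character U+0006 are different strings, which is why a comparison written for one will not match the other.

```python
import json

# The text as it appears in the log: a backslash, the letter u, and four digits.
raw_fragment = r'{"Type":"\u0006"}'
raw_value = raw_fragment.split('"')[3]

# A JSON parser decodes the escape into the single control character U+0006.
decoded_value = json.loads(raw_fragment)["Type"]

print(len(raw_value), repr(raw_value))          # six characters of plain text
print(len(decoded_value), repr(decoded_value))  # one control character
```

If the field was produced by a JSON-aware extraction, it may hold the decoded form, in which case a filter on the escaped text would find nothing.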
Hi @niketnilay,
Your other suggestion using searchmatch worked.
| makeresults
 | eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"
 | eval TypeDescription=case(searchmatch("\u0006"),"ACKNOWLEDGE",searchmatch("\u0003"),"END OF TEXT",true(),"Others")
 | search TypeDescription="ACKNOWLEDGE"
Why does searchmatch work while Type=="\u0006" does not?
@yyossef, the Type field is not being automatically extracted as part of search-time field discovery. The searchmatch function looks for the pattern in the entire raw event, so it does not depend on that field. You would need to create your own field extraction, based on a regular expression, to create a Type field.
I am glad your issue is resolved. Do let us know if you need further help. Do up vote the answer/comments that helped! 🙂
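As an illustration of the regular-expression field extraction suggested above, here is a small Python sketch (not SPL). The pattern, the DESCRIPTIONS table and the describe_type helper are hypothetical names introduced for this example, and the sketch assumes the Type value is stored as escaped text, as in the sample record. Python writes the named group as (?P<Type>...), whereas Splunk's rex uses (?<Type>...).

```python
import re

# Hypothetical pattern: capture whatever sits between the quotes after "Type":
TYPE_PATTERN = re.compile(r'"Type":"(?P<Type>[^"]*)"')

# Raw strings keep the backslash, so each key is the six-character escaped text.
DESCRIPTIONS = {r"\u0006": "ACKNOWLEDGE", r"\u0003": "END OF TEXT"}


def describe_type(raw_line: str) -> str:
    """Extract Type from a raw record and map it to a description."""
    match = TYPE_PATTERN.search(raw_line)
    if match is None:
        return "Others"
    return DESCRIPTIONS.get(match.group("Type"), "Others")


sample = r'Return Access ; "msisdn":"xxxxxxxxx","Type":"\u0006","APN":"aaa"'
print(describe_type(sample))
```

The same idea carries over to a search-time extraction: once Type exists as a field, the eval and search comparisons have something to match against.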
Hi @niketnilay,
Thanks for your prompt response.
Still no luck, the search result is empty.
When using your 4 examples, the result came back with only the default value "Others", meaning no match was found.
I am not sure that the Unicode is stored as text; I think it is displayed as text by the system but stored as a Unicode value.
Do you have any idea how to verify that, or how to search by Unicode value?
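One way to check this outside Splunk is to look at the bytes of a line from the original log file. The sketch below is in Python, and classify_type_encoding is a hypothetical helper; it assumes the record layout of the sample above and only the two Type values mentioned in the question.

```python
def classify_type_encoding(raw_line: bytes) -> str:
    """Report how the Type value is encoded in one raw log line."""
    # Literal backslash followed by u00: the value is stored as escaped text.
    if b'"Type":"\\u00' in raw_line:
        return "escaped text"
    # A single byte 0x06 or 0x03 between the quotes: a real control character.
    if b'"Type":"\x06"' in raw_line or b'"Type":"\x03"' in raw_line:
        return "control character"
    return "unknown"


print(classify_type_encoding(b'"msisdn":"x","Type":"\\u0006","APN":"aaa"'))
print(classify_type_encoding(b'"msisdn":"x","Type":"\x06","APN":"aaa"'))
```

To apply it to real data, read a line from the source log in binary mode and pass it in. If the answer is "escaped text", the double-backslash form is the one to search for.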
@yyossef, I am not sure whether the Type field is actually being extracted or not, so let us first try a different approach. The following example does not try to extract the Type field. Instead, it searches for the Unicode escape sequence in the raw data.
| makeresults
| eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"
| eval TypeDescription=case(searchmatch("\\u0006"),"ACKNOWLEDGE",searchmatch("\\u0003"),"END OF TEXT",true(),"Others")
| search TypeDescription="ACKNOWLEDGE"
