More and more I'm getting reports of bad queries, or queries that don't match results from a separate run. In most cases, if I re-type the search string manually, it works. Visually the string is identical, but apparently there are invisible characters introduced at some point in the crafting of the string.
When the search fails completely, it's more apparent and I can train users on this. Sometimes, however, it's a silent failure. For example, today there was an issue with one of the filters in the string: FIELD!=value. Checking the search inspector revealed that Splunk was seeing FIELD!=value\xa0. Again, visually the string looked fine.
Anyone got suggestions for how to address this with the 500+ users on my system? I fear bad results are being handed out because parts of searches silently aren't firing.
Give this a run on your search head. It should list any scheduled search that contains questionable characters, such as the hidden characters you are seeing. It's not perfect, since some of the characters it flags may be intentional given the events you are searching, but it should get you down to a manageable list of searches that need further investigation. You could probably adapt this rest command to look at ad hoc queries as well.
I set it up to show the problematic character along with the characters around it, in case the character is one of these "space" characters. It worked great for me and actually found issues in a few of my searches. We run into problems when people copy a " out of Outlook and it's not the " character that Splunk wants.
| rest /servicesNS/-/-/saved/searches/
| rename eai:acl.owner AS CreatedBy
| rename title AS Name
| rename search AS SearchQuery
| rename eai:acl.app AS SplunkAppLocation
| fields Name CreatedBy SearchQuery
| rex field=SearchQuery "(?<NonStandardCharacter>[^ -~\r\n])"
| search NonStandardCharacter=*
| rex field=SearchQuery "(?<AroundTheNonStandardCharacter>....[^ -~\r\n]....)"
| search NOT Name="DMC Asset - Build Full"
"DMC Asset - Build Full" is an internal search that it picks up for some reason. I didn't spend much time figuring out why.
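If it helps to check a single search string outside Splunk before pasting it in, the same idea can be sketched in Python. This is only a rough sketch: it reuses the character class from the rex above, and the function name find_suspect_chars and the 4-character context window are my own choices, not anything Splunk provides.

```python
import re
import unicodedata

# Same character class as the rex in the search above: anything that is not
# printable ASCII (space through tilde), a carriage return, or a line feed.
SUSPECT = re.compile(r"[^ -~\r\n]")


def find_suspect_chars(query, context=4):
    """Return (offset, code point, Unicode name, surrounding text) per hit.

    find_suspect_chars is a hypothetical helper name; `context` is how many
    characters to show on each side of the suspect character.
    """
    hits = []
    for match in SUSPECT.finditer(query):
        ch = match.group()
        start = max(0, match.start() - context)
        end = min(len(query), match.end() + context)
        hits.append((
            match.start(),
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "UNNAMED"),  # control characters have no name
            query[start:end],
        ))
    return hits


if __name__ == "__main__":
    # Example string with a no-break space after the filter value.
    sample = "index=main FIELD!=value\u00a0| stats count"
    for hit in find_suspect_chars(sample):
        print(hit)
```

Unlike the second rex, this still reports context when the suspect character sits within four characters of the start or end of the string.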
Did you ever figure out what was causing this problem or find a way to audit/discover bad searches affected by it?
Where are the queries coming from? Can you control the source and the applications your users rely on when copying and pasting? For example, does the text encoding look the way you expect in Notepad, Notepad++, or TextPad? Can you create a process that specifies which tool to use to copy, edit, and then paste queries, so that encoding issues are not introduced?
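One possible piece of such a process is a small clean-up step that queries go through before they are pasted into Splunk. The Python sketch below is an assumption about what that could look like: the replacement table and the name clean_query are illustrative, and the table only covers characters commonly introduced by mail clients and word processors (no-break spaces, curly quotes, dashes, zero-width characters). Blind replacement can also change characters that were meant to appear in a search literal, so the output still deserves a second look.

```python
# Hypothetical mapping of commonly pasted look-alike characters to the
# plain ASCII characters a search string normally expects.
REPLACEMENTS = {
    "\u00a0": " ",   # no-break space
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u200b": "",    # zero width space (dropped)
    "\ufeff": "",    # byte order mark / zero width no-break space (dropped)
}

# str.maketrans accepts a dict of single characters mapped to strings.
_TABLE = str.maketrans(REPLACEMENTS)


def clean_query(query):
    """Return the query with the characters above swapped for ASCII ones."""
    return query.translate(_TABLE)


if __name__ == "__main__":
    pasted = "index=main host=\u201cweb01\u201d FIELD!=value\u00a0| stats count"
    print(clean_query(pasted))
```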