Why does my indexed data appear as a series of x and o characters?



I am running a PowerShell script to download HTML code from two pages:


$wc.downloadstring("") >C:\Output\Output.txt
$wc.downloadstring("") >C:\Output\Output_Page1.txt

I then configured Splunk to monitor c:\output*.

Output.txt ingests just fine, but when Output_Page1.txt ingests, two things happen:
1) All you see is x's & 0's (you can click Event Actions > Show Source).
2) The sourcetype gets -too appended.

The HTML pages aren't very different, so I'm not sure why these two downloaded HTML sources behave differently.


Thanks in advance!


Splunk Employee


Have you checked that your script produces UTF-8?
If not, you probably need to specify the character set for the sourcetype used in your monitor stanza so that Splunk can convert the text to UTF-8.
(So in inputs.conf you monitor your file and assign a sourcetype, for example sourcetype1, and in props.conf you set CHARSET in the stanza for that same sourcetype1.)
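
As a rough sketch of what that could look like: this assumes the file turns out to be UTF-16LE (which is what the > redirect writes by default in Windows PowerShell 5.1) and reuses sourcetype1 as a placeholder name. Check the file's real encoding before copying this.

```ini
# props.conf
# sourcetype1 is a placeholder; use the sourcetype your monitor stanza assigns.
[sourcetype1]
# Assumption: the monitored file is UTF-16LE. Set this to the actual encoding.
CHARSET = UTF-16LE
```

CHARSET also accepts AUTO if you would rather let Splunk try to detect the encoding itself.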



The "-too_small" suffix is added by Splunk when it doesn't have enough data to guess the correct sourcetype. The fix is to provide a sourcetype in inputs.conf so Splunk doesn't have to guess. This is a Splunk best practice.
If you provide some sample data, we may be able to help with the necessary props.conf settings for the sourcetype.
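
For illustration, a monitor stanza with an explicit sourcetype might look something like this. The path C:\Output and the name sourcetype1 are assumptions here, so substitute your own:

```ini
# inputs.conf
# Assumed path; point this at the directory or files you actually monitor.
[monitor://C:\Output]
# Placeholder name; with this set, Splunk does not have to guess the sourcetype.
sourcetype = sourcetype1
```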

If this reply helps you, an upvote would be appreciated.