Why does my indexed data appear as a series of x and o characters?

Communicator

Hello,

I am running a PowerShell script that downloads the HTML source of two pages:

$wc.downloadstring("https://www.website.com/index.html") >C:\Output\Output.txt
$wc.downloadstring("https://www.website.com/pages/page1.html") >C:\Output\Output_Page1.txt

I then configured Splunk to monitor c:\output*

Output.txt ingests just fine, but when Output_Page1.txt is ingested, two things happen:
1) All you see is x's and 0's (you can click Event Actions > Show Source).
2) The sourcetype has "-too_small" appended.

The HTML pages aren't very different, so I'm not sure why the two downloaded sources behave differently.

Ideas?

Thanks in advance!

Splunk Employee

Hi,

Have you checked that your script produces UTF-8?
If it does not, you probably need to specify the character set for the sourcetype used in your monitor stanza so that Splunk can convert the text to UTF-8.
(For example, in inputs.conf you monitor the file and assign it sourcetype1; in props.conf you set CHARSET for sourcetype1.)
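
As a rough sketch of the two stanzas described above: the monitor path is assumed from the question, sourcetype1 is only the placeholder name used in this reply, and UTF-16LE is a guess based on the default output encoding of `>` redirection in Windows PowerShell 5.1. Check the actual encoding of the files before settling on a value.

```ini
# inputs.conf
# Assumed path: the question writes its files under C:\Output
[monitor://C:\Output\*.txt]
# Placeholder sourcetype name; assigning one explicitly means Splunk does not have to guess
sourcetype = sourcetype1

# props.conf
[sourcetype1]
# Assumption: the files are UTF-16LE; replace with the encoding the script really produces
CHARSET = UTF-16LE
```

CHARSET is applied when the file is read, so the props.conf stanza most likely belongs on the Splunk instance that monitors the file (for example, the forwarder), and that instance needs a restart to pick up the change.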

SplunkTrust

The "-too_small" suffix is added by Splunk when it doesn't have enough data to guess the correct sourcetype. The fix is to specify a sourcetype in inputs.conf so Splunk doesn't have to guess. This is a Splunk best practice.
If you provide some sample data, we may be able to help with the necessary props.conf settings for the sourcetype.

---
If this reply helps you, an upvote would be appreciated.