You cannot add PDFs without first passing them through a binary decoder. If you can find one that works from the CLI, then you can look at how Splunk handles *.zip files (google unarchive_cmd) and do it the same way. As for CSVs, you can use the INDEXED_EXTRACTIONS feature in props.conf (google that). Note that INDEXED_EXTRACTIONS does not cover XML; for XML, look at KV_MODE = xml for search-time extraction instead.
XML and CSV are both straightforward.
To add data, click the green Add Data button (to the right of the list of apps), select Upload, and then browse to the files to be uploaded.
For PDFs, I don't think there is anything available from Splunk directly.
One thing we can do is use a tool that converts PDF to a text file and then monitor the converted files.
Or set NO_BINARY_CHECK = true in props.conf, which forces PDF files to be indexed even though they are binary. The file is not decoded in that case, so the indexed events are likely to be mostly unreadable.
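The convert-then-monitor idea could be sketched roughly as follows. This is only an illustration under some assumptions: it relies on the pdftotext command-line tool from Poppler being installed and on the PATH, and the function names and directory layout are made up for the example, not anything Splunk provides.

```python
import subprocess
from pathlib import Path
from typing import List


def text_path_for(pdf_path: Path, out_dir: Path) -> Path:
    """Map report.pdf to <out_dir>/report.txt."""
    return out_dir / (pdf_path.stem + ".txt")


def build_command(pdf_path: Path, txt_path: Path) -> List[str]:
    """Build the pdftotext call (assumed installed); -layout keeps the page layout."""
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def convert_all(pdf_dir: Path, out_dir: Path) -> List[Path]:
    """Convert each PDF in pdf_dir unless an up-to-date text copy already exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        txt_path = text_path_for(pdf_path, out_dir)
        # Skip files whose text copy is at least as new as the PDF itself.
        if txt_path.exists() and txt_path.stat().st_mtime >= pdf_path.stat().st_mtime:
            continue
        subprocess.run(build_command(pdf_path, txt_path), check=True)
        written.append(txt_path)
    return written
```

Something like convert_all(Path("/data/pdf"), Path("/data/pdf_text")) could then be run from cron, with a normal Splunk file monitor pointed at the output directory (both paths are hypothetical).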
XML and CSV files are plain text and easily ingested by Splunk. PDF files are not text and cannot be ingested by Splunk without some kind of pre-processing.
We'll need to hear more about what you want to do with these files to offer specific advice. Are these one-time adds or will the files be monitored for changes?
A scripted input is a program that writes text to stdout. It is usually written in Python, but it can be in any language wrapped by a shell script. Splunk indexes that text as event data.
Set up the scripted input by clicking on Settings->Data inputs->Scripts.
See https://docs.splunk.com/Documentation/Splunk/7.3.2/AdvancedDev/Scriptedinputsintro for more information.
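For the PDF case, a scripted input might look something like the sketch below. Treat it as a rough example, not a reference implementation: it assumes the pdftotext tool from Poppler is installed, the directory is passed as the first argument, and the event layout (a timestamp line with a source_file field, then the text) is just one possible choice.

```python
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def format_event(source: str, text: str, now: datetime) -> str:
    """Render one event: a timestamp line naming the source, then the text."""
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S%z")
    return f'{stamp} source_file="{source}"\n{text.strip()}\n'


def pdf_to_text(pdf_path: Path) -> str:
    """Call pdftotext (assumed installed) with '-' so the text arrives on stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def main(argv) -> None:
    # Without a directory argument there is nothing to emit.
    if len(argv) < 2:
        return
    for pdf_path in sorted(Path(argv[1]).glob("*.pdf")):
        now = datetime.now(timezone.utc)
        # Whatever is written to stdout is what Splunk indexes.
        sys.stdout.write(format_event(pdf_path.name, pdf_to_text(pdf_path), now))


if __name__ == "__main__":
    main(sys.argv)
```

Note that this sketch keeps no record of what it has already emitted, so every run would re-index every PDF in the directory. A real script would need some kind of checkpoint, or would be limited to one-time adds.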