Getting Data In

How do I get parameters from an XML, PDF, or CSV file using Splunk?

New Member

How do I add an XML, PDF, or CSV file to Splunk and get the values from these files?

Esteemed Legend

You cannot add PDFs without first passing them through a binary decoder. If you can find one that works from the command line, study how Splunk handles *.zip files (see the unarchive_cmd setting in props.conf) and do it the same way. For CSV files, you can use the INDEXED_EXTRACTIONS feature (also in props.conf). INDEXED_EXTRACTIONS does not support XML, so for XML files use search-time extraction with KV_MODE = xml instead.

Contributor

Hi Jov,
XML and CSV are both straightforward.

To add data, click the green Add Data button (to the right of the list of apps), select Upload, and then browse to the files you want to upload.

For PDFs, I don't think Splunk offers anything directly.
One option is to use a tool that converts PDFs to text files and then set up a monitor input on those text files.
Alternatively, set NO_BINARY_CHECK = true in props.conf, which forces Splunk to index PDF files even though they are binary. Be aware that PDF text is usually stored in compressed streams, so the resulting events will mostly be unreadable.

New Member

OK, but my XML contains too many levels, and Splunk can't extract the fields correctly. Is there any solution?

SplunkTrust

XML and CSV files are plain text and easily ingested by Splunk. PDF files are not text and cannot be ingested by Splunk without some kind of pre-processing.
We'll need to hear more about what you want to do with these files to offer specific advice. Are these one-time adds, or will the files be monitored for changes?

---
If this reply helps you, an upvote would be appreciated.

SplunkTrust

Consider a scripted input that pre-parses the XML.
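A minimal sketch of that idea, assuming Python 3 and only the standard library's xml.etree.ElementTree: each repeating record element (here a hypothetical <order> tag) becomes one line of key="value" pairs, and nested tag names are joined with underscores so that deep levels turn into flat field names such as order_customer_name. The sample XML, the function names, and the underscore convention are illustrative assumptions, not a Splunk API.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Yield (field_name, value) pairs for every attribute and text leaf under elem.

    Nested tag names are joined with underscores, so <order><customer><name>
    becomes the field name order_customer_name (an assumed naming convention).
    """
    path = f"{prefix}_{elem.tag}" if prefix else elem.tag
    # Attributes become fields named after the element path plus the attribute name.
    for name, value in elem.attrib.items():
        yield f"{path}_{name}", value
    children = list(elem)
    if not children:
        # Leaf element: emit its text, skipping empty or whitespace-only content.
        text = (elem.text or "").strip()
        if text:
            yield path, text
    for child in children:
        yield from flatten(child, path)


def xml_to_events(xml_text, record_tag):
    """Return one line of key="value" pairs per <record_tag> element."""
    root = ET.fromstring(xml_text)
    events = []
    for record in root.iter(record_tag):
        fields = []
        for key, value in flatten(record):
            # Swap double quotes for single quotes so each value stays inside its own quotes.
            safe = value.replace('"', "'")
            fields.append(f'{key}="{safe}"')
        events.append(" ".join(fields))
    return events


# Hypothetical sample; a real scripted input would read your actual XML file instead.
SAMPLE = """<orders>
  <order id="1001">
    <customer><name>Ann</name><city>Oslo</city></customer>
    <total currency="EUR">42.50</total>
  </order>
</orders>"""

if __name__ == "__main__":
    for line in xml_to_events(SAMPLE, "order"):
        print(line)
```

Run against the sample, it prints `order_id="1001" order_customer_name="Ann" order_customer_city="Oslo" order_total_currency="EUR" order_total="42.50"`. Because the output is plain key="value" text, Splunk's automatic key/value extraction should pick up the fields at search time. Repeated child elements (for example, several <item> tags in one order) produce repeated keys, and XML namespaces are not handled; adjust flatten() if your data has either.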

New Member

Could you tell me more about scripted inputs? Is there a tutorial?

SplunkTrust

A scripted input is a program that writes text to stdout; Splunk indexes that text as event data. It is usually written in Python, but it can be written in any language if it is wrapped by a shell script.
Set up the scripted input under Settings > Data inputs > Scripts.
See https://docs.splunk.com/Documentation/Splunk/7.3.2/AdvancedDev/Scriptedinputsintro for more information.
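To illustrate that contract, here is a bare-bones sketch of a scripted input in Python. The collect() function is a hypothetical stand-in for whatever your script actually reads (for example, flattened XML records); everything the script writes to stdout becomes event data. Each line starts with an ISO 8601 UTC timestamp, which gives Splunk a time for the event and, under default line-merging settings, helps it treat each line as a separate event.

```python
import sys
from datetime import datetime, timezone


def format_event(fields, now=None):
    """Render one event line: an ISO 8601 UTC timestamp followed by key="value" pairs."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    body = " ".join(f'{key}="{value}"' for key, value in fields.items())
    return f"{stamp} {body}"


def collect():
    """Hypothetical data source; replace with whatever your script actually reads."""
    return [
        {"status": "ok", "items": 3},
        {"status": "error", "items": 0},
    ]


def main():
    """Write one line per record to stdout and return how many were written."""
    count = 0
    for fields in collect():
        # Splunk indexes whatever the script writes to stdout.
        sys.stdout.write(format_event(fields) + "\n")
        count += 1
    # Flush so all output is delivered before the script exits.
    sys.stdout.flush()
    return count


if __name__ == "__main__":
    main()
```

The script is typically saved in an app's bin directory and selected when creating the scripted input, with an interval controlling how often Splunk runs it. Each run re-emits everything collect() returns, so a real script should remember what it has already sent if the source files are monitored for changes.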
