How to monitor docx and PDF files in Splunk?
Hi,
I'm thinking about a new application for our organization, and for that I need the ability to monitor (i.e. index and read) the content of doc / docx / PDF files.
When I import such a file into Splunk, the preview shows it as hex / binary, so I think we should define a new sourcetype for those files and, in particular, change the charset to something that fits them.
I searched a lot about this, but it seems that no one has dealt with it before.
Can you help me with this?
Thanks,
Omer.

Hi omerr,
Are you trying to use Splunk to search within your docx / PDF files, or simply to store them?
Either way, Splunk doesn't provide a default way to handle this.
You could use a script in combination with some kind of docx / PDF-to-text utility to load the textual content of your docx / PDF files into Splunk.
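As a rough illustration of the script idea, here is a minimal sketch that handles docx only, using nothing but the Python standard library. It relies on the fact that a .docx file is a zip archive whose main body lives in word/document.xml. The function name docx_to_text is made up for this example, and the sketch ignores headers, footers and footnotes. PDF and legacy .doc files would need a separate third-party text-extraction utility, which is not shown here.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per non-empty paragraph.

    Hypothetical helper for illustration only; not part of Splunk.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> run elements
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


if __name__ == "__main__":
    # Print the text of every file given on the command line to stdout
    for path in sys.argv[1:]:
        print(docx_to_text(path))
```

Since Splunk indexes whatever a scripted input writes to stdout, a script along these lines could be run as a scripted input, or its output could be written to a plain text file that Splunk monitors in the usual way.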
Or, ...
If you want to try simply indexing the files straight-up, then simply add something like this to your props.conf file:
[source::....pdf]
NO_BINARY_CHECK = true
This should force your PDFs to be indexed even though they are binary. I suspect you will not like the results, but you can give it a try.
cheers, MuS
