Getting Data In

Ingesting Word document

timrich66
Communicator

Hello all,

My latest challenge is to ingest a Word doc into our environment. According to everything I have read so far, this should be straightforward, as Splunk can ingest 'any' file. I should point out that I am not concerned about the contents of the file (as this all needs to be obfuscated). I only need to ingest the file to get its name. I am not concerned about whether or not Splunk can read the 'Word' type formatting.

The file is created daily with the format - "My Word Doc ddmmyyyy hh mm.doc"

I am only interested in the "ddmmyyyy hh mm" part to ensure that it has been created today.

I cannot get the doc file to ingest at all.  Not even in an unformatted state.  If I save the file as a ".txt" file, then it is ingested.  Unfortunately, the 'save as' option is not an option in production.

I have tried using the 'whitelist=' option without any success.

Can anyone suggest a solution?  Is there something in my installation that is stopping Word docs from being ingested?  Has anyone else had a similar experience?  

Thanks


richgalloway
SplunkTrust

Ingesting an entire Word package, possibly several MB, just to find out if a file exists seems wasteful to me.

As I suggested earlier, consider a script to test for the presence of the file and report to Splunk.

---
If this reply helps you, Karma would be appreciated.

timrich66
Communicator

The file is tiny.  I will look at what other options are available. Thanks


richgalloway
SplunkTrust
Your initial assumption is faulty. Splunk cannot ingest *any* file. It can, however, ingest any *text* file. Word files are not text.
Consider writing a Python script to test for the presence of the file and making it a scripted input.
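
As a rough illustration of that idea, here is a minimal sketch of such a script. It assumes the filename format quoted in the question ("My Word Doc ddmmyyyy hh mm.doc"); the function names, the directory argument, and the key=value output fields are all hypothetical choices, not anything Splunk requires. A scripted input simply indexes whatever the script writes to stdout.

```python
import re
import sys
from datetime import date, datetime
from pathlib import Path

# Assumed filename format, taken from the question:
#   "My Word Doc ddmmyyyy hh mm.doc"
PATTERN = re.compile(r"^My Word Doc (\d{8} \d{2} \d{2})\.doc$")


def parse_timestamp(filename):
    """Return the datetime encoded in the filename, or None if it does not match."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d%m%Y %H %M")
    except ValueError:
        # Digits were present but do not form a valid date/time.
        return None


def created_today(filenames, today):
    """True if any filename carries today's date in its name."""
    stamps = (parse_timestamp(name) for name in filenames)
    return any(ts is not None and ts.date() == today for ts in stamps)


def main(directory):
    now = datetime.now().isoformat(timespec="seconds")
    folder = Path(directory)
    if not folder.is_dir():
        print(f"{now} status=missing_directory")
        return
    names = [p.name for p in folder.iterdir() if p.is_file()]
    found = created_today(names, date.today())
    # One key=value line per run; the field names are illustrative only.
    print(f"{now} status=ok doc_created_today={str(found).lower()}")


if __name__ == "__main__":
    # Hypothetical usage: pass the directory to watch as the first argument.
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

The script would be run on a schedule by a `[script://...]` stanza with an `interval` setting in inputs.conf, so Splunk never has to read the Word file itself, only the one-line result.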

timrich66
Communicator

Thanks for the reply. I am, however, still confused. There are a number of questions about how to ingest with the correct format, e.g. https://community.splunk.com/t5/Archive/How-to-ingest-doc-format-file-into-splunk-with-correct-forma...

As I have stated, I am not concerned with the format within the doc, only the filename is of importance.
