Splunk for IMAP and indexing attachments

Splunk Employee

Does anyone know whether the Splunk for IMAP app supports indexing attachments?

For example, when setting up the Splunk for IMAP app, if I configure my imap.conf to include the following option settings:

includeBody = True
mimeTypes = text/plain,text/pdf

I'm wondering if this will actually work and index the body of the email message AND the attached PDF file contents as well. Perhaps I am misunderstanding what this config option is doing.

In any case, your insight on using the IMAP app like this, or on any other solution you may have created to index both the email message content AND its attachments, would be greatly appreciated.


Re: Splunk for IMAP and indexing attachments

Splunk Employee

I'm not clear on what it would mean to index a PDF. Would you expect the PDF to be unpacked with strings, with PDF-specific code, or with some sort of OCR approach? Or would you want the raw binary blob stored in an event?

I'm pretty sure this is not done by default. Feel free to look into calling pdf2txt on the attachments. The code is open for hacking!
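To make the "call a converter on the attachment" idea concrete, here is a minimal sketch rather than anything the IMAP app ships with. The function name `attachment_to_text` is hypothetical, and it assumes the converter (for example pdfminer's `pdf2txt.py`, if installed) takes a file path as its last argument and prints the extracted text to stdout.

```python
import os
import subprocess
import tempfile


def attachment_to_text(data, command, suffix=".pdf", timeout=60):
    """Run an external converter on attachment bytes and return its text output.

    `command` is a list such as ["pdf2txt.py"]. It is an assumption that the
    converter accepts a file path as its final argument and writes text to
    stdout. Returns None if the converter is missing, times out, or fails.
    """
    # Converters generally want a real file, so spill the bytes to a temp file.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        try:
            result = subprocess.run(
                list(command) + [path],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            return None
    finally:
        os.remove(path)
    if result.returncode != 0:
        return None
    # Replace undecodable bytes instead of failing on odd encodings.
    return result.stdout.decode("utf-8", errors="replace")
```

Whatever text comes back could then be written out alongside the message body; a `None` result means the attachment is better skipped than indexed as binary.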


As I understand your question, the answer is no. The IMAP app doesn't have a means to automatically convert any possible attachment to text for Splunk to index.


Re: Splunk for IMAP and indexing attachments

Splunk Employee

I am using PDF as an example; the question is really about indexing ANY attachment type. In other words, does the option cited in my question mean that the app will index an attachment if I list its MIME type correctly? If not, is there another way to index the content of one or more attachments?


Re: Splunk for IMAP and indexing attachments

SplunkTrust

It depends on the attachment and on how you can coerce it into plain text. MS Office documents, for example, aren't plain text at all: you would need to hack in something that understands that particular binary format and can emit plain text from it. Otherwise, you'd essentially be loading junk into your index.
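As a rough illustration of that point, the sketch below uses Python's standard `email` package to walk a message, keep only the parts whose MIME type is on an allow-list, and pass anything that is not already text through a per-type converter. It is not the IMAP app's actual code; `extract_text_parts` and the `converters` mapping are hypothetical names. Note also that PDFs are normally sent as `application/pdf` rather than `text/pdf`.

```python
import email
from email import policy


def extract_text_parts(raw_bytes, mime_types=("text/plain",), converters=None):
    """Return (mime_type, text) pairs for the parts of a raw RFC 822 message.

    Only parts whose type is listed in `mime_types` are considered. Text parts
    are decoded directly; any other type is kept only if `converters` maps it
    to a function that turns bytes into text. Everything else is skipped, so
    no binary junk reaches the index.
    """
    converters = converters or {}
    message = email.message_from_bytes(raw_bytes, policy=policy.default)
    texts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        ctype = part.get_content_type()
        if ctype not in mime_types:
            continue
        if part.get_content_maintype() == "text":
            texts.append((ctype, part.get_content()))
        elif ctype in converters:
            # decode=True undoes base64 / quoted-printable transfer encoding
            text = converters[ctype](part.get_payload(decode=True))
            if text:
                texts.append((ctype, text))
    return texts
```

A type that appears in the allow-list but has no converter is silently dropped, which matches the advice above: without something that understands the binary format, there is nothing useful to index.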
