Developing for Splunk Enterprise

How to extract data from PDF file?

I'm new to Phantom and would like to know how I could extract data from a PDF file attached to an email. From my understanding, the workflow goes like this: an email is sent to the mailbox, Phantom ingests the email, and Phantom then creates a vault artifact.

Is it possible to:

  • get the PDF file and read its text
  • identify important data in the PDF, for example IP addresses and URLs

If so, how?


Splunk Employee

The best way to get IOCs out of a PDF is to use the Phantom Parser app.

This app requires the file to be in the platform's vault or file location. When email is ingested, PDF attachments are normally added to the File/Vault location automatically. You then need the vaultId from the File Artifact or Vault Artifact to pass to the parser, which extracts the IOCs from the PDF just as it does for emails.
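To give a rough idea of the kind of IOC extraction involved, here is a minimal sketch in plain Python. It is not the Parser app's implementation and does not call any Phantom API. It assumes the PDF text has already been extracted into a string, and the function name `extract_iocs` and the output keys are hypothetical, chosen only for illustration.

```python
import ipaddress
import re

# Simple patterns for illustration only; a real parser handles many more cases
# (defanged indicators, IPv6, domains, hashes, and so on).
URL_RE = re.compile(r"https?://[^\s<>\"]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(text):
    """Return unique IPv4 addresses and URLs found in already-extracted text."""
    # Strip sentence punctuation that often trails a URL in running text.
    urls = sorted({match.rstrip(".,;") for match in URL_RE.findall(text)})

    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            # Reject look-alikes such as 999.1.1.1 that are not valid addresses.
            ips.add(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            continue

    return {"ip": sorted(ips), "url": urls}


if __name__ == "__main__":
    sample = "Beacon to 10.0.0.5, details at http://example.com/a.pdf."
    print(extract_iocs(sample))
```

In a playbook you would normally leave this work to the parser and simply hand it the vaultId; the sketch only shows what "extracting IOCs" means for the IP addresses and URLs asked about in the question.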

I hope this helps.

Thank you, rgresham! This was extremely helpful.
