How to extract data from PDF file?

maangellamatini
Explorer

I'm new to Phantom and would like to know how to extract data from a PDF file attached to an email. As I understand it, the workflow is: an email is sent to the mailbox, Phantom ingests the email, and Phantom then creates a vault artifact.

Is it possible to:

  • get the PDF file and read its text
  • identify important data in the PDF, such as IP addresses and URLs

If so, how?

1 Solution

rgresham_splunk
Splunk Employee

The best way to get IOCs out of a PDF is to use the Phantom Parser app.

This app requires the file to be in the vault (the file location) of the platform. Normally, when ingesting via email, PDF attachments are automatically added to the vault. You then need the vaultId from the File Artifact or Vault Artifact to pass to the parser, which extracts the IOCs from the file just as it does for emails.
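
To make the two steps above concrete, here is a minimal standalone Python sketch. It is not the Parser app's code and does not call any Phantom API. It assumes each artifact is a plain dict with a `cef` section holding `vaultId` and `fileName` keys (the `fileName` key and both helper names are hypothetical), and it uses deliberately simplified regular expressions to show what pulling IPs and URLs out of already-extracted text could look like.

```python
import re

# Simplified patterns for illustration only; real IOC extraction covers many more cases.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def pdf_vault_ids(artifacts):
    """Collect vault IDs from artifacts that point at PDF files.

    Assumption: each artifact is a dict shaped like
    {"cef": {"vaultId": "...", "fileName": "..."}}.
    """
    ids = []
    for artifact in artifacts:
        cef = artifact.get("cef", {})
        vault_id = cef.get("vaultId")
        file_name = cef.get("fileName", "")
        if vault_id and file_name.lower().endswith(".pdf"):
            ids.append(vault_id)
    return ids


def extract_iocs(text):
    """Return de-duplicated, sorted IPv4 addresses and URLs found in text."""
    # Drop trailing sentence punctuation that the URL pattern may swallow.
    urls = {match.rstrip(".,;") for match in URL_RE.findall(text)}
    ips = set(IPV4_RE.findall(text))
    return {"ip": sorted(ips), "url": sorted(urls)}


if __name__ == "__main__":
    sample_artifacts = [
        {"cef": {"vaultId": "abc123", "fileName": "Invoice.PDF"}},
        {"cef": {"vaultId": "def456", "fileName": "notes.txt"}},
    ]
    print(pdf_vault_ids(sample_artifacts))
    print(extract_iocs("Callback to 10.0.0.5 via http://example.com/payload."))
```

In a real playbook, the vault IDs collected this way would be passed to the Parser app's action rather than parsed by hand; the regex part only illustrates the kind of matching involved.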

I hope this helps.

maangellamatini
Explorer

Thank you, rgresham! This was extremely helpful.
