Splunk Dev

How to extract data from PDF file?

maangellamatini
Explorer

I'm new to Phantom and would like to know how I can extract data from a PDF file attached to an email. As I understand it, the workflow goes like this: an email gets sent to a mailbox, Phantom ingests the email, and Phantom then creates a vault artifact.

Is it possible to:

  • get the PDF file and read its text
  • identify important data in the PDF, for example IP addresses and URLs

and how?

1 Solution

rgresham_splunk
Splunk Employee

The best way to get IOCs out of a PDF is to use the Phantom Parser app.
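To make "IOCs" concrete for the two items asked about (IP addresses and URLs), here is a minimal sketch of that kind of extraction on text that has already been pulled out of a PDF. It is only an illustration using regular expressions, not how the Parser app works internally; the function name `extract_iocs` and both patterns are assumptions made up for this example.

```python
import ipaddress
import re

# Illustrative patterns only (assumptions for this sketch, not the Parser app's rules).
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(text):
    """Return de-duplicated IPv4 addresses and URLs found in plain text."""
    # Strip trailing sentence punctuation that the URL pattern may swallow.
    urls = sorted({u.rstrip(".,;") for u in URL_RE.findall(text)})

    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            # Reject look-alikes such as 999.1.1.1 that are not valid addresses.
            ips.add(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            continue

    return {"ip": sorted(ips), "url": urls}


if __name__ == "__main__":
    sample = "Contact 10.0.0.5 or visit http://example.com/login. Ignore 999.1.1.1"
    print(extract_iocs(sample))
```

In practice the Parser app does this step for you once it has the file from the vault, so a hand-rolled version like this is mainly useful for understanding or testing what you expect to get back.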

This app requires the file to be in the platform's vault (file location). Normally, when you ingest via email, PDF attachments are added to the vault automatically. You then need the vaultId from the File Artifact or Vault Artifact and pass it to the parser, which extracts the IOCs from the file just as it does for emails.
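As a rough sketch of the "get the vaultId from the artifact" step, the helper below picks out vault IDs for PDF attachments from a list of artifacts. It assumes each artifact is a plain dict whose `cef` field carries `vaultId` and `fileName` keys; the helper name `collect_vault_ids` and that `fileName` key are assumptions for illustration, so check the actual artifacts in your container before relying on them.

```python
def collect_vault_ids(artifacts, extension=".pdf"):
    """Return unique vault IDs for artifacts whose file name ends with `extension`.

    Assumes (hypothetically) that each artifact looks like:
        {"name": "Vault Artifact", "cef": {"vaultId": "...", "fileName": "..."}}
    """
    vault_ids = []
    for artifact in artifacts:
        cef = artifact.get("cef") or {}
        vault_id = cef.get("vaultId")
        file_name = (cef.get("fileName") or "").lower()
        # Keep only attachments of the requested type, preserving first-seen order.
        if vault_id and file_name.endswith(extension) and vault_id not in vault_ids:
            vault_ids.append(vault_id)
    return vault_ids


if __name__ == "__main__":
    sample_artifacts = [
        {"name": "Vault Artifact", "cef": {"vaultId": "abc123", "fileName": "Invoice.PDF"}},
        {"name": "Vault Artifact", "cef": {"vaultId": "def456", "fileName": "logo.png"}},
        {"name": "Email Artifact", "cef": {"fromEmail": "user@example.com"}},
    ]
    print(collect_vault_ids(sample_artifacts))
```

Each returned ID would then be supplied as the vault ID parameter of the parser action in your playbook; the exact action and parameter names depend on the app version, so confirm them in the app's documentation.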

I hope this helps.

maangellamatini
Explorer

Thank you, rgresham! This was extremely helpful.
