How to extract data from PDF file?

maangellamatini
Explorer

I'm new to Phantom and would like to know how to extract data from a PDF file attached to an email. As I understand it, the workflow goes like this: an email is sent to the mailbox, Phantom ingests the email, and Phantom then creates a vault artifact.

Is it possible to:

  • get the PDF file and read its text
  • identify important data in the PDF, for example IP addresses and URLs

If so, how?

1 Solution

rgresham_splunk
Splunk Employee

The best way to get IOCs out of a PDF is to use the Phantom Parser App.
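To give a feel for what pulling IOCs out of text involves, here is a rough Python sketch that finds IPv4 addresses and URLs in a string that has already been extracted from a PDF. It is not how the Parser App is implemented; the function name `extract_iocs` and both regular expressions are assumptions made up for illustration, and real IOC parsing handles many more cases (defanged URLs, domains, hashes, IPv6).

```python
import ipaddress
import re

# Loose candidate patterns (illustrative assumptions, not the Parser App's rules).
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_CANDIDATE = re.compile(r"""https?://[^\s<>"')\]]+""", re.IGNORECASE)


def extract_iocs(text):
    """Return sorted, de-duplicated IPv4 addresses and URLs found in text."""
    ips = set()
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            # Reject things that only look like addresses, e.g. 999.1.1.1.
            ips.add(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            pass
    # Trim sentence punctuation that the URL pattern may have swallowed.
    urls = {match.rstrip(".,;") for match in URL_CANDIDATE.findall(text)}
    return {"ip": sorted(ips), "url": sorted(urls)}


if __name__ == "__main__":
    sample = "Beacon to 10.0.0.5 and 999.1.1.1, see http://example.com/a.pdf."
    print(extract_iocs(sample))
```

In practice the app does this work for you; the sketch is only useful for understanding the idea or for quick offline checks.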

This app requires the file to be in the vault or file location of the platform. When ingesting via email, PDF attachments are normally added to the File/Vault location automatically. You then need the vaultId from the File Artifact or Vault Artifact to send to the parser, which extracts the IOCs from the file just as it does for emails.
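As a minimal sketch of the "get the vaultId from the artifact" step, the Python below collects vault IDs from a list of artifact dictionaries. The artifact layout used here (a `cef` dictionary holding a `vaultId` key) and the helper name `collect_vault_ids` are assumptions for illustration; check the actual fields on the artifacts in your own container before relying on them.

```python
def collect_vault_ids(artifacts):
    """Return unique vault IDs, in first-seen order, from artifact dicts.

    Assumes each artifact looks like {"name": ..., "cef": {"vaultId": ...}}.
    Artifacts without a vaultId (for example, plain email artifacts) are skipped.
    """
    vault_ids = []
    for artifact in artifacts:
        cef = artifact.get("cef") or {}
        vault_id = cef.get("vaultId")
        if vault_id and vault_id not in vault_ids:
            vault_ids.append(vault_id)
    return vault_ids


if __name__ == "__main__":
    # Hypothetical artifacts, shaped as assumed above.
    sample_artifacts = [
        {"name": "Email Artifact", "cef": {"fromEmail": "a@example.com"}},
        {"name": "Vault Artifact", "cef": {"vaultId": "abc123"}},
        {"name": "Vault Artifact", "cef": {"vaultId": "abc123"}},
        {"name": "Vault Artifact", "cef": {"vaultId": "def456"}},
    ]
    print(collect_vault_ids(sample_artifacts))
```

Each ID collected this way would then be passed to the parser action as its vault ID input.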

I hope this helps.

maangellamatini
Explorer

Thank you, rgresham! This was extremely helpful.
