How to extract data from PDF file?

maangellamatini
Explorer

I'm new to Phantom and would like to know how I could extract data from a PDF file attached to an email. From my understanding, the workflow goes like this: an email gets sent to the mailbox, Phantom ingests the email, and Phantom then creates a vault artifact.

Is it possible to:

  • get the PDF file and read its text
  • identify important data in the PDF, for example IP addresses and URLs

and how?

1 Solution

rgresham_splunk
Splunk Employee

The best way to get IOCs out of a PDF is to use the Phantom Parser app.
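To make "getting IOCs out" concrete, here is a minimal sketch of pulling IP addresses and URLs out of text with regular expressions. It is not how the Parser app is implemented; it assumes the PDF text has already been extracted into a plain string, and the function name `extract_iocs` is hypothetical.

```python
import ipaddress
import re

# Loose candidate pattern; real validation is done by the ipaddress module.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Simple http/https matcher; stops at whitespace and common closing characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def extract_iocs(text):
    """Return de-duplicated IPv4 addresses and URLs found in text (hypothetical helper)."""
    ips = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects values such as 999.1.1.1
        except ValueError:
            continue
        if candidate not in ips:
            ips.append(candidate)

    urls = []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)

    return {"ip": ips, "url": urls}


if __name__ == "__main__":
    sample = "Beacon to 10.0.0.5, see http://example.com/a.php?x=1."
    print(extract_iocs(sample))
```

A regex pass like this is only a rough approximation; the app handles the file parsing and artifact creation for you, which is why it is the recommended route.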

This app requires the file to be in the vault (the platform's file store). When email is ingested, PDF attachments are normally added to the vault automatically. You then need the vault ID from the file artifact or vault artifact to pass to the parser, which extracts IOCs from the file the same way it does for emails.
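As a rough sketch of that last step, the helpers below pick the vault IDs of PDF attachments out of a list of artifacts and build a parameter list for the parser. This assumes each artifact's CEF data carries `vaultId` and `fileName` keys and that the parser action takes a `vault_id` parameter; both function names are hypothetical, so check the field and parameter names against your own artifacts and app version.

```python
def pdf_vault_ids(artifacts):
    """Return vault IDs of PDF attachments found in a list of artifact dicts."""
    vault_ids = []
    for artifact in artifacts:
        cef = artifact.get("cef") or {}
        vault_id = cef.get("vaultId")          # assumed CEF key
        file_name = cef.get("fileName") or ""  # assumed CEF key
        is_pdf = file_name.lower().endswith(".pdf")
        if vault_id and is_pdf and vault_id not in vault_ids:
            vault_ids.append(vault_id)
    return vault_ids


def build_parser_parameters(vault_ids):
    """Build one parameter dict per vault ID (parameter name is an assumption)."""
    return [{"vault_id": vault_id} for vault_id in vault_ids]


if __name__ == "__main__":
    # Made-up artifacts shaped like the ones email ingestion is assumed to create.
    sample_artifacts = [
        {"name": "Vault Artifact", "cef": {"vaultId": "abc123", "fileName": "invoice.PDF"}},
        {"name": "Vault Artifact", "cef": {"vaultId": "def456", "fileName": "logo.png"}},
        {"name": "Email Artifact", "cef": {"fromEmail": "user@example.com"}},
    ]
    print(build_parser_parameters(pdf_vault_ids(sample_artifacts)))
```

In a playbook, the resulting list would be handed to the parser action through `phantom.act`, with the action name (assumed here to be `extract ioc`) and the asset name taken from your own Parser app configuration.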

I hope this helps.

maangellamatini
Explorer

Thank you, rgresham! This was extremely helpful.
