
DB Connect: Blob object Search

JeremeyWise
Explorer

PreSales Question.

New(ish) to Splunk, so RTFM (with a link to the FM) is fine.

The customer has Splunk and wants to use DB Connect to link it to a document imaging system. The images are stored in the database as BLOB objects: some as PDFs, some as scanned JPGs, etc.
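Because the BLOBs are a mix of formats, any conversion step first has to tell them apart. A minimal sketch of one way to do that, assuming the raw bytes are available: check the standard "magic" header bytes for PDF, JPEG, PNG and TIFF. The function name `sniff_blob_type` and its labels are hypothetical, chosen for this example.

```python
# Minimal sketch: classify a BLOB by its leading "magic" bytes so each
# document type can be routed to a suitable extractor.
# The signatures are the standard PDF, JPEG, PNG and TIFF headers; the
# function name and returned labels are hypothetical.

def sniff_blob_type(blob: bytes) -> str:
    """Return a coarse type label for a binary document."""
    if blob.startswith(b"%PDF-"):
        return "pdf"
    if blob.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if blob.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # TIFF comes in little-endian ("II") and big-endian ("MM") variants.
    if blob.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    return "unknown"


if __name__ == "__main__":
    samples = {
        "scan.jpg": b"\xff\xd8\xff\xe0\x00\x10JFIF",
        "invoice.pdf": b"%PDF-1.7\n",
        "mystery.bin": b"\x00\x01\x02",
    }
    for name, data in samples.items():
        print(name, sniff_blob_type(data))
```

Sniffing the content rather than trusting a filename or MIME column is a design choice: scanned images are sometimes stored with the wrong extension.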

The goal is to Splunk this data. Obviously the data stored in the database about the files is "good enough" for many lookups and reports, but sometimes they will need to get data from the files themselves.

I can think of several ways to do this: pull the file, feed it through some third-party OCR / PDF-to-text processor, then write the values out as file data in a directory path that Splunk would then ingest. That's not very elegant, and it would require some API coding in the applications to do the OCR / conversion and then trigger Splunk to start indexing the data.
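A minimal sketch of that export-and-convert approach, in Python. Everything specific here is an assumption: sqlite3 stands in for the real database driver, the table `documents(id, filename, content)` and the functions `export_documents` and `extract_text` are hypothetical, and `extract_text` is where whatever third-party OCR or PDF-to-text tool you choose would be called. The output directory would then be picked up by a `[monitor://...]` stanza in inputs.conf.

```python
# Hypothetical sketch of the export step: blobs -> extracted text -> files
# in a directory watched by a Splunk monitor input.
# sqlite3 stands in for the real database; table and function names are
# illustrative only.
import os
import sqlite3
import tempfile
from pathlib import Path
from typing import Callable


def _quote_safe(value: str) -> str:
    """Collapse whitespace and swap double quotes so the value stays one quoted token."""
    return " ".join(value.split()).replace('"', "'")


def export_documents(conn: sqlite3.Connection,
                     out_dir: Path,
                     extract_text: Callable[[bytes], str]) -> int:
    """Write one single-line key=value event file per document; return the count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    rows = conn.execute("SELECT id, filename, content FROM documents ORDER BY id")
    for doc_id, filename, content in rows:
        text = _quote_safe(extract_text(content))
        event = f'doc_id={doc_id} filename="{_quote_safe(filename)}" text="{text}"\n'
        # Write to a .tmp file, then rename it into place, so the monitor
        # input never sees a half-written .log file. Restrict the input to
        # *.log (e.g. with a whitelist) so the .tmp files are ignored.
        fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(event)
        os.replace(tmp_path, out_dir / f"doc_{doc_id}.log")
        count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER, filename TEXT, content BLOB)")
    conn.execute("INSERT INTO documents VALUES (1, 'memo.pdf', ?)", (b"%PDF-1.7",))

    # Dummy extractor: a real one would call your OCR / PDF converter.
    def fake_extract(blob: bytes) -> str:
        return "Invoice 42\nTotal: 100"

    with tempfile.TemporaryDirectory() as d:
        written = export_documents(conn, Path(d), fake_extract)
        print(written, (Path(d) / "doc_1.log").read_text())
```

Writing one key=value line per document keeps each event small, and Splunk's automatic search-time key=value extraction can pick up `doc_id` and `filename` without extra configuration. Whether the full extracted text belongs in the event, or only selected fields, is a judgment call.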

I have to believe someone else has cracked this cookie... Any ideas?

Thanks

weeb
Splunk Employee

To use non-ASCII (binary) data in Splunk, you should first convert it to ASCII. This can be done in SQL with CAST or CONVERT, but the result may not be useful if the data needs to be compared later in the process, unless exactly the same conversion algorithm and transformations are applied to the data on both sides.
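A small illustration of that caveat, assuming SQLite's built-in `hex()` function as a stand-in for CAST/CONVERT on the customer's actual database. SQLite's `hex()` produces uppercase hex while Python's `bytes.hex()` produces lowercase, so the "same" conversion done in two places does not compare equal until both sides are normalized.

```python
# Illustration: a blob converted to text in SQL only matches a value
# converted outside the database if both sides use exactly the same
# algorithm. SQLite's hex() stands in for CAST/CONVERT (an assumption).
import base64
import sqlite3

blob = bytes([0x25, 0x50, 0x44, 0x46, 0xFF, 0x0A])  # "%PDF" plus two raw bytes

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, content BLOB)")
conn.execute("INSERT INTO docs VALUES (1, ?)", (blob,))

# Conversion done inside the database: SQLite's hex() emits uppercase.
(sql_hex,) = conn.execute("SELECT hex(content) FROM docs WHERE id = 1").fetchone()

# Conversion done later in Python: bytes.hex() emits lowercase.
py_hex = blob.hex()

naive_match = sql_hex == py_hex               # False: letter case differs
normalized_match = sql_hex.lower() == py_hex  # True once both sides agree

# A different encoding (base64) of the same bytes can never equal the hex form.
b64 = base64.b64encode(blob).decode("ascii")

print(sql_hex, py_hex, b64)
print(naive_match, normalized_match)
```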

While there are some nifty hacks, I agree with the assessment that using an external tool is probably a better choice. It doesn't have to be a commercial one.

We've done some neat stuff with exiftool and the Bro TA, for instance.

http://www.sno.phy.queensu.ca/~phil/exiftool/

Splunk Add-on for Bro IDS
https://splunkbase.splunk.com/app/1617/
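One way exiftool could fit in, sketched under assumptions (this is not necessarily how the Bro TA integration works): run exiftool with its `-j` (JSON) option on an exported file and flatten each record into key=value pairs that Splunk extracts automatically. It assumes exiftool is on PATH; the function names `run_exiftool` and `exif_to_kv` are hypothetical.

```python
# Sketch: turn exiftool's JSON output (-j) into one key=value line per file.
# Assumes exiftool is installed and on PATH; function names are hypothetical.
import json
import shutil
import subprocess
import sys
from typing import List


def exif_to_kv(exiftool_json: str) -> List[str]:
    """Flatten exiftool -j output (a JSON array of objects) into key=value lines."""
    lines = []
    for record in json.loads(exiftool_json):
        pairs = []
        for key, value in sorted(record.items()):
            # Swap double quotes so each value stays a single quoted token.
            text = str(value).replace('"', "'")
            pairs.append(f'{key}="{text}"')
        lines.append(" ".join(pairs))
    return lines


def run_exiftool(path: str) -> str:
    """Return exiftool's JSON metadata for a single file."""
    result = subprocess.run(["exiftool", "-j", path],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only call the external tool when it exists and a file was given.
    if shutil.which("exiftool") and len(sys.argv) > 1:
        for line in exif_to_kv(run_exiftool(sys.argv[1])):
            print(line)
```

Note that this only extracts embedded metadata (type, dimensions, creation tool, comments); recovering the text of a scanned page still needs an OCR step.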

woodcock
Esteemed Legend

I would search through the apps at apps.splunk.com, but are you sure Splunk is the right tool for this situation? Whenever people are working with documents, I usually suggest MarkLogic, which has tools to help you generate the metadata you are describing. It is an incredible product that does things in a totally different way from Splunk and is better suited to non-plain-text data sources:
http://www.MarkLogic.com/

P.S. These are the main guys who swooped in and made HealthCare.gov actually work; without them, it probably never would have.
