Retrieval Information from Unstructured Documents

Splunk Community

This app offers a simple way to check whether certain information is present in a set of documents stored in a folder. Documents can be provided in any format (Office, PDF, OpenOffice, etc.). The app searches for, extracts, and indexes information such as:

• email addresses
• tax identification codes (default: Italian format)
• telephone numbers
• names / entities
• bank account numbers (default: Italian format)
• postal addresses (default: Italian format)

A Python script invokes the Apache Tika libraries to extract the text, applies regex rules to identify the information, and sends only the matches to the Splunk HTTP Event Collector (HEC). The rest of the document contents is never recorded in Splunk.
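The pipeline described above (extract text with Tika, match with regex, forward only the matches to HEC) could be sketched roughly as follows. This is not the app's actual script: the patterns, the function names, and the `doc:pii` sourcetype are assumptions made for illustration, and the regexes are deliberately simple. The sketch assumes the third-party `tika` Python package (which needs a Java runtime) for text extraction and the standard HEC event endpoint with a `Splunk <token>` authorization header.

```python
import json
import re
import urllib.request

# Illustrative patterns only; the app's real regex rules are not shown here.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Italian tax code (codice fiscale): 16 characters, fixed letter/digit layout.
    "tax_code_it": re.compile(r"\b[A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]\b"),
    # Italian IBAN: "IT", 2 check digits, 1 letter, 10 digits, 12 alphanumerics.
    "iban_it": re.compile(r"\bIT\d{2}[A-Z]\d{10}[0-9A-Z]{12}\b"),
}


def extract_text(path):
    """Return the plain text of a document using Apache Tika."""
    # Imported lazily so the rest of the module works without Tika installed.
    from tika import parser
    parsed = parser.from_file(path)
    return parsed.get("content") or ""


def find_matches(text):
    """Return one dict per distinct match, grouped by pattern type."""
    events = []
    for kind, pattern in PATTERNS.items():
        for value in sorted(set(pattern.findall(text))):
            events.append({"type": kind, "value": value})
    return events


def build_hec_payload(path, matches, sourcetype="doc:pii"):
    """Serialize matches as a batch of HEC events, one JSON object per line."""
    return "\n".join(
        json.dumps({"event": {"file": path, **m}, "sourcetype": sourcetype})
        for m in matches
    )


def send_to_hec(payload, url, token):
    """POST the payload to a HEC endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A caller would chain these as `send_to_hec(build_hec_payload(path, find_matches(extract_text(path))), url, token)`, where `url` typically ends in `/services/collector/event`. Only the matched values and the file name leave the script, which is how the remaining document text stays out of the index.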