Splunk Search

How to do field Extraction for Complex Data Structure?

SplunkDash
Motivator

Hello,

I have source files with very inconsistent/ complex events/data structure. I wrote field extraction (inline) codes which are working for most of the cases, however not extracting field as expected for some cases. I included 3 sample events and my inline field extraction codes. Ayn help will be highly appreciated. Thank you!

Three Sample Events

June 10, 2021 10:41:39:993-0400 - INFO: 439749134|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439749134|4deef81s-6455-460b-bf41-c126700d1e9d|2607:fb91:118e:89c9:ad53:43b0:ccce:417c|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:41:39:993-0400  EDT 2021^iat= June 10, 2021 10:41:39:993-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

June 10, 2021 10:42:36:991-0400 - INFO: 439741123|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439741123|4deef81s-6455-460b-bf41-c126700d1e9d|65.115.214.106|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:42:36:991-0400  EDT 2021^iat= June 10, 2021 10:42:36:991-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

May 03, 2021 10:33:50:223-0400 - INFO: NON-8016|IdtokenAuth||authenticate‖lookupClaimVal is null|ERROR|SITEMINDER| QDIAUTH|vp22wsnnn012 |null|null|

 

My Inline field extraction codes: (Working for first 2 events but not the 3rd event)

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w+)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w+)\|(?P<SUBJECT>\w+)\|(?P<LESSION>\w+?\-?\w+?\-?\w+?\-?\w+?-\w+?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Does this help?

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w*)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w*)\|(?P<SUBJECT>\w*)\|(?P<LESSION>\w*?\-?\w*?\-?\w*?\-?\w*?\-?\w*?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)

By the way, the pasting of the third message may have been corrupted and I have assumed that there should be 4 pipes in the middle

authenticate||||lookupClaimVal is null

It is often clearer to paste events etc into code blocks to avoid spurious substitutions being made!

SplunkDash
Motivator

Hello,

Thank you so much for your quick response, truly appreciate it. I think we don't have a better choice based on the quality of data. Thank you again.

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to July Tech Talks, Office Hours, and Webinars!

What are Community Office Hours?Community Office Hours is an interactive 60-minute Zoom series where ...

Updated Data Type Articles, Anniversary Celebrations, and More on Splunk Lantern

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

A Prelude to .conf25: Your Guide to Splunk University

Heading to Boston this September for .conf25? Get a jumpstart by arriving a few days early for Splunk ...