How to do field extraction for a complex data structure?

SplunkDash
Motivator

Hello,

I have source files with very inconsistent and complex event structures. I wrote an inline field extraction regex that works in most cases but does not extract the fields as expected in some. I have included 3 sample events and my inline field extraction regex. Any help will be highly appreciated. Thank you!

Three Sample Events

June 10, 2021 10:41:39:993-0400 - INFO: 439749134|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439749134|4deef81s-6455-460b-bf41-c126700d1e9d|2607:fb91:118e:89c9:ad53:43b0:ccce:417c|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:41:39:993-0400  EDT 2021^iat= June 10, 2021 10:41:39:993-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

June 10, 2021 10:42:36:991-0400 - INFO: 439741123|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439741123|4deef81s-6455-460b-bf41-c126700d1e9d|65.115.214.106|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:42:36:991-0400  EDT 2021^iat= June 10, 2021 10:42:36:991-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

May 03, 2021 10:33:50:223-0400 - INFO: NON-8016|IdtokenAuth||authenticate‖lookupClaimVal is null|ERROR|SITEMINDER| QDIAUTH|vp22wsnnn012 |null|null|

My inline field extraction regex (works for the first 2 events but not the 3rd):

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w+)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w+)\|(?P<SUBJECT>\w+)\|(?P<LESSION>\w+?\-?\w+?\-?\w+?\-?\w+?-\w+?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)


ITWhisperer
SplunkTrust

Does this help?

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w*)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w*)\|(?P<SUBJECT>\w*)\|(?P<LESSION>\w*?\-?\w*?\-?\w*?\-?\w*?\-?\w*?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)

By the way, the third event may have been corrupted when it was pasted; I have assumed that there should be 4 pipes in the middle:

authenticate||||lookupClaimVal is null

It is often clearer to paste events and similar samples into code blocks, which avoids spurious character substitutions!
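For anyone who wants to check the two patterns outside Splunk, here is a minimal Python sketch. It is only an approximation: Python's `re` module is not the regex engine Splunk uses, although it does accept the same `(?P<name>...)` syntax. The helper name `extract` is made up for illustration, and the third event is written with the 4 pipes assumed above rather than exactly as it was pasted.

```python
import re

# Original pattern from the question: most fields use \w+ (one or more characters).
ORIGINAL = re.compile(
    r"^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w+)"
    r"\|(?P<EVENT>\w+)\|(?P<EVENTID>\w+)\|(?P<SUBJECT>\w+)"
    r"\|(?P<LESSION>\w+?\-?\w+?\-?\w+?\-?\w+?-\w+?)"
    r"\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)"
)

# Revised pattern from the reply: \w* lets those fields be empty.
REVISED = re.compile(
    r"^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w*)"
    r"\|(?P<EVENT>\w+)\|(?P<EVENTID>\w*)\|(?P<SUBJECT>\w*)"
    r"\|(?P<LESSION>\w*?\-?\w*?\-?\w*?\-?\w*?\-?\w*?)"
    r"\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)"
)

# Second sample event, copied from the question.
EVENT_2 = (
    "June 10, 2021 10:42:36:991-0400 - INFO: 439741123|REGT|TEST|SITEMINDER|"
    "VALIDATE_ASSERTION|439741123|4deef81s-6455-460b-bf41-c126700d1e9d|"
    "65.115.214.106|00||Application data=^CSPProviderName=IDME^givenName=KELLIE"
    "^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr"
    "^city=Liberty June 10, 2021 10:42:36:991-0400  EDT 2021"
    "^iat= June 10, 2021 10:42:36:991-0400 EDT 2021^AppID=OLA"
    "^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162"
)

# Third sample event, ASSUMING 4 pipes after "authenticate" (see the reply above).
EVENT_3 = (
    "May 03, 2021 10:33:50:223-0400 - INFO: NON-8016|IdtokenAuth||authenticate||||"
    "lookupClaimVal is null|ERROR|SITEMINDER| QDIAUTH|vp22wsnnn012 |null|null|"
)


def extract(pattern, event):
    """Return the named groups as a dict, or None if the pattern does not match."""
    match = pattern.match(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, event in (("event 2", EVENT_2), ("event 3", EVENT_3)):
        for label, pattern in (("original", ORIGINAL), ("revised", REVISED)):
            fields = extract(pattern, event)
            print(name, label, "no match" if fields is None else fields)
```

One thing this makes visible: with the revised pattern the third event does match, but its values land under labels chosen for the first two events (for example, SRCADDR ends up holding "lookupClaimVal is null"). So the extraction succeeds, but the field names may not mean the same thing for that event type.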

SplunkDash
Motivator

Hello,

Thank you so much for your quick response, truly appreciate it. I think we don't have a better choice based on the quality of data. Thank you again.
