Splunk Search

How to do field Extraction for Complex Data Structure?

SplunkDash
Motivator

Hello,

I have source files with very inconsistent/ complex events/data structure. I wrote field extraction (inline) codes which are working for most of the cases, however not extracting field as expected for some cases. I included 3 sample events and my inline field extraction codes. Ayn help will be highly appreciated. Thank you!

Three Sample Events

June 10, 2021 10:41:39:993-0400 - INFO: 439749134|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439749134|4deef81s-6455-460b-bf41-c126700d1e9d|2607:fb91:118e:89c9:ad53:43b0:ccce:417c|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:41:39:993-0400  EDT 2021^iat= June 10, 2021 10:41:39:993-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

June 10, 2021 10:42:36:991-0400 - INFO: 439741123|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439741123|4deef81s-6455-460b-bf41-c126700d1e9d|65.115.214.106|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:42:36:991-0400  EDT 2021^iat= June 10, 2021 10:42:36:991-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

May 03, 2021 10:33:50:223-0400 - INFO: NON-8016|IdtokenAuth||authenticate‖lookupClaimVal is null|ERROR|SITEMINDER| QDIAUTH|vp22wsnnn012 |null|null|

 

My Inline field extraction codes: (Working for first 2 events but not the 3rd event)

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w+)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w+)\|(?P<SUBJECT>\w+)\|(?P<LESSION>\w+?\-?\w+?\-?\w+?\-?\w+?-\w+?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Does this help?

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w*)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w*)\|(?P<SUBJECT>\w*)\|(?P<LESSION>\w*?\-?\w*?\-?\w*?\-?\w*?\-?\w*?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)

By the way, the pasting of the third message may have been corrupted and I have assumed that there should be 4 pipes in the middle

authenticate||||lookupClaimVal is null

It is often clearer to paste events etc into code blocks to avoid spurious substitutions being made!

SplunkDash
Motivator

Hello,

Thank you so much for your quick response, truly appreciate it. I think we don't have a better choice based on the quality of data. Thank you again.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas     Cisco Live 2026 is almost here, and this ...

What Is the Name of the USB Key Inserted by Bob Smith? (BOTS Hint, Not the Answer)

Hello Splunkers,   So you searched, “what is the name of the usb key inserted by bob smith?”  Not gonna lie… ...

Automating Threat Operations and Threat Hunting with Recorded Future

    Automating Threat Operations and Threat Hunting with Recorded Future June 29, 2026 | Register   Is your ...