
Field Extraction with Inconsistent Data Structure

SplunkDash
Motivator

Hello,

I have some issues with field extraction because the structure of the field values is inconsistent. In the following two sample events, Amt, outputCd, and returnCd are null in one event and have values in the other, and those values are enclosed in double quotes (" "). I used the following extraction codes, which each work fine on their own: one for the null case and one for the value case. But we can only use one extraction code to extract values for the same field. Is there any way I can write one field extraction that satisfies both conditions? Thank you so much, any help will be highly appreciated:

Field extraction code:

outputCd":(?P<outputCd>\w*)    [works with null]

Amt":"(?P<Amt>\w*)    [works with values]

Sample Events

"timeStamp":"2021-12-09 08:55:30 EST","appName":"DEV","userType":"DBA","caseStatCd":null,"Amt":"100","errorMsg":null,"eventId":"VIEW_LIST_RESPONSE","eventType":"PENDING","fileSourceCd":null, "mftCd":null,"outputCd":null,"planNum":null,"reasonCd":null,"returnCd":null,"sessionId":"acMgt/dev” , "Period":”2021”, userId":"28f526d4-3464-4766-DBA "

"timeStamp":"2021-12-09 08:55:32 EST","appName":"SYS","userType":"ADM","caseStatCd":null,"Amt":null,"errorMsg":null,"eventId":"VIEW_LIST","eventType":"PENDING","fileSourceCd":”09”, "mftCd":null,"outputCd":"09","planNum":null,"reasonCd":null,"returnCd":”01”,"sessionId":"acMgt/dev” , "Period":null, userId":"28f526d4-3464-4766-ADM"
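A quick way to see why each pattern covers only one case is to run both against fragments of the two sample events. This sketch uses Python's `re` module as a stand-in for Splunk's PCRE engine (an assumption: the two agree on `(?P<name>...)` groups and `\w` for plain ASCII text like this). The shortened fragments, with straight quotes, and the `extract` helper are hypothetical illustrations, not part of the original post.

```python
import re

# Assumption: Python's re agrees with Splunk's PCRE on (?P<name>...) groups
# and \w for plain ASCII fragments like these.
NULL_PATTERN = re.compile(r'outputCd":(?P<outputCd>\w*)')  # "works with null"
QUOTED_PATTERN = re.compile(r'Amt":"(?P<Amt>\w*)')          # "works with values"

# Hypothetical, shortened fragments of the two sample events (straight quotes).
first_event = '"Amt":"100","outputCd":null,"returnCd":null'
second_event = '"Amt":null,"outputCd":"09","returnCd":"01"'

def extract(pattern, text, group):
    """Hypothetical helper: return the named group, or None if there is no match."""
    match = pattern.search(text)
    return match.group(group) if match else None

print(repr(extract(NULL_PATTERN, first_event, "outputCd")))   # 'null'
print(repr(extract(NULL_PATTERN, second_event, "outputCd")))  # '' (stops at the opening quote)
print(repr(extract(QUOTED_PATTERN, first_event, "Amt")))      # '100'
print(repr(extract(QUOTED_PATTERN, second_event, "Amt")))     # None (no opening quote before null)
```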

ITWhisperer
SplunkTrust

Allow for optional double quotes

| rex "outputCd\":(?P<outputCd>\"?\w*\"?)"
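With SPL's string escaping removed (each `\"` becomes a plain `"`), the suggested pattern can be checked the same way. This sketch again assumes Python's `re` is a fair stand-in for PCRE here. It suggests that the optional quotes end up inside the capture group, and that a space after the colon leaves the group empty, which may be what appears as "no output" in Splunk.

```python
import re

# ITWhisperer's rex pattern with the SPL string escaping (\") removed.
OPTIONAL_QUOTES = re.compile(r'outputCd":(?P<outputCd>"?\w*"?)')

def extract_output_cd(text):
    """Return the captured outputCd value, or None if the pattern does not match."""
    match = OPTIONAL_QUOTES.search(text)
    return match.group("outputCd") if match else None

for fragment in ['"outputCd":"09",', '"outputCd":null,', '"outputCd": null,']:
    print(fragment, "->", repr(extract_output_cd(fragment)))
# "outputCd":"09", -> '"09"'   the optional quotes are inside the group
# "outputCd":null, -> 'null'
# "outputCd": null, -> ''      the space after the colon stops \w* immediately
```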

SplunkDash
Motivator

Hello,

Thank you so much, appreciated. Yes, your "outputCd\":(?P<outputCd>\"?\w*\"?)" works fine with "outputCd":"09", but gives no output for "outputCd": null. Thank you again!


ITWhisperer
SplunkTrust

You appear to have an extra space between the : and null. Is that a typo, or is it there some of the time, or all of the time?

SplunkDash
Motivator

Hello,

Thank you so much again... I just had to tweak your code a little, to "outputCd":\"?(?P<outputCd>\"?\w*), and it's working as expected. Thank you, appreciated!
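Here is a quick check of the tweaked pattern, as a sketch that again assumes Python's `re` behaves like PCRE for these fragments (the fragments are illustrative). Because the first `"?` sits outside the group, an opening quote is consumed without being captured. It also suggests that a space after the colon still produces an empty value rather than `null`.

```python
import re

# SplunkDash's tweak: the first "? sits outside the group, so an opening quote
# is consumed but not captured.
TWEAKED = re.compile(r'"outputCd":"?(?P<outputCd>"?\w*)')

def extract_output_cd(text):
    """Return the captured outputCd value, or None if the pattern does not match."""
    match = TWEAKED.search(text)
    return match.group("outputCd") if match else None

for fragment in ['"outputCd":"09",', '"outputCd":null,', '"outputCd":,', '"outputCd": null,']:
    print(fragment, "->", repr(extract_output_cd(fragment)))
# "outputCd":"09", -> '09'
# "outputCd":null, -> 'null'
# "outputCd":, -> ''
# "outputCd": null, -> ''      a leading space still yields an empty value
```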


PickleRick
SplunkTrust

Mind you, this will also match any string not surrounded by quotes, not just null. That on its own might not be a big problem, but it also won't work in general with quote-delimited strings that may contain escaped quotes.
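Both caveats can be reproduced with a small sketch, assuming Python's `re` matches PCRE here. The `tweaked_pattern` and `extract` helpers and the example inputs are hypothetical: `true` stands in for any unquoted word, and the `sessionId` value comes from the sample events, where `\w` also stops at the `/`.

```python
import re

def tweaked_pattern(field):
    """Hypothetical helper: the tweaked pattern from the thread, for any field name."""
    return re.compile('"' + re.escape(field) + r'":"?(?P<value>"?\w*)')

def extract(field, text):
    """Return the captured value, or None if the pattern does not match."""
    match = tweaked_pattern(field).search(text)
    return match.group("value") if match else None

# Any unquoted word is accepted, not only null.
print(repr(extract("outputCd", '"outputCd":true,')))           # 'true'
# A quoted value containing an escaped quote is cut off at the backslash.
print(repr(extract("errorMsg", r'"errorMsg":"bad\"value",')))  # 'bad'
# \w also stops at other non-word characters such as "/".
print(repr(extract("sessionId", '"sessionId":"acMgt/dev",')))  # 'acMgt'
```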


SplunkDash
Motivator

Hello, thank you so much. Then what do you think is the right way to do this field extraction in this case?


ITWhisperer
SplunkTrust

Your sample events do not have embedded quotes, and they also do not appear to have embedded spaces. There is no generalised solution that works for every possibility; it depends on your data and what you are trying to extract from it. The more complex the solution, the longer it is likely to take, so the minimum viable solution is often the way to go (until it no longer works because the data changes). 😀

PickleRick
SplunkTrust

There are usually two approaches you can take:

  1. Account for all possible syntactically correct situations.
  2. Just make a minimum viable solution and adjust it in the future if it stops working.

Both have pros and cons.

Neither is always better than the other. It depends on what you're doing, where you're doing it, and so on. One thing to take into account is that if you do something "wrong" at search time, you can easily "fix" it later (though I'm not sure how accelerated summaries react to that), but if you're doing index-time extraction, you can't "add" fields after the events have already been indexed.

And as to the pattern itself... well, it's a bit tricky, because I can't find a way to consume the quotes without returning them in the match. So you could do something like:

"field":(?<value>\w+|"(?:[^"\\]|\\.)*")

but that would capture the value of the field together with its quotes. I don't know of any way to get rid of them without some post-processing. (OK, maybe PCRE's conditional features could allow that, but that would be ridiculous to write.)
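The capture-then-post-process approach can be sketched as follows. This assumes Python's `re` as a stand-in for PCRE, so the group is written `(?P<value>...)` instead of `(?<value>...)`. The quoted alternative uses `[^"\\]|\\.` so that a backslash and the character it escapes are consumed as a pair. The `extract` helper and the example inputs are hypothetical, and the unescaping step only handles `\"`.

```python
import re

# Python syntax for the named group. Quoted alternative: an opening quote, then
# any mix of characters that are neither a quote nor a backslash, or a backslash
# plus the character it escapes, then the closing quote.
FIELD_TEMPLATE = r'"{field}":(?P<value>\w+|"(?:[^"\\]|\\.)*")'

def extract(field, text):
    """Hypothetical helper: return the field's value with quotes stripped, or None."""
    pattern = FIELD_TEMPLATE.format(field=re.escape(field))
    match = re.search(pattern, text)
    if match is None:
        return None
    value = match.group("value")
    # Post-processing: the regex returns quoted values with their quotes,
    # so drop them and turn each \" back into a plain quote.
    if value.startswith('"'):
        value = value[1:-1].replace('\\"', '"')
    return value

print(repr(extract("outputCd", '"outputCd":"09",')))              # '09'
print(repr(extract("outputCd", '"outputCd":null,')))              # 'null'
print(repr(extract("sessionId", '"sessionId":"acMgt/dev",')))     # 'acMgt/dev'
print(repr(extract("errorMsg", r'"errorMsg":"bad \"value\"",')))  # 'bad "value"'
print(repr(extract("outputCd", '"outputCd":,')))                  # None: \w+ needs at least one character
```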


SplunkDash
Motivator

Hello, thank you so much, I appreciate it. However, I'm facing two issues: the output includes the surrounding double quotes (" "), and it doesn't work for "outputCd": , (when there is no value).


PickleRick
SplunkTrust


If you want it to match no value at all (immediate comma), change \w+ to \w*
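One likely reason `\w*` then fails on `"outputCd":"09"` is alternation order. In a backtracking engine the first alternative that lets the whole pattern succeed wins, and `\w*` succeeds by matching the empty string just before the opening quote. The sketch below assumes Python's `re` alternates the same way PCRE does. It also shows a hypothetical reordering, not suggested in the thread, that tries the quoted alternative first; that version still returns the quotes, so it would still need post-processing.

```python
import re

# Escape-aware quoted-string alternative: a double quote, then escaped characters
# or non-quote characters, then a closing quote.
QUOTED = r'"(?:[^"\\]|\\.)*"'

# \w* listed first, as in the suggested change: it can match the empty string,
# so it "wins" before the quoted alternative is ever tried.
WORD_FIRST = re.compile(r'"outputCd":(?P<value>\w*|' + QUOTED + ')')
# Hypothetical reordering: try the quoted alternative first, then fall back to \w*.
QUOTED_FIRST = re.compile(r'"outputCd":(?P<value>' + QUOTED + r'|\w*)')

def extract(pattern, text):
    """Return the captured value, or None if the pattern does not match."""
    match = pattern.search(text)
    return match.group("value") if match else None

for fragment in ['"outputCd":"09",', '"outputCd":null,', '"outputCd":,']:
    print(fragment, repr(extract(WORD_FIRST, fragment)), repr(extract(QUOTED_FIRST, fragment)))
# "outputCd":"09", '' '"09"'
# "outputCd":null, 'null' 'null'
# "outputCd":, '' ''
```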

 


SplunkDash
Motivator

Thank you again. I tried it with \w*, but then it doesn't work with "outputCd":"09", and I also want to keep the double quotes (" ") out of the output.


ITWhisperer
SplunkTrust
| makeresults
| eval events=split("\"timeStamp\":\"2021-12-09 08:55:30 EST\",\"appName\":\"DEV\",\"userType\":\"DBA\",\"caseStatCd\":null,\"Amt\":\"100\",\"errorMsg\":null,\"eventId\":\"VIEW_LIST_RESPONSE\",\"eventType\":\"PENDING\",\"fileSourceCd\":null, \"mftCd\":null,\"outputCd\":null,\"planNum\":null,\"reasonCd\":null,\"returnCd\":null,\"sessionId\":\"acMgt/dev” , \"Period\":”2021”, userId\":\"28f526d4-3464-4766-DBA \"|\"timeStamp\":\"2021-12-09 08:55:32 EST\",\"appName\":\"SYS\",\"userType\":\"ADM\",\"caseStatCd\":null,\"Amt\":null,\"errorMsg\":null,\"eventId\":\"VIEW_LIST\",\"eventType\":\"PENDING\",\"fileSourceCd\":”09”, \"mftCd\":null,\"outputCd\":\"09\",\"planNum\":null,\"reasonCd\":null,\"returnCd\":”01”,\"sessionId\":\"acMgt/dev” , \"Period\":null, userId\":\"28f526d4-3464-4766-ADM\"|\"timeStamp\":\"2021-12-09 08:55:32 EST\",\"appName\":\"SYS\",\"userType\":\"ADM\",\"caseStatCd\":null,\"Amt\":null,\"errorMsg\":null,\"eventId\":\"VIEW_LIST\",\"eventType\":\"PENDING\",\"fileSourceCd\":”09”, \"mftCd\":null,\"outputCd\":,\"planNum\":null,\"reasonCd\":null,\"returnCd\":”01”,\"sessionId\":\"acMgt/dev” , \"Period\":null, userId\":\"28f526d4-3464-4766-ADM\"|\"timeStamp\":\"2021-12-09 08:55:32 EST\",\"appName\":\"SYS\",\"userType\":\"ADM\",\"caseStatCd\":null,\"Amt\":null,\"errorMsg\":null,\"eventId\":\"VIEW_LIST\",\"eventType\":\"PENDING\",\"fileSourceCd\":”09”, \"mftCd\":null,\"outputCd\": null,\"planNum\":null,\"reasonCd\":null,\"returnCd\":”01”,\"sessionId\":\"acMgt/dev” , \"Period\":null, userId\":\"28f526d4-3464-4766-ADM\"","|")
| mvexpand events
| rex field=events "eventType\": ?\"?(?P<eventType>\w*)\"?\,?.*outputCd\": ?\"?(?P<outputCd>\w*)\"?\,?"
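As a sanity check outside Splunk, the final `rex` pattern can be run against shortened versions of the four `makeresults` test events. This is a sketch that assumes Python's `re` behaves like PCRE for this ASCII input. The SPL escapes (`\"`, `\,`) are written as plain characters, and the shortened events are illustrative.

```python
import re

# ITWhisperer's final rex pattern with SPL's \" and \, escaping removed.
FINAL = re.compile(
    r'eventType": ?"?(?P<eventType>\w*)"?,?.*outputCd": ?"?(?P<outputCd>\w*)"?,?'
)

# Shortened versions of the four test events built with makeresults above.
events = [
    '"eventType":"PENDING","mftCd":null,"outputCd":null,"planNum":null',
    '"eventType":"PENDING","mftCd":null,"outputCd":"09","planNum":null',
    '"eventType":"PENDING","mftCd":null,"outputCd":,"planNum":null',
    '"eventType":"PENDING","mftCd":null,"outputCd": null,"planNum":null',
]

for event in events:
    match = FINAL.search(event)
    print(match.group("eventType"), repr(match.group("outputCd")))
# PENDING 'null'
# PENDING '09'
# PENDING ''
# PENDING 'null'
```

Because the optional space and quotes sit outside the capture groups, these four shapes need no post-processing. The earlier caveats about `\w*` (escaped quotes, `/`, embedded spaces) still apply, and the `.*` means eventType must appear before outputCd in the event.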

SplunkDash
Motivator

Yes, working as expected 🙂, thank you so much, truly appreciated!!!


SplunkDash
Motivator

Makes sense, thank you again 😊!
