Hello,
I have some issues with field extraction because the structure of the field values is inconsistent. In the following two sample events, Amt, outputCd, and returnCd are null in one event and have values in the other, and the values are enclosed in double quotes (" "). I used the following extraction codes, which each work fine on their own: one with null and one with quoted values. But we can only use one extraction code to extract values for the same field. Is there a way to write a single field extraction code that satisfies both conditions? Thank you so much, any help will be highly appreciated:
Field Extraction Code:
outputCd":(?P<outputCd>\w*) [works with null]
Amt":"(?P<Amt>\w*) [works with values]
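A quick way to see why neither expression covers both cases is to run them outside Splunk. The sketch below uses Python's re module as a stand-in for the regex engine behind the extraction (an assumption; the two behave alike for patterns this simple). The event fragments are shortened, straight-quoted versions of the sample events in this post, and the quoted form is the Amt expression applied to outputCd for comparison.

```python
import re

# Shortened fragments modelled on the two sample events (straight quotes only).
ev_null = '"mftCd":null,"outputCd":null,"planNum":null'
ev_value = '"mftCd":null,"outputCd":"09","planNum":null'

# The "works with null" form: no quote expected after the colon.
unquoted = re.compile(r'outputCd":(?P<outputCd>\w*)')
# The "works with values" form (the Amt pattern, applied to outputCd here).
quoted = re.compile(r'outputCd":"(?P<outputCd>\w*)')

def extract(pattern, event):
    """Return the captured value, or None when the pattern does not match."""
    m = pattern.search(event)
    return m.group("outputCd") if m else None

results = {
    "unquoted on null": extract(unquoted, ev_null),
    "unquoted on value": extract(unquoted, ev_value),
    "quoted on null": extract(quoted, ev_null),
    "quoted on value": extract(quoted, ev_value),
}

for case, value in results.items():
    print(f"{case}: {value!r}")
```

The unquoted form stops at the opening quote of a quoted value and captures an empty string, while the quoted form does not match null at all, so each covers only one of the two shapes.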
Sample Events
"timeStamp":"2021-12-09 08:55:30 EST","appName":"DEV","userType":"DBA","caseStatCd":null,"Amt":"100","errorMsg":null,"eventId":"VIEW_LIST_RESPONSE","eventType":"PENDING","fileSourceCd":null, "mftCd":null,"outputCd":null,"planNum":null,"reasonCd":null,"returnCd":null,"sessionId":"acMgt/dev” , "Period":”2021”, userId":"28f526d4-3464-4766-DBA "
"timeStamp":"2021-12-09 08:55:32 EST","appName":"SYS","userType":"ADM","caseStatCd":null,"Amt":null,"errorMsg":null,"eventId":"VIEW_LIST","eventType":"PENDING","fileSourceCd":”09”, "mftCd":null,"outputCd":"09","planNum":null,"reasonCd":null,"returnCd":”01”,"sessionId":"acMgt/dev” , "Period":null, userId":"28f526d4-3464-4766-ADM"
Allow for optional double quotes
| rex "outputCd\":(?P<outputCd>\"?\w*\"?)"
Hello,
Thank you so much, appreciated. Yes, your "outputCd\":(?P<outputCd>\"?\w*\"?)" is working fine with "outputCd":"09", but giving no output for "outputCd": null, thank you again!
You appear to have an extra space between the : and null. Is that a typo, or is the space there some of the time, or all of the time?
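To make the effect of that space concrete, here is a minimal sketch of the optional-quotes pattern in Python's re module (used as an assumed stand-in for rex; the short event fragments are illustrative, not full events).

```python
import re

# Optional-quotes pattern from the rex suggestion, without the SPL string escaping.
OPTIONAL_QUOTES = re.compile(r'"outputCd":(?P<outputCd>"?\w*"?)')

def output_cd(event):
    """Return the captured outputCd text, or None when there is no match."""
    m = OPTIONAL_QUOTES.search(event)
    return m.group("outputCd") if m else None

quoted = output_cd('"outputCd":"09",')        # quotes end up inside the capture
bare_null = output_cd('"outputCd":null,')     # plain null is captured as text
spaced_null = output_cd('"outputCd": null,')  # the space blocks \w*, capture is empty

print(repr(quoted), repr(bare_null), repr(spaced_null))
```

With a space after the colon, every part of the group is optional and nothing can match the space, so the capture comes back empty. That is consistent with the "no output" result reported for "outputCd": null.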
Hello,
Thank you so much again....just had to do a little tweak of your code "outputCd":\"?(?P<outputCd>\"?\w*) and working as expected. Thank you, appreciated!
Mind you, this will also accept any string that is not surrounded by quotes, not just null. That on its own might not be a big problem, but the pattern will also not work in general for quote-delimited strings, which may contain escaped quotes.
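Those caveats can be sketched with the tweaked pattern in Python's re module (an assumed stand-in for rex). The "bare word", "escaped quote" and "embedded space" inputs are hypothetical values made up for illustration; they do not appear in the sample events.

```python
import re

# The tweaked pattern from the thread, without the SPL string escaping.
TWEAKED = re.compile(r'"outputCd":"?(?P<outputCd>"?\w*)')

def output_cd(event):
    """Return the captured outputCd text, or None when there is no match."""
    m = TWEAKED.search(event)
    return m.group("outputCd") if m else None

samples = {
    "quoted": '"outputCd":"09",',
    "null": '"outputCd":null,',
    "bare word": '"outputCd":pending,',           # hypothetical unquoted value
    "escaped quote": r'"outputCd":"a\"b",',       # hypothetical escaped quote
    "embedded space": '"outputCd":"NOT FOUND",',  # hypothetical value with a space
}

results = {label: output_cd(event) for label, event in samples.items()}

for label, value in results.items():
    print(f"{label}: {value!r}")
```

The first two cases come out as wanted. The unquoted word is accepted just like null, and the last two values are cut off at the first character that is not a word character.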
Hello, thank you so much… then what do you think would be the right way to do this field extraction in this case?
Your sample events do not have embedded quotes, they also do not appear to have embedded spaces. There is no generalised solution that works for every possibility. It depends on your data and what it is that you are trying to extract from it. The more complex the solution, the longer it is likely to take, so often the minimum viable solution is the way to go (until it no longer works when the data changes). 😀
There are usually two approaches you can take: search-time extraction or index-time extraction.
Both have pros and cons.
Neither of them is always better than the other. It depends on what you're doing, where you're doing it and so on. One thing to take into account is that if you do something "wrong" at search time, you can easily "fix" it later (I'm not sure how accelerated summaries react to that, though), but with index-time extraction you can't "add" fields after the events have already been indexed.
And as to the pattern itself... well, it's a bit tricky because I can't find a way to consume the quotes but not return them in the match. So you could do something like
"field":(?<value>\w+|"(?:[^"\\]|\\.)*")
but that would capture the value of the field with the quotes. I don't know of any way to get rid of them without any postprocessing. (OK, maybe the conditional features of PCRE could allow that but that's ridiculous to write).
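As a rough illustration of the "capture with quotes, then post-process" route, here is a sketch in Python's re module (an assumed stand-in for the PCRE engine in Splunk). It uses a Python rendering of this kind of pattern, with (?P<value>...) for the named group and with a backslash plus the following character consumed as one unit so an escaped quote cannot end the value early. The helper name extract_output_cd is hypothetical.

```python
import re

# Either a bare word (null, 09, ...) or a double-quoted string that may
# contain backslash-escaped characters. The quotes are part of the capture.
VALUE = re.compile(r'"outputCd":(?P<value>\w+|"(?:[^"\\]|\\.)*")')

def extract_output_cd(event):
    """Hypothetical helper: capture the value, then strip surrounding quotes."""
    m = VALUE.search(event)
    if m is None:
        return None
    raw = m.group("value")
    # Post-processing step: remove one pair of enclosing double quotes.
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    return raw

quoted = extract_output_cd('"outputCd":"09",')
bare_null = extract_output_cd('"outputCd":null,')
escaped = extract_output_cd(r'"outputCd":"a\"b",')
missing = extract_output_cd('"outputCd":,')

print(repr(quoted), repr(bare_null), repr(escaped), repr(missing))
```

In a search, the same stripping step would have to be done after the extraction; the point here is only that the regex keeps the quotes and a second step removes them. The escaped value keeps its backslash, since nothing here unescapes it.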
Hello, thank you so much, appreciate it. However, I am facing two issues: the output still comes with the surrounding double quotes (" "), and it doesn't work for "outputCd": , [when there is no value].
If you want it to match no value at all (immediate comma), change \w+ to \w*
Thank you again. I tried it with \w*, but then it doesn't work with "outputCd":"09", and I also want to avoid the " " (double quotation marks) in the output.
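One likely reason the \w* version stops working for "outputCd":"09" is that alternation is tried left to right, and \w* can succeed with an empty match, so the quoted branch is never reached. The sketch below shows this in Python's re module (an assumed stand-in for rex, which orders alternatives the same way); the names word_first and quoted_first are only labels for the two orderings.

```python
import re

# Double-quoted string that may contain backslash-escaped characters.
QUOTED = r'"(?:[^"\\]|\\.)*"'

# \w* first: it succeeds with an empty match, so the quoted branch is never tried.
word_first = re.compile(r'"outputCd":(?P<value>\w*|' + QUOTED + r')')
# Quoted branch first: \w* is only the fallback for null or an empty value.
quoted_first = re.compile(r'"outputCd":(?P<value>' + QUOTED + r'|\w*)')

def value_of(pattern, event):
    """Return the captured value, or None when the pattern does not match."""
    m = pattern.search(event)
    return m.group("value") if m else None

print(repr(value_of(word_first, '"outputCd":"09",')))    # empty capture
print(repr(value_of(quoted_first, '"outputCd":"09",')))  # value, quotes included
print(repr(value_of(quoted_first, '"outputCd":null,')))
print(repr(value_of(quoted_first, '"outputCd":,')))
```

Putting the quoted branch first fixes the matching, but the quotes are still part of the captured value, so it does not address the second complaint.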
| makeresults
| eval events=split("\"timeStamp\":\"2021-12-09 08:55:30 EST\",\"appName\":\"DEV\",\"userType\":\"DBA\",\"caseStatCd\":null,\"Amt\":\"100\",\"errorMsg\":null,\"eventId\":\"VIEW_LIST_RESPONSE\",\"eventType\":\"PENDING\",\"fileSourceCd\":null, \"mftCd\":null,\"outputCd\":null,\"planNum\":null,\"reasonCd\":null,\"returnCd\":null,\"sessionId\":\"acMgt/dev” , \"Period\":”2021”, userId\":\"28f526d4-3464-4766-DBA \"|\"timeStamp\":\"2021-12-09 08:55:32 EST\",\"appName\":\"SYS\",\"userType\":\"ADM\",\"caseStatCd\":null,\"Amt\":null,\"errorMsg\":null,\"eventId\":\"VIEW_LIST\",\"eventType\":\"PENDING\",\"fileSourceCd\":”09”, \"mftCd\":null,\"outputCd\":\"09\",\"planNum\":null,\"reasonCd\":null,\"returnCd\":”01”,\"sessionId\":\"acMgt/dev” , \"Period\":null, userId\":\"28f526d4-3464-4766-ADM\"|\"timeStamp\":\"2021-12-09 08:55:32 EST\",\"appName\":\"SYS\",\"userType\":\"ADM\",\"caseStatCd\":null,\"Amt\":null,\"errorMsg\":null,\"eventId\":\"VIEW_LIST\",\"eventType\":\"PENDING\",\"fileSourceCd\":”09”, \"mftCd\":null,\"outputCd\":,\"planNum\":null,\"reasonCd\":null,\"returnCd\":”01”,\"sessionId\":\"acMgt/dev” , \"Period\":null, userId\":\"28f526d4-3464-4766-ADM\"|\"timeStamp\":\"2021-12-09 08:55:32 EST\",\"appName\":\"SYS\",\"userType\":\"ADM\",\"caseStatCd\":null,\"Amt\":null,\"errorMsg\":null,\"eventId\":\"VIEW_LIST\",\"eventType\":\"PENDING\",\"fileSourceCd\":”09”, \"mftCd\":null,\"outputCd\": null,\"planNum\":null,\"reasonCd\":null,\"returnCd\":”01”,\"sessionId\":\"acMgt/dev” , \"Period\":null, userId\":\"28f526d4-3464-4766-ADM\"","|")
| mvexpand events
| rex field=events "eventType\": ?\"?(?P<eventType>\w*)\"?\,?.*outputCd\": ?\"?(?P<outputCd>\w*)\"?\,?"
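For anyone who wants to check the expression without a Splunk instance, here is a sketch of the same regex in Python's re module (an assumed stand-in for rex). The prefix and suffix are shortened fragments of the test events, and the four variants mirror the four outputCd shapes in the makeresults data.

```python
import re

# Same expression as in the rex command, without the SPL string escaping.
REX = re.compile(
    r'eventType": ?"?(?P<eventType>\w*)"?,?.*'
    r'outputCd": ?"?(?P<outputCd>\w*)"?,?'
)

# Shortened event text around the outputCd field.
PREFIX = '"eventId":"VIEW_LIST","eventType":"PENDING","fileSourceCd":null, "mftCd":null,'
SUFFIX = '"planNum":null,"reasonCd":null'

variants = {
    "quoted": '"outputCd":"09",',
    "null": '"outputCd":null,',
    "empty": '"outputCd":,',
    "spaced null": '"outputCd": null,',
}

extracted = {}
for label, fragment in variants.items():
    m = REX.search(PREFIX + fragment + SUFFIX)
    extracted[label] = (m.group("eventType"), m.group("outputCd"))

for label, fields in extracted.items():
    print(label, fields)
```

The quotes and the optional space sit outside the capture groups, so they are consumed but not returned. Note that null comes back as the literal text null, and the empty variant yields an empty capture.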
Yes, working as expected 🙂, thank you so much, truly appreciated!!!
Makes sense, thank you again 😊!