I am trying to extract and normalize some phone numbers that appear in inconsistent formats. Below I have tried to recreate a realistic example of what my data looks like: it contains multivalued fields, special characters, and numbers of varying lengths. I would prefer to do this at search time in my props.conf / transforms.conf.
Ideally I'd like to use something like a transforms statement that says: start at a quotation mark, read all digits, stop at the next quotation mark.
I had considered doing this with the following config, but it does not appear to handle multivalued fields. Could I please get some suggestions on how to correct my config, or a more efficient way to go about this?
# props.conf
EXTRACT-my_stanza
EVAL-clean_numbers = replace(phone_number, "\D", "")

# transforms.conf
[my_stanza]
SOURCE_KEY =
REGEX = \"(?\d+[^\"])
MV_ADD = true

Sample data:
"223-456 0002","(223)456-0003 1234" "223-456 0101","223-456-0102"
Desired normalized format:
1234560005 1234560006 1234560007
You need to realize that field extractions can only return contiguous substrings of the _raw field; it is not possible to extract fields where characters in the middle are dropped, nor where any characters are modified.
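To make the "contiguous substring" limit concrete, here is a small sketch using Python's re module as a stand-in for Splunk's PCRE-based extraction (an assumption: the two engines behave the same for these simple patterns). It shows that a capture group is always a slice of the raw text, that a "digits only" pattern stops at the first separator, and that the digits-only value you want never appears in the raw text at all.

```python
import re

raw = '"223-456 0002","(223)456-0003 1234"'

# 1. A capture group is always a contiguous slice of the scanned text.
m = re.search(r'"([^"]+)"', raw)
captured = m.group(1)
assert captured == raw[m.start(1):m.end(1)]

# 2. Asking for "only digits" stops at the first non-digit character;
#    a regex cannot skip the '-' and keep reading digits into the same group.
digits_only_attempt = re.search(r'"(\d+)', raw).group(1)

# 3. The digits-only value we actually want does not occur anywhere in raw,
#    so no extraction can return it; it has to be computed afterwards.
wanted = re.sub(r"\D", "", captured)

print(captured)               # 223-456 0002
print(digits_only_attempt)    # 223
print(wanted, wanted in raw)  # 2234560002 False
```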
Entirely new fields that drop or modify characters can be created with calculated fields (EVAL- settings in props.conf) or with SPL inside a search (both are search-time operations). However, this would require multiple eval calls in sequence, and all EVAL- settings within a single props.conf stanza are processed in parallel, so we cannot use that option. So here is the only way to do it:
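Why parallel evaluation rules out chaining can be sketched with a toy Python model. The field names no_parens and clean_numbers are hypothetical, and an empty list stands in, roughly, for the null result Splunk would give when an expression references a field that does not exist yet. In the parallel version every expression sees only the fields that existed before any of them ran, so the second step never sees the first step's output.

```python
import re

# Hypothetical event: field name -> list of values (a multivalue field).
event = {"phone_numbers": ["223-456 0002", "(223)456-0003 1234"]}

# Two illustrative calculated fields; the second reads the output of the first.
calculated = {
    "no_parens": lambda f: [re.sub(r"[()]", "", v) for v in f.get("phone_numbers", [])],
    "clean_numbers": lambda f: [re.sub(r"[-\s]", "", v) for v in f.get("no_parens", [])],
}


def apply_in_parallel(fields, exprs):
    """Each expression sees only the fields that existed before any of them ran."""
    snapshot = dict(fields)
    return {**fields, **{name: fn(snapshot) for name, fn in exprs.items()}}


def apply_in_sequence(fields, exprs):
    """Each expression also sees the results of the ones listed before it."""
    out = dict(fields)
    for name, fn in exprs.items():
        out[name] = fn(out)
    return out


print(apply_in_parallel(event, calculated)["clean_numbers"])  # []
print(apply_in_sequence(event, calculated)["clean_numbers"])  # ['2234560002', '22345600031234']
```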
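# transforms.conf
[phone_numbers]
REGEX = "([^"]+)"
FORMAT = phone_numbers::$1
MV_ADD = true

# props.conf, in the stanza for your sourcetype (its name is not given in the question)
REPORT-phone_numbers = phone_numbers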
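A rough Python model of this search-time transform: each match of the REGEX adds one value to the multivalue field, which is the effect MV_ADD = true is meant to have. The model compares a pattern that stops before the closing quote with one that consumes it. Python's re.finditer stands in for Splunk's matching here, and the assumption is that Splunk, like finditer, resumes scanning right after the previous match.

```python
import re

raw = '"223-456 0002","(223)456-0003 1234" "223-456 0101","223-456-0102"'


def extract_all(pattern: str, text: str) -> list[str]:
    """Collect group 1 of every non-overlapping match, like a multivalue field."""
    return [m.group(1) for m in re.finditer(pattern, text)]


# Without the closing quote, each closing quote is reused as an opening one,
# so the separators between numbers (',' and ' ') are captured as values too.
loose = extract_all(r'"([^"]+)', raw)

# Consuming the closing quote keeps only the quoted numbers.
strict = extract_all(r'"([^"]+)"', raw)

print(loose)   # ['223-456 0002', ',', '(223)456-0003 1234', ' ', '223-456 0101', ',', '223-456-0102']
print(strict)  # ['223-456 0002', '(223)456-0003 1234', '223-456 0101', '223-456-0102']
```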
To fully normalize the values, you will need to strip the remaining punctuation inside your search, like this:
... | rex field=phone_numbers mode=sed "s/[()\-\s]//g"
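For checking the character class outside Splunk, this Python sketch applies the same substitution as the sed expression (remove parentheses, hyphens and whitespace) to each extracted value. It assumes that rex mode=sed on a multivalue field rewrites each value independently; if any other characters (dots, plus signs) can appear in your data, a broader class such as \D would be needed.

```python
import re

# Values as they would come out of the phone_numbers extraction.
values = ["223-456 0002", "(223)456-0003 1234", "223-456 0101", "223-456-0102"]

# Same character class as the sed expression: parentheses, hyphens, whitespace.
SED_CLASS = re.compile(r"[()\-\s]")

cleaned = [SED_CLASS.sub("", v) for v in values]
print(cleaned)  # ['2234560002', '22345600031234', '2234560101', '2234560102']
```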
I appreciate your comment.
The problem is that your suggestion requires multiple eval steps, and calculated fields defined in props.conf are all executed in parallel.
I had done something pretty similar to your rex mode=sed option, which works fine. The only problems are that (1) I was hoping to simplify this for my users, and (2) I was hoping for a more efficient method that didn't require pulling the data into memory.
Again, thank you for responding to my question.
"All EVAL- configurations within a single props.conf stanza are processed in parallel, rather than in any particular sequence. This means you can't "chain" calculated field expressions, where the evaluation of one calculated field is used in the expression for another calculated field.
Calculated fields can reference all types of field extractions as well as field aliases. They cannot reference lookups, event types, or tags."
Hm; when did that happen? I could have sworn that they used to be processed serially, top to bottom, but the docs are clear. I will update my answer accordingly.