<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Regex extraction advice for phone numbers - Replace for multivalued fields in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288009#M87198</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I appreciate your comment. &lt;/P&gt;

&lt;P&gt;The problem is that your suggestion requires multiple eval steps, and calculated fields are all executed in parallel when entered into props.conf.&lt;/P&gt;

&lt;P&gt;I had done something pretty similar to your rex mode=sed option, which works fine. The only problems are that (1) I was hoping to simplify this for my users, and (2) I was hoping for a more efficient method that didn't require pulling the data into memory.&lt;/P&gt;

&lt;P&gt;Again, thank you for responding to my question.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/6.5.2/SearchReference/CommonEvalFunctions"&gt;https://docs.splunk.com/Documentation/Splunk/6.5.2/SearchReference/CommonEvalFunctions&lt;/A&gt;&lt;BR /&gt;
"All EVAL- configurations within a single props.conf stanza are processed in parallel, rather than in any particular sequence. This means you can't "chain" calculated field expressions, where the evaluation of one calculated field is used in the expression for another calculated field.&lt;/P&gt;

&lt;P&gt;Calculated fields can reference all types of field extractions as well as field aliases. They cannot reference lookups, event types, or tags. "&lt;/P&gt;</description>
    <pubDate>Mon, 27 Mar 2017 18:27:46 GMT</pubDate>
    <dc:creator>jhall0007</dc:creator>
    <dc:date>2017-03-27T18:27:46Z</dc:date>
    <item>
      <title>Regex extraction advice for phone numbers - Replace for multivalued fields</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288007#M87196</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I am trying to extract and normalize some phone numbers that are appearing in inconsistent ways. Below I attempted to recreate a realistic example of what my data looks like. It contains multi values, special characters and numbers of varying lengths. I would prefer to do this at search time in my props.conf / transforms.&lt;/P&gt;

&lt;P&gt;Ideally I'd like to use something similar to a transforms statement that says, start at a quotation mark, read all digits, stop at the next quotation mark. &lt;/P&gt;

&lt;P&gt;I had considered doing this with the following config, but it does not appear to handle multivalued fields. Could I please get some suggestions on how to correct my config, or on a more efficient way to go about this?&lt;/P&gt;

&lt;P&gt;In props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;EXTRACT-my_stanza
EVAL-clean_numbers = replace(phone_number, "\D", "")
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In transforms.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[my_stanza]
SOURCE_KEY = 
REGEX = \"(?\d+[^\"])
MV_ADD = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Examples:&lt;/P&gt;

&lt;P&gt;Log 1: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"(223) 456-0001"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Log 2:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"223-456 0002","(223)456-0003 1234"
"223-456 0101","223-456-0102" 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Log 3:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"223-456-0004"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Log 4:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"234560005","(223)4560006","223-456-0007"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Log 5:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"1223456-0008"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Desired results:&lt;/P&gt;

&lt;P&gt;Log 1:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1234560001
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Log 2:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1234560002
1234560003
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Log 3:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1234560004
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Log 4:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1234560005
1234560006
1234560007
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Log 5:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1234560008
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 24 Mar 2017 18:44:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288007#M87196</guid>
      <dc:creator>jhall0007</dc:creator>
      <dc:date>2017-03-24T18:44:07Z</dc:date>
    </item>
    <item>
      <title>Re: Regex extraction advice for phone numbers - Replace for multivalued fields</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288008#M87197</link>
      <description>&lt;P&gt;You need to realize that field extractions may only contain contiguous substrings of the &lt;CODE&gt;_raw&lt;/CODE&gt; field; it is not possible to extract fields where characters in the middle are dropped, nor where characters anywhere are modified.&lt;/P&gt;

&lt;P&gt;Entirely &lt;EM&gt;new&lt;/EM&gt; fields that do those things may be created with calculated fields or with SPL inside a search (both are search-time operations). But since this would require multiple &lt;CODE&gt;eval&lt;/CODE&gt; calls in sequence, and the &lt;CODE&gt;EVAL&lt;/CODE&gt; parser processes all lines in a &lt;CODE&gt;props.conf&lt;/CODE&gt; stanza in parallel, we cannot use calculated fields.  So here is the only way to do it:&lt;/P&gt;

&lt;P&gt;In props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;REPORT-phone_numbers = phone_numbers
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In transforms.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[phone_numbers]
REGEX = "([^"]+)"
FORMAT = phone_numbers::$1
MV_ADD = true
&lt;/CODE&gt;&lt;/PRE&gt;
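
&lt;P&gt;One detail worth checking in a pattern of this shape is whether it consumes the closing quote. The sketch below uses Python's &lt;CODE&gt;re.findall&lt;/CODE&gt; as a stand-in for repeated matching with &lt;CODE&gt;MV_ADD = true&lt;/CODE&gt; against the Log 4 sample from the question. That equivalence is an assumption; Splunk's own matching is not run here.&lt;/P&gt;

```python
import re

# Sample line in the shape of "Log 4" from the question.
raw = '"234560005","(223)4560006","223-456-0007"'

# Pattern that stops before the closing quote.
open_ended = re.findall(r'"([^"]+)', raw)

# Pattern that also consumes the closing quote.
closed = re.findall(r'"([^"]+)"', raw)

print(open_ended)
print(closed)
```

&lt;P&gt;Under that assumption, the open-ended pattern also returns the commas between values, because each new match starts at the quote where the previous one stopped. Ending the pattern with a quote leaves only the three phone numbers.&lt;/P&gt;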

&lt;P&gt;To fully normalize, you will need to clean the extra punctuation from inside your search like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | rex field=phone_numbers mode=sed "s/[()\-\s]//g"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 25 Mar 2017 07:14:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288008#M87197</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-03-25T07:14:42Z</dc:date>
    </item>
    <item>
      <title>Re: Regex extraction advice for phone numbers - Replace for multivalued fields</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288009#M87198</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I appreciate your comment. &lt;/P&gt;

&lt;P&gt;The problem is that your suggestion requires multiple eval steps, and calculated fields are all executed in parallel when entered into props.conf.&lt;/P&gt;
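
&lt;P&gt;To make the constraint concrete, here is a toy Python model of parallel evaluation. It is only a sketch of the behavior the documentation describes, not of how Splunk is implemented. The helper &lt;CODE&gt;apply_in_parallel&lt;/CODE&gt; and the field &lt;CODE&gt;last_four&lt;/CODE&gt; are hypothetical names made up for the illustration.&lt;/P&gt;

```python
import re

def apply_in_parallel(event, expressions):
    # Every expression reads the same snapshot of the event; none of them
    # can see a field produced by another expression in the same pass.
    snapshot = dict(event)
    computed = {name: fn(snapshot) for name, fn in expressions.items()}
    return {**event, **computed}

event = {"phone_number": "(223) 456-0001"}

# Chained attempt: "last_four" (hypothetical) depends on "clean_numbers".
chained = {
    "clean_numbers": lambda f: re.sub(r"\D", "", f["phone_number"]),
    "last_four": lambda f: f.get("clean_numbers", "")[-4:],
}

# Nested attempt: the whole computation lives in one expression.
nested = {
    "last_four": lambda f: re.sub(r"\D", "", f["phone_number"])[-4:],
}

chained_result = apply_in_parallel(event, chained)
nested_result = apply_in_parallel(event, nested)
print(repr(chained_result["last_four"]))
print(repr(nested_result["last_four"]))
```

&lt;P&gt;In this model the chained version comes back empty, because &lt;CODE&gt;clean_numbers&lt;/CODE&gt; does not exist yet when &lt;CODE&gt;last_four&lt;/CODE&gt; is computed. Folding every step into a single expression is the only form that works.&lt;/P&gt;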

&lt;P&gt;I had done something pretty similar to your rex mode=sed option, which works fine. The only problems are that (1) I was hoping to simplify this for my users, and (2) I was hoping for a more efficient method that didn't require pulling the data into memory.&lt;/P&gt;

&lt;P&gt;Again, thank you for responding to my question.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/6.5.2/SearchReference/CommonEvalFunctions"&gt;https://docs.splunk.com/Documentation/Splunk/6.5.2/SearchReference/CommonEvalFunctions&lt;/A&gt;&lt;BR /&gt;
"All EVAL- configurations within a single props.conf stanza are processed in parallel, rather than in any particular sequence. This means you can't "chain" calculated field expressions, where the evaluation of one calculated field is used in the expression for another calculated field.&lt;/P&gt;

&lt;P&gt;Calculated fields can reference all types of field extractions as well as field aliases. They cannot reference lookups, event types, or tags. "&lt;/P&gt;</description>
      <pubDate>Mon, 27 Mar 2017 18:27:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288009#M87198</guid>
      <dc:creator>jhall0007</dc:creator>
      <dc:date>2017-03-27T18:27:46Z</dc:date>
    </item>
    <item>
      <title>Re: Regex extraction advice for phone numbers - Replace for multivalued fields</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288010#M87199</link>
      <description>&lt;P&gt;Hm; when did that happen?  I could have sworn that it used to be top-to-bottom serially but the dox are clear.  I will update my answer according to:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/6.5.2/Knowledge/definecalcfields"&gt;https://docs.splunk.com/Documentation/Splunk/6.5.2/Knowledge/definecalcfields&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Apr 2017 14:03:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Regex-extraction-advice-for-phone-numbers-Replace-for/m-p/288010#M87199</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-04-04T14:03:11Z</dc:date>
    </item>
  </channel>
</rss>

