topic Re: Create a new field and populate it with normalized data? in Splunk Search

Create a new field and populate it with normalized data?

dbcase — Fri, 08 Dec 2017 00:47:35 GMT

Hi,

I have a field in my existing data set called mso. Within that field are company names

Example

CompanyA
CompanyAA1
CompanyABB1
CoB
CoBBBB1
CompC
CompC2

What I would like to do is create a new field called normalized_mso with normalized company names. so that it looks like the below

CompanyA
CoB
CompC

How could I accomplish this?

Re: Create a new field and populate it with normalized data?

woodcock — Fri, 08 Dec 2017 00:57:44 GMT

You need to give us the rules for equation. For example:
All names start with and end with a single capital letter with all-lowercase between.
Everything after the 2nd (ending) capital letter is trash.

Re: Create a new field and populate it with normalized data?

dbcase — Fri, 08 Dec 2017 01:07:18 GMT

Well the names are kinda all over the map, thats why I made the examples variable.

Right now I'm matching using "like" and while it mostly works, what I really need is a new field in the existing data set.

Here is a better example

Company names (made up)

ExxonUSA
ExxonEurope
ExxonAPAC
ShellUSA
ShellSouthAmerica
ShellNZ
ChevronCalifornia
ChevronIceland
Chevron

I'd like to end up with
Exxon
Shell
Chevron

Re: Create a new field and populate it with normalized data?

niketn — Fri, 08 Dec 2017 02:55:04 GMT

@dbcase, if you can use the upper case character in second string as the pattern to end field extraction, you can use the rex command | rex field=<YourFieldNameGoesHere> "(?<mso_new>[A-Z]{1}[^A-Z]+)".
Following is a run anywhere search for you to try out:

| makeresults
| eval mso="ExxonUSA,ExxonEurope,ExxonAPAC,ShellUSA,ShellSouthAmerica,ShellNZ,ChevronCalifornia,ChevronIceland,Chevron"
| makemv mso delim="," 
| mvexpand mso
| rex field=mso "(?<mso_new>[A-Z]{1}[^A-Z]+)"

PS: The regular expression means start with Capital letter and extract all characters before find next Upper Case character. You can test the same on regex101.com as well.

Re: Create a new field and populate it with normalized data?

dbcase — Fri, 08 Dec 2017 04:11:38 GMT

Hi Niketnilay,

You are a busy man!!! Sadly the real data doesn't necessarily have an uppercase character.

Re: Create a new field and populate it with normalized data?

niketn — Fri, 08 Dec 2017 04:22:07 GMT

@dbcase 😄 LOL I am on leave today, travelling to hometown 🙂

If there is no pattern your option would be to use lookup with wildcard. Refer to this answer https://answers.splunk.com/answers/52580/can-we-use-wildcard-characters-in-a-lookup-table.html

Re: Create a new field and populate it with normalized data?

woodcock — Fri, 08 Dec 2017 06:09:58 GMT

I get the general idea but I need some rules of structure against which to code.

Re: Create a new field and populate it with normalized data?

DalJeanis — Fri, 08 Dec 2017 18:06:02 GMT

This works against your examples.

| makeresults
| eval mso="ExxonUSA,ExxonEurope,ExxonAPAC,ShellUSA,ShellSouthAmerica,ShellNZ,ChevronCalifornia,ChevronIceland,Chevron"
| makemv mso delim="," 
| mvexpand mso
| eval mso2=mso
| rex mode=sed field=mso2 "s/(^[A-Z]+?[a-z]+)([A-Z]*?[a-z]*?[a-zA-Z]*?$)/\1/g"

You may need to post any examples you have of companies that have numbers, special characters, or all lower case for testing. This works against a few more scenarios...

 | makeresults
 | eval mso="ExxonUSA,ExxonEurope,ExxonAPAC,ShellUSA,ShellSouthAmerica,ShellNZ,ChevronCalifornia,ChevronIceland,Chevron,blahblahTheblah,Protection1Green,327weirdness Inc,Iceberg Titanic Company"
 | makemv mso delim="," 
 | mvexpand mso
 | eval mso2=mso
 | rex mode=sed field=mso2 "s/^([^a-z]*)([a-z0-9]+)([^a-z0-9\n]*)([a-zA-Z0-9 ]*)$/\1\2/g"