Hello, I am looking to create a new field based on a section from a longer string/web address. I didn't see what i was looking in a search on this site. I assume its possible with a regex but I am not good at those.
Here are examples of fields I currently have... I am looking to extract the Location part of the string. The Company\Segment\Name does not change before Location. Location will change and is what I need.
\Company\Segment\Name\Location\Area
\Company\Segment\Name\Location\Area
I do not need the Area.
Thanks,
I'm not complete sure how your location is configured so I made some assumptions.
First I created a CSV file to test with and then created a lookup table:
location_number, location_path
1,"\Oshkosh Corporation\Defense\Oshkosh Defense\New York, NY\Staten Island"
2,"\Oshkosh Corporation\Defense\Oshkosh Defense\Atlanta, GA\Bankhead"
3,"\Oshkosh Corporation\Defense\Oshkosh Defense\Atlanta, GA\Peachtree Center"
4,"\Oshkosh Corporation\Defense\Oshkosh Defense\Seattle, WA\Pier"
Then using the query below I was able to get the rex working--turns out I needed 3 backslashes:
| inputlookup test_urls.csv | rex field=location_path "\\\Oshkosh Corporation\\\Defense\\\Oshkosh Defense\\\(?<location>\w+.*\,[\sA-Z]{3})" | table location
Output:
location |
New York, NY |
Atlanta, GA |
Atlanta, GA |
Seattle, WA |
Might need a little more information than you provided, but the below may work:
| rex field=url \\Company\\Segment\\Name\\(?<location>\w+)\\Area
This assumes the following are static: Company, Segment, Name, Area and creates a new field called 'location'
\w+ means match any word character (i.e., Matches any letter, digit or underscore. Equivalent to [a-zA-Z0-9_]), one or more times
You could certainly use regex on the entire string too and only return the location if you wanted.
I think this is probably very close to what I need. Area is not static however, it will change based on what location is but I won't need that value.
Also, if the field that the original string is in is called location_path. Does that replace "field" or "url" in the above code snippet?
And just so I am clear, that line of code should return what is the "location" part of the string and put it into its own field labeled as location?
Correct. If I did it correctly, this will create a field called 'location'. You could then view a list of 'locations' like so:
| table location
The 'url' would need to be replaced with whatever field name you have this data stored in.
To handle the 'area' being different, we'd want to vary the ending a little:
| rex field=url \\Company\\Segment\\Name\\(?<location>\w+)\\\w+.*
This will return only what was in the Location portion of your URL. Everything after the `\` would be ignored.
I get the following error with this but it seems on the verge of working.
"Error in 'rex' command: Encountered the following error while compiling the regex '\Company': Regex: unrecognized character follows \.
Does this potentially have to do with what actually fills spots between \ has spaces in it?
ex: \Company Name\Segment\Name Name\Location\Area
You have to escape the backslash with another backslash `\\`, this will process the `\` as a literal otherwise it will try to parse \C as a regex expression which it is not.
Also, I forgot the quotes:
| rex field=url "\\Company\\Segment\\Name\\(?<location>\w+)\\\w+.*"
I think this is close but I think it may be my error. Even with the " " I still get the following error using my actual data. I get myself confused trying to transfer to the data I am actually using. To be clear, I am trying to extract everything between the \ \ where location is.
Actual Input (location_path is the field the data is in.)
| rex field=location_path "\\Oshkosh Corporation\\Defense\\Oshkosh Defense\\(?<location>\w+)\\\w +.*"
Error Message
Error in 'rex' command: Encountered the following error while compiling the regex '\Oshkosh Corporation\Defense\Oshkosh Defense\(?<location>\w+)\\w +.*': Regex: unrecognized character follows \.
In your 'area' section, it looks like there is a space after the 'w', it should be `\w+.*`
I couldn't find it in the documentation, but you may have to escape the white spaces too. Either way, the last query below, may help. 🤞
| rex field=location_path "\\Oshkosh Corporation\\Defense\\Oshkosh Defense\\(?<location>\w+)\\\w+.*"
| rex field=location_path "\\Oshkosh\ Corporation\\Defense\\Oshkosh\ Defense\\(?<location>\w+)\\\w+.*"
\\\w+.*\s\w+.*\\\w+.*\\\w+.*\s\w+.*\\(?<location>\w+)\\\w+.*
I get the same error without the space.
I'm not complete sure how your location is configured so I made some assumptions.
First I created a CSV file to test with and then created a lookup table:
location_number, location_path
1,"\Oshkosh Corporation\Defense\Oshkosh Defense\New York, NY\Staten Island"
2,"\Oshkosh Corporation\Defense\Oshkosh Defense\Atlanta, GA\Bankhead"
3,"\Oshkosh Corporation\Defense\Oshkosh Defense\Atlanta, GA\Peachtree Center"
4,"\Oshkosh Corporation\Defense\Oshkosh Defense\Seattle, WA\Pier"
Then using the query below I was able to get the rex working--turns out I needed 3 backslashes:
| inputlookup test_urls.csv | rex field=location_path "\\\Oshkosh Corporation\\\Defense\\\Oshkosh Defense\\\(?<location>\w+.*\,[\sA-Z]{3})" | table location
Output:
location |
New York, NY |
Atlanta, GA |
Atlanta, GA |
Seattle, WA |
Yes! This worked for my data. Thank you for all the help.