Solved: Re: Question for extracting a string out of longer...

strehb18 · ‎10-16-2020

Hello, I am looking to create a new field based on a section from a longer string/web address. I didn't see what i was looking in a search on this site. I assume its possible with a regex but I am not good at those.

Here are examples of fields I currently have... I am looking to extract the Location part of the string. The Company\Segment\Name does not change before Location. Location will change and is what I need.

\Company\Segment\Name\Location\Area

I do not need the Area.

Thanks,

chaday00 · ‎10-19-2020

I'm not complete sure how your location is configured so I made some assumptions.

First I created a CSV file to test with and then created a lookup table:

location_number, location_path
1,"\Oshkosh Corporation\Defense\Oshkosh Defense\New York, NY\Staten Island"
2,"\Oshkosh Corporation\Defense\Oshkosh Defense\Atlanta, GA\Bankhead"
3,"\Oshkosh Corporation\Defense\Oshkosh Defense\Atlanta, GA\Peachtree Center"
4,"\Oshkosh Corporation\Defense\Oshkosh Defense\Seattle, WA\Pier"

Then using the query below I was able to get the rex working--turns out I needed 3 backslashes:

| inputlookup test_urls.csv | rex field=location_path "\\\Oshkosh Corporation\\\Defense\\\Oshkosh Defense\\\(?<location>\w+.*\,[\sA-Z]{3})" | table location

Output:

location

New York, NY

Atlanta, GA

Seattle, WA

View solution in original post

chaday00 · ‎10-16-2020

Might need a little more information than you provided, but the below may work:

| rex field=url \\Company\\Segment\\Name\\(?<location>\w+)\\Area

This assumes the following are static: Company, Segment, Name, Area and creates a new field called 'location'

\w+ means match any word character (i.e., Matches any letter, digit or underscore. Equivalent to [a-zA-Z0-9_]), one or more times

You could certainly use regex on the entire string too and only return the location if you wanted.

strehb18 · ‎10-16-2020

I think this is probably very close to what I need. Area is not static however, it will change based on what location is but I won't need that value.

Also, if the field that the original string is in is called location_path. Does that replace "field" or "url" in the above code snippet?

And just so I am clear, that line of code should return what is the "location" part of the string and put it into its own field labeled as location?

chaday00 · ‎10-16-2020

Correct. If I did it correctly, this will create a field called 'location'. You could then view a list of 'locations' like so:

| table location

The 'url' would need to be replaced with whatever field name you have this data stored in.

To handle the 'area' being different, we'd want to vary the ending a little:

| rex field=url \\Company\\Segment\\Name\\(?<location>\w+)\\\w+.*

This will return only what was in the Location portion of your URL. Everything after the `\` would be ignored.

strehb18 · ‎10-16-2020

I get the following error with this but it seems on the verge of working.

"Error in 'rex' command: Encountered the following error while compiling the regex '\Company': Regex: unrecognized character follows \.

Does this potentially have to do with what actually fills spots between \ has spaces in it?

ex: \Company Name\Segment\Name Name\Location\Area

chaday00 · ‎10-16-2020

You have to escape the backslash with another backslash `\\`, this will process the `\` as a literal otherwise it will try to parse \C as a regex expression which it is not.

Also, I forgot the quotes:

| rex field=url "\\Company\\Segment\\Name\\(?<location>\w+)\\\w+.*"

strehb18 · ‎10-19-2020

I think this is close but I think it may be my error. Even with the " " I still get the following error using my actual data. I get myself confused trying to transfer to the data I am actually using. To be clear, I am trying to extract everything between the \ \ where location is.

Actual Input (location_path is the field the data is in.)

| rex field=location_path "\\Oshkosh Corporation\\Defense\\Oshkosh Defense\\(?<location>\w+)\\\w +.*"

Error Message

Error in 'rex' command: Encountered the following error while compiling the regex '\Oshkosh Corporation\Defense\Oshkosh Defense\(?<location>\w+)\\w +.*': Regex: unrecognized character follows \.

chaday00 · ‎10-19-2020

In your 'area' section, it looks like there is a space after the 'w', it should be `\w+.*`

I couldn't find it in the documentation, but you may have to escape the white spaces too. Either way, the last query below, may help. 🤞

| rex field=location_path "\\Oshkosh Corporation\\Defense\\Oshkosh Defense\\(?<location>\w+)\\\w+.*"

| rex field=location_path "\\Oshkosh\ Corporation\\Defense\\Oshkosh\ Defense\\(?<location>\w+)\\\w+.*"

\\\w+.*\s\w+.*\\\w+.*\\\w+.*\s\w+.*\\(?<location>\w+)\\\w+.*

strehb18 · ‎10-19-2020

I get the same error without the space.

chaday00 · ‎10-19-2020

I'm not complete sure how your location is configured so I made some assumptions.

First I created a CSV file to test with and then created a lookup table:

location_number, location_path
1,"\Oshkosh Corporation\Defense\Oshkosh Defense\New York, NY\Staten Island"
2,"\Oshkosh Corporation\Defense\Oshkosh Defense\Atlanta, GA\Bankhead"
3,"\Oshkosh Corporation\Defense\Oshkosh Defense\Atlanta, GA\Peachtree Center"
4,"\Oshkosh Corporation\Defense\Oshkosh Defense\Seattle, WA\Pier"

Then using the query below I was able to get the rex working--turns out I needed 3 backslashes:

| inputlookup test_urls.csv | rex field=location_path "\\\Oshkosh Corporation\\\Defense\\\Oshkosh Defense\\\(?<location>\w+.*\,[\sA-Z]{3})" | table location

Output:

location

New York, NY

Atlanta, GA

Seattle, WA

strehb18 · ‎10-19-2020

Yes! This worked for my data. Thank you for all the help.

Question for extracting a string out of longer string

field extraction

regex

rex

Splunk ITSI & Correlated Network Visibility

Community Content Calendar, August edition

Pro Tips for First-Time .conf Attendees: Advice from SplunkTrust

Are you a member of the Splunk Community?