Regular Expression: extract beginning and end of string. What am I missing?

rhenry · 03-03-2022

Hello,

I have a situation where I am trying to extract, from within a field, a value in the format ABC-1234-56-7890, but I only want to pull the first three letters and the last four digits into one field. Below is the query I have so far, but I have not figured out how to do what I described:

| rex field=comment "(?<ABC>ABC\-\d+\-\d+\-\d+)"

I want the result to be "ABC-7890".

What am I missing that would let me pull both the beginning and the end of the string described above? Thanks!

yuanliu · 03-03-2022

I can't help but notice that your initial regex contains a hard-coded leading string, "ABC". This implies that the first group of letters is fixed. If that is the case, you can capture only the end of the string and then prepend the known prefix, like this:

| rex field=comment "\bABC-\S+-(?<ABC>\d+)"
| eval ABC="ABC-" . ABC
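If you want to sanity-check this pattern outside Splunk, here is a rough Python sketch of the same two steps. It is only an approximation: Python's `re` module is not PCRE and spells named groups `(?P<name>...)`, and `extract_with_fixed_prefix` is a hypothetical helper name.

```python
import re
from typing import Optional

# Python stand-in for: rex field=comment "\bABC-\S+-(?<ABC>\d+)"
# \S+ is greedy, so it backtracks only as far as the last hyphen,
# leaving the final digit group for the named group ABC.
TAIL_PATTERN = re.compile(r"\bABC-\S+-(?P<ABC>\d+)")


def extract_with_fixed_prefix(comment: str) -> Optional[str]:
    """Capture only the last digit group, then prepend the known prefix,
    mirroring: eval ABC="ABC-" . ABC"""
    match = TAIL_PATTERN.search(comment)
    if match is None:
        return None  # no ID found in the text
    return "ABC-" + match.group("ABC")


print(extract_with_fixed_prefix("ABC-1234-56-7890"))  # ABC-7890
print(extract_with_fixed_prefix("This is the location for ABC-1234-56-7890 at this point."))  # ABC-7890
```

Because `\S+` cannot cross whitespace, the same pattern works whether the ID is the whole field or embedded in a sentence.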

Another way is to use sed mode to strip whatever you don't need. This example assumes that the leading string is unknown.

| rex field=comment mode=sed "s/.*?(\w+)\S+-(\d+).*/\1-\2/"
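A quick Python approximation of that sed expression (not Splunk itself; `sed_strip` is a hypothetical name) shows what the two groups capture. The leading `.*?` and trailing `.*` consume everything around the ID, so the whole value is replaced by `\1-\2`.

```python
import re

# Python stand-in for: rex field=comment mode=sed "s/.*?(\w+)\S+-(\d+).*/\1-\2/"
# Group 1: the leading word (e.g. "ABC"); group 2: the last digit run (e.g. "7890").
SED_PATTERN = re.compile(r".*?(\w+)\S+-(\d+).*")


def sed_strip(comment: str) -> str:
    # count=1 mirrors a sed substitution without the "g" flag.
    # If the pattern does not match, the text comes back unchanged.
    return SED_PATTERN.sub(r"\1-\2", comment, count=1)


print(sed_strip("This is the location for ABC-1234-56-7890 at this point."))  # ABC-7890
```

Note that text with no match is returned untouched, so a field without an ID keeps its original value rather than becoming empty.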

(If you need to keep the original content of comment, first copy it into a different field, such as ABC, and then apply rex to that field.)

Alternatively, you can apply sed mode or the replace() function to the ABC field you initially extracted. This example uses replace():

| rex field=comment "(?<ABC>ABC\-\d+\-\d+\-\d+)"
| eval ABC=replace(ABC, "ABC-\d+-\d+-", "ABC-")
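The same extract-then-replace sequence, sketched in Python for offline testing. Python's `re` stands in for Splunk's PCRE here, and `extract_then_replace` is a hypothetical name.

```python
import re
from typing import Optional

# Step 1, like: rex field=comment "(?<ABC>ABC\-\d+\-\d+\-\d+)"
FULL_ID = re.compile(r"(?P<ABC>ABC-\d+-\d+-\d+)")
# Step 2, like: eval ABC=replace(ABC, "ABC-\d+-\d+-", "ABC-")
MIDDLE = re.compile(r"ABC-\d+-\d+-")


def extract_then_replace(comment: str) -> Optional[str]:
    match = FULL_ID.search(comment)
    if match is None:
        return None
    full_id = match.group("ABC")        # e.g. "ABC-1234-56-7890"
    return MIDDLE.sub("ABC-", full_id)  # drops the two middle digit groups


print(extract_then_replace("ABC-1234-56-7890"))  # ABC-7890
```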

PickleRick · 03-03-2022

Unfortunately, PCRE doesn't have an "ignore this part" group. (I would welcome one too.)

You can, however, capture the beginning and the end into separate fields and then create a calculated field that combines them.
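To illustrate the point: a single capture group always returns one contiguous substring, so the middle part cannot be skipped inside it, but capturing the two ends separately and joining them works. The Python sketch below assumes IDs shaped like ABC-1234-56-7890 (three letters, then three hyphen-separated digit groups); `head`, `tail` and `head_and_tail` are hypothetical names, and Python's `re` is used as a stand-in for PCRE.

```python
import re
from typing import Optional

# Two named groups in one pattern: the leading letters and the trailing digits.
# The middle digit groups are matched but not captured.
ID_PATTERN = re.compile(r"\b(?P<head>[A-Za-z]{3})-\d+-\d+-(?P<tail>\d+)\b")


def head_and_tail(comment: str) -> Optional[str]:
    match = ID_PATTERN.search(comment)
    if match is None:
        return None
    # The "calculated field": join the two captures with a hyphen.
    return match.group("head") + "-" + match.group("tail")


print(head_and_tail("This is the location for ABC-1234-56-7890 at this point."))  # ABC-7890
```

The overall match (`group(0)`) is still the full ABC-1234-56-7890, which is why the join step is needed.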

gcusello · 03-03-2022

Hi @rhenry,

you could use a regex and an eval:

your_search
| rex "^(?<my_field_1>\w\w\w).*(?<my_field_2>\d\d\d\d)"
| eval my_field=my_field_1."-".my_field_2
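A Python approximation of this rex/eval pair (with the field names unified as my_field_1/my_field_2; `anchored_extract` is a hypothetical name) makes the effect of the `^` anchor visible: `my_field_1` always comes from the first three characters of the text.

```python
import re
from typing import Optional

# Python stand-in for: rex "^(?<my_field_1>\w\w\w).*(?<my_field_2>\d\d\d\d)"
# The greedy .* backtracks only until the LAST run of four digits fits.
ANCHORED = re.compile(r"^(?P<my_field_1>\w\w\w).*(?P<my_field_2>\d\d\d\d)")


def anchored_extract(text: str) -> Optional[str]:
    match = ANCHORED.search(text)
    if match is None:
        return None
    # Like: eval my_field=my_field_1."-".my_field_2
    return match.group("my_field_1") + "-" + match.group("my_field_2")


print(anchored_extract("ABC-1234-56-7890"))  # ABC-7890
print(anchored_extract("This is the location for ABC-1234-56-7890 at this point."))  # Thi-7890
```

On a sentence that merely contains the ID, the first group picks up "Thi" from the start of the text, so this version is only reliable when the ID is the whole field.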

you can test the regex at https://regex101.com/r/S7tXqS/1

Ciao.

Giuseppe

rhenry · 03-03-2022

Hey, this does what I am looking for. However, it looks like it only works if ABC-1234-56-7890 is the only string in the field. What if there are additional words before and after it? For example:

"This is the location for ABC-1234-56-7890 at this point."

Is there a way to extract just the ABC-1234-56-7890 part of that string and, again, only its beginning and end? Thanks!

gcusello · 03-03-2022

Hi, please try this:

your_search
| rex "(?<my_field_1>\w\w\w)\S*(?<my_field_2>\d\d\d\d)"
| eval my_field=my_field_1."-".my_field_2
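A Python approximation of this version (`unanchored_extract` is a hypothetical name; Python's `re` differs from PCRE in named-group syntax). Without `^`, the search starts at the first position where three word characters are followed, within the same run of non-whitespace, by four digits.

```python
import re
from typing import Optional

# Python stand-in for: rex "(?<my_field_1>\w\w\w)\S*(?<my_field_2>\d\d\d\d)"
# \S* cannot cross whitespace, so both groups must come from the same "word".
UNANCHORED = re.compile(r"(?P<my_field_1>\w\w\w)\S*(?P<my_field_2>\d\d\d\d)")


def unanchored_extract(text: str) -> Optional[str]:
    match = UNANCHORED.search(text)
    if match is None:
        return None
    # Like: eval my_field=my_field_1."-".my_field_2
    return match.group("my_field_1") + "-" + match.group("my_field_2")


print(unanchored_extract("This is the location for ABC-1234-56-7890 at this point."))  # ABC-7890
```

One caveat: `\w` also matches digits and underscores, so a bare number such as 12345678 would match too (giving 123-5678). If that matters for your data, a stricter class such as `[A-Za-z]{3}` for the first group may be safer.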

You can test it at https://regex101.com/r/S7tXqS/2

Ciao.

Giuseppe
