Splunk Search

Help with regex to extract words before column"

snallam123
Path Finder

Events:
com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:

| rex field=_raw "^(?:[^ \n] ){6}(?P[^ ]+)" and "^(.\w?):"

I tried above but it's not correct.

I need to extract these: ServerAuditDetailAssertion, Applications paymentRedirects Permission Application assertion to any new field.

Can someone help me with this?

0 Karma
1 Solution

triest
Communicator

I'm not completely convinced this is what you're asking for, but if I am reading your question correctly does this work for you?

| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

For example the following search:

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

produced

Search results showing _raw with the text from above, a _time field, and foo as a multi-value field containing ServerAuditDetailsAssertion, Application, paymentRedirects, Permission, Application, and assertion

NOTE: In the above example, foo is a multivalue fields. You could use an eval to join them with a delimiter.

For those who don't know regex, the .(?[^.]+):\s+\d+: regex basically says:

As a general strategy, the unique thing on each line that gives us the values we want is it starts with a . and ends with a : So we're going to use that uniqueness to match just what we want.

  1. look for a literal . (remember . normally matches any character so we have to escape)
  2. (?... ) is a capture group. That is the text is matches goes into the field called foo
  3. [^.] Basically any character except a literal . [ ... ] is a character class. The ^ negates it and the class itself is matching a literal .
  4. The + means one or more times, so we're looking for one or more non-. characters. This is what will match the actually values.
  5. We then end it with the : which is what prevents earlier parts of the string from matching (e.g. texh)

The key here is I am assuming Applications, paymentRedirects etc could by anything and that you want all of them. If you only wanted these values you could easily change the regex to look for hard coded values instead of the generic not a .

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>(?:ServerAuditDetailAssertion|Applications|paymentRedirects|Permission|Application|assertion)):\s+\d+:"

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi
try this regex

(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)

that you can test at https://regex101.com/r/6Xa7NE/1

So you'll have, e.g. a stat for each Application:

index=my_index
| rex "(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)"
| stats  values (ServerAuditDetailAssertion) AS ServerAuditDetailAssertion values(paymentRedirects) AS paymentRedirects values(Permission) AS Permission values (Applications) AS Applications values (assertion) AS assertion BY Application

Obviously you can use also other functions as sum, avg, etc... instead values, but I don't know your need.

Bye.
Giuseppe

0 Karma

triest
Communicator

I'm not completely convinced this is what you're asking for, but if I am reading your question correctly does this work for you?

| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

For example the following search:

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

produced

Search results showing _raw with the text from above, a _time field, and foo as a multi-value field containing ServerAuditDetailsAssertion, Application, paymentRedirects, Permission, Application, and assertion

NOTE: In the above example, foo is a multivalue fields. You could use an eval to join them with a delimiter.

For those who don't know regex, the .(?[^.]+):\s+\d+: regex basically says:

As a general strategy, the unique thing on each line that gives us the values we want is it starts with a . and ends with a : So we're going to use that uniqueness to match just what we want.

  1. look for a literal . (remember . normally matches any character so we have to escape)
  2. (?... ) is a capture group. That is the text is matches goes into the field called foo
  3. [^.] Basically any character except a literal . [ ... ] is a character class. The ^ negates it and the class itself is matching a literal .
  4. The + means one or more times, so we're looking for one or more non-. characters. This is what will match the actually values.
  5. We then end it with the : which is what prevents earlier parts of the string from matching (e.g. texh)

The key here is I am assuming Applications, paymentRedirects etc could by anything and that you want all of them. If you only wanted these values you could easily change the regex to look for hard coded values instead of the generic not a .

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>(?:ServerAuditDetailAssertion|Applications|paymentRedirects|Permission|Application|assertion)):\s+\d+:"
Get Updates on the Splunk Community!

Splunk Observability Cloud | Unified Identity - Now Available for Existing Splunk ...

Raise your hand if you’ve already forgotten your username or password when logging into an account. (We can’t ...

Index This | How many sides does a circle have?

February 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

Registration for Splunk University is Now Open!

Are you ready for an adventure in learning?   Brace yourselves because Splunk University is back, and it's ...