Splunk Search

Help with regex to extract words before colon

snallam123
Path Finder

Events:
com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:

| rex field=_raw "^(?:[^ \n] ){6}(?P[^ ]+)" and "^(.\w?):"

I tried the above, but it's not correct.

I need to extract these into a new field: ServerAuditDetailAssertion, Applications, paymentRedirects, Permission, Application, assertion.

Can someone help me with this?

0 Karma
1 Solution

triest
Communicator

I'm not completely convinced this is what you're asking for, but if I am reading your question correctly does this work for you?

| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

For example the following search:

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

produced

Search results showing _raw with the text from above, a _time field, and foo as a multivalue field containing ServerAuditDetailAssertion, Applications, paymentRedirects, Permission, Application, and assertion

NOTE: In the above example, foo is a multivalue field. You could use an eval to join the values with a delimiter.
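
A minimal sketch of that join, assuming foo was produced by the rex above. The output field name foo_joined and the ", " delimiter are only illustrative choices, and the sample _raw is shortened to two lines:

```spl
| makeresults 
| eval _raw="com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:"
| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"
| eval foo_joined=mvjoin(foo, ", ")
| table foo foo_joined
```

If you would rather have one result row per value instead of a single joined string, | mvexpand foo is the usual alternative.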

For those who don't know regex, the \.(?<foo>[^.]+):\s+\d+: regex basically says:

As a general strategy: the thing that uniquely identifies the values we want on each line is that they start after a . and end with a : so we're going to use that to match just what we want.

  1. Look for a literal . (remember, an unescaped . matches any character, so we have to escape it as \.)
  2. (?<foo>...) is a named capture group. The text it matches goes into the field called foo.
  3. [^.] matches any character except a literal . Here [ ... ] is a character class, the ^ negates it, and inside a class the . is just a literal dot.
  4. The + means one or more times, so we're looking for one or more non-. characters. This is what will match the actual values.
  5. We then end it with the : which is what prevents earlier parts of the string from matching (e.g. texh). The trailing \s+\d+: then requires whitespace, digits, and a second : after the name.
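
To see the steps above on a single line, here is a small sketch using one of the sample events. It drops max_match=0 because only one match is expected:

```spl
| makeresults 
| eval _raw="com.texh.log.custom.Applications: 9999:"
| rex "\.(?<foo>[^.]+):\s+\d+:"
| table _raw foo
```

Here foo should come back as Applications. The candidates texh, log, and custom are each followed by a . rather than a : so the pattern cannot complete at any of them.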

The key here is that I am assuming Applications, paymentRedirects, etc. could be anything and that you want all of them. If you only wanted these specific values, you could easily change the regex to look for hard-coded values instead of the generic "not a ." class:

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>(?:ServerAuditDetailAssertion|Applications|paymentRedirects|Permission|Application|assertion)):\s+\d+:"


gcusello
SplunkTrust

Hi,
try this regex:

(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)

that you can test at https://regex101.com/r/6Xa7NE/1

Then you can compute, for example, a stat for each Application:

index=my_index
| rex "(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)"
| stats values(ServerAuditDetailAssertion) AS ServerAuditDetailAssertion values(paymentRedirects) AS paymentRedirects values(Permission) AS Permission values(Applications) AS Applications values(assertion) AS assertion BY Application

Of course, you can also use other functions such as sum, avg, etc. instead of values, but I don't know what you need.
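
As a rough sketch of swapping in other functions: this assumes the captured values are numeric, as they are in the sample events, and reuses the index=my_index placeholder from the search above. The avg_ and max_ output names are made up for illustration:

```spl
index=my_index
| rex "(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)"
| stats avg(paymentRedirects) AS avg_paymentRedirects max(Permission) AS max_Permission BY Application
```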

Bye.
Giuseppe

0 Karma
