Splunk Search

Help with regex to extract words before colon

snallam123
Path Finder

Events:
com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:

| rex field=_raw "^(?:[^ \n] ){6}(?P[^ ]+)" and "^(.\w?):"

I tried the above, but it's not correct.

I need to extract these values into a new field: ServerAuditDetailAssertion, Applications, paymentRedirects, Permission, Application, assertion.

Can someone help me with this?

1 Solution

triest
Communicator

I'm not completely convinced this is what you're asking for, but if I'm reading your question correctly, does this work for you?

| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

For example the following search:

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

produced

Search results showing _raw with the text from above, a _time field, and foo as a multivalue field containing ServerAuditDetailAssertion, Applications, paymentRedirects, Permission, Application, and assertion

NOTE: In the above example, foo is a multivalue field. You could use an eval (for example with mvjoin) to join the values with a delimiter.

For those who don't know regex, here is what the \.(?<foo>[^.]+):\s+\d+: regex basically says.

As a general strategy, the unique thing about each value we want is that it starts right after a . and ends at a : so we're going to use that uniqueness to match just what we want.

  1. \. looks for a literal . (remember that an unescaped . matches any character, so we have to escape it)
  2. (?<foo>... ) is a named capture group. That is, the text it matches goes into the field called foo
  3. [^.] matches any character except a literal . Here [ ... ] is a character class, the ^ negates it, and the class itself contains a literal . (no escape is needed inside a character class)
  4. The + means one or more times, so we're looking for one or more non-. characters. This is what will match the actual values.
  5. We then end it with :\s+\d+: (a colon, whitespace, digits, and another colon). Requiring the : right after the captured text is what prevents earlier parts of the string from matching (e.g. texh)
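
If you want to sanity-check the steps above outside Splunk, here is a minimal sketch in Python. It is only an illustration, not part of the search: Python's re module spells a named group (?P<foo>...) rather than (?<foo>...), and the helper name extract_names is made up for this example. Collecting every match with finditer is meant to roughly mirror what max_match=0 does in rex.

```python
import re

# Sample event text copied from the question.
RAW = """com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"""

# Same pattern as the rex, with Python's (?P<foo>...) named-group syntax:
#   \.      literal dot
#   [^.]+   one or more characters that are not a dot (captured as foo)
#   :\s+\d+:  colon, whitespace, digits, colon
PATTERN = re.compile(r"\.(?P<foo>[^.]+):\s+\d+:")


def extract_names(text):
    """Return every value captured by the foo group, in order of appearance."""
    return [m.group("foo") for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    # Joining with a delimiter is the Python analogue of flattening a multivalue field.
    print(", ".join(extract_names(RAW)))
```

Earlier segments such as texh or custom are skipped because the character right after them is another . rather than the : the pattern requires.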

The key here is that I am assuming Applications, paymentRedirects, etc. could be anything and that you want all of them. If you only wanted these specific values, you could easily change the regex to look for hard-coded values instead of the generic "anything that is not a .":

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>(?:ServerAuditDetailAssertion|Applications|paymentRedirects|Permission|Application|assertion)):\s+\d+:"
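
As a rough check of the hard-coded variant outside Splunk, the sketch below builds the same alternation in Python. The names NAMES and extract_known are hypothetical, and the pattern uses Python's (?P<foo>...) syntax instead of (?<foo>...); treat it as an illustration under those assumptions rather than something to paste into a search.

```python
import re

# Sample event text copied from the question.
RAW = """com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"""

# The only values we are willing to extract.
NAMES = [
    "ServerAuditDetailAssertion",
    "Applications",
    "paymentRedirects",
    "Permission",
    "Application",
    "assertion",
]

# Build (?:A|B|C); re.escape is a precaution in case a name ever
# contains a regex metacharacter.
ALTERNATION = "|".join(re.escape(name) for name in NAMES)
PATTERN = re.compile(r"\.(?P<foo>(?:" + ALTERNATION + r")):\s+\d+:")


def extract_known(text):
    """Return only the whitelisted names that appear right before ': <digits>:'."""
    return [m.group("foo") for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(extract_known(RAW))
```

Because the : must follow the name immediately, Application and Applications do not get confused with each other, and the assertion segment in the middle of the first line is not picked up.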


gcusello
Esteemed Legend

Hi
try this regex

(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)

that you can test at https://regex101.com/r/6Xa7NE/1

So you'll have, e.g. a stat for each Application:

index=my_index
| rex "(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)"
| stats values(ServerAuditDetailAssertion) AS ServerAuditDetailAssertion values(paymentRedirects) AS paymentRedirects values(Permission) AS Permission values(Applications) AS Applications values(assertion) AS assertion BY Application

Obviously you can also use other functions, such as sum or avg, instead of values, but I don't know your needs.
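
To see what this regex captures, here is a small Python sketch of the same structure. It is an illustration only: the names NAMES and extract_values are hypothetical, and Python needs (?P<name>...) for named groups. Note that this approach puts the number that follows each label into a field named after the label, and it only matches when all six labels appear in this order within one event.

```python
import re

# Sample event text copied from the question.
RAW = """com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"""

# Labels in the order the regex expects to find them.
NAMES = [
    "ServerAuditDetailAssertion",
    "Applications",
    "paymentRedirects",
    "Permission",
    "Application",
    "assertion",
]

# For each label: "label:", whitespace, then capture everything up to the next colon.
PARTS = [rf"{name}:\s+(?P<{name}>[^:]*)" for name in NAMES]

# (?ms) lets . cross line breaks; the pieces are glued with ":\s*.*"
# exactly as in the regex above.
PATTERN = re.compile(r"(?ms).*" + r":\s*.*".join(PARTS))


def extract_values(text):
    """Return {label: captured value}, or an empty dict if the event does not match."""
    match = PATTERN.match(text)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    for label, value in extract_values(RAW).items():
        print(label, "=", value)
```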

Bye.
Giuseppe

