Splunk Search

Regex for complex search string

arunsubram
Explorer

Search String
- Promotion Created, Coupon Settings For PromoCode=121509PromoId=3550966 : 17429150|Gillette|111082|9999999|Save $5.00 on Gillette|Save $5.00 on ONE Gillette Fusion ProShield|2016-05-29T07:00:00Z|2016-07-02T07:00:00Z|2016-07-02T07:00:00Z||811000474001215093500110100|RMS|[047400656048, 047400656055, 047400656062, 047400656079, 047400656109, 047400656116]|[]||RetailerBanners : [Brookshire]

Need to create a table as below . Column 3 as bold starts after ":" and should be seperated with Column names as 1,2..

Table sample:
PromoCode PromoId Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 Column 8 Column 9 Column 10
121509 3550966 17429150 Gillette 111082 9999999 Save $5.00 on Gillette Save $5.00 on ONE Gillette Fusion ProShield 2016-04-29T07:00:00Z 2016-05-02T07:00:00Z 2016-07-02T07:00:00Z

Tags (2)
0 Karma
1 Solution

Richfez
SplunkTrust
SplunkTrust

Try (not complete, add as many as required...)

| rex field=a_field "^[^\:]+\:(?<field1>[^\|]+)\|(?<field2>[^\|]+)\|(?<field3>[^\|]+)\|"

Some explanation to help you extend it and understand it.

^[^\:]+\: says to start at the beginning (first ^ ) and read one or more + characters not matching a colon ( [^\:] ).
Then, (?<field1> create an extraction named "field1" which reads one or more characters that are not a pipe symbol [^\|]+ then close the extraction piece ) .
Now, between fields there will be a pipe symbol, find that. \| then start the next extraction group (?<field2>[^\|]+) and repeat.

You'll want to add them one at a time (or a couple when more confident), in groups like (?<field1>[^\|]+)\| except the very last one won't have a closing pipe symbol, so you'll end it with (?<fieldN>[^\|]+) . Notice no ending \| .

Le me know if that gets it for you.

View solution in original post

0 Karma

javiergn
Super Champion

Try the following (you can ignore the top three lines as they are needed to generate demo data):

Approach one

| stats count
| fields - count
| eval _raw = "
Promotion Created, Coupon Settings For PromoCode=121509PromoId=3550966 : 17429150|Gillette|111082|9999999|Save $5.00 on Gillette|Save $5.00 on ONE Gillette Fusion ProShield|2016-05-29T07:00:00Z|2016-07-02T07:00:00Z|2016-07-02T07:00:00Z||811000474001215093500110100|RMS|[047400656048, 047400656055, 047400656062, 047400656079, 047400656109, 047400656116]|[]||RetailerBanners : [Brookshire]
"
| rex field=_raw "PromoCode=(?<PromoCode>\d+)PromoId=(?<PromoId>\d+)\s+:\s+(?<Column1>\d+)\|(?<Column2>[^\|]+)\|(?<Column3>[^\|]+)\|(?<Column4>[^\|]+)\|(?<Column5>[^\|]+)\|(?<Column6>[^\|]+)\|(?<Column7>[^\|]+)\|(?<Column8>[^\|]+)\|(?<Column9>[^\|]+)"

Output (see picture 1):

alt text

Explanation: https://regex101.com/r/sR3pL0/1

Approach 2

You could use split to store all your columns in a multivalue field and access the ones you need very easily with mvindex.

| stats count
| fields - count
| eval _raw = "
Promotion Created, Coupon Settings For PromoCode=121509PromoId=3550966 : 17429150|Gillette|111082|9999999|Save $5.00 on Gillette|Save $5.00 on ONE Gillette Fusion ProShield|2016-05-29T07:00:00Z|2016-07-02T07:00:00Z|2016-07-02T07:00:00Z||811000474001215093500110100|RMS|[047400656048, 047400656055, 047400656062, 047400656079, 047400656109, 047400656116]|[]||RetailerBanners : [Brookshire]
"
| rex field=_raw "PromoCode=(?<PromoCode>\d+)PromoId=(?<PromoId>\d+)\s+:\s+(?<Columns>.+?)\|\|"
| eval Columns = split(Columns, "|")
| eval Column1 = mvindex(Columns, 0)
| eval Column2 = mvindex(Columns, 1)
......

Output:

alt text

Hope that helps.

0 Karma

Richfez
SplunkTrust
SplunkTrust

Try (not complete, add as many as required...)

| rex field=a_field "^[^\:]+\:(?<field1>[^\|]+)\|(?<field2>[^\|]+)\|(?<field3>[^\|]+)\|"

Some explanation to help you extend it and understand it.

^[^\:]+\: says to start at the beginning (first ^ ) and read one or more + characters not matching a colon ( [^\:] ).
Then, (?<field1> create an extraction named "field1" which reads one or more characters that are not a pipe symbol [^\|]+ then close the extraction piece ) .
Now, between fields there will be a pipe symbol, find that. \| then start the next extraction group (?<field2>[^\|]+) and repeat.

You'll want to add them one at a time (or a couple when more confident), in groups like (?<field1>[^\|]+)\| except the very last one won't have a closing pipe symbol, so you'll end it with (?<fieldN>[^\|]+) . Notice no ending \| .

Le me know if that gets it for you.

0 Karma

Richfez
SplunkTrust
SplunkTrust

Oops, I noticed you have two pipes together. So I changed all the + (one or more) symbols in the capture groups to * (zero or more), like this:

...| rex field=a_field "^[^\:]+\:(?<field1>[^\|]*)\|(?<field2>[^\|]*)\|(?<field3>[^\|]*)\|(?<field4>[^\|]*)\|(?<field5>[^\|]*)\|(?<field6>[^\|]*)\|(?<field7>[^\|]*)\|(?<field8>[^\|]*)\|(?<field9>[^\|]*)\|(?<field10>[^\|]*)\|(?<field11>[^\|]*)\|(?<field12>[^\|]*)\|(?<field13>[^\|]*)"

You'll have to use your fieldname in the place of my "a_field" or just leave that entire little piece off so it uses _raw. Anyway, that's up to field 13 which is itself a composite field. The same technique could be used on it too, like

... | rex field=field13 "(?<code1>\d+)[^\d]+(?<code2>\d+)[^\d]+(?<code3>\d+)[^\d]+(?<code4>\d+)[^\d]+(?<code5>\d+)[^\d]+(?<code6>\d+)[^\d]+"

That one looks for repeated "digits, not digits" (i.e. spaces and commas) patterns INSIDE field13, and names them code1, code2...

0 Karma

arunsubram
Explorer

Thanks rich. this was really helpful

0 Karma

Richfez
SplunkTrust
SplunkTrust

Here's a link to the first portion (not the field13 stuff, but before) in regex101.com

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Community Content Calendar, September edition

Welcome to another insightful post from our Community Content Calendar! We're thrilled to continue bringing ...

Splunkbase Unveils New App Listing Management Public Preview

Splunkbase Unveils New App Listing Management Public PreviewWe're thrilled to announce the public preview of ...

Leveraging Automated Threat Analysis Across the Splunk Ecosystem

Are you leveraging automation to its fullest potential in your threat detection strategy?Our upcoming Security ...