Splunk Search

Regex to extract fields with different format

nuaraujo
Path Finder

Hello all,

I need your help writing a regex that extracts fields from some messages.

Example 1
USER: user1 UPDATED CUSTOMER - 123456. Added new user. New user was added.

What I am looking for in this message:
username: user1
operation: UPDATED CUSTOMER (always two words in uppercase)
customer_id: 123456 (always preceded by "-" and ending with ".") (not available in all messages)
comment: Added new user. New user was added

Example 2
USER: user2 ADDED COUNTRY with identifier: Germany

What I am looking for in this message:
username: user2
operation: ADDED COUNTRY (always two words in uppercase)
comment: with identifier: Germany (in this message I do not have customer_id field)

I am using the following regex, which is far from accurate. It works for the first example but not for the second:

 ... | rex field=message    "^USER: (?P<username>.+?) (?P<operation>[A-Z].+?) - (?P<id>.+?)\. (?P<message>.*)"

What I am looking for as a final result:
| username | operation        | id     | message                            |
| user1    | UPDATED CUSTOMER | 123456 | Added new user. New user was added |
| user2    | ADDED COUNTRY    |        | with identifier: Germany           |

Can someone help me build a general regex, please?

1 Solution

javiergn
Super Champion

Hi @nuaraujo,

Try the following regex instead. I've tested it in my lab with your two examples and it seems to be working fine. Note that I am assuming your customer ID is a number, so you might need to tweak the pattern if that's not the case.

^USER: (?P<username>\S+)\s+(?P<operation>[A-Z]+ [A-Z]+)(\s+\-\s+(?P<customerid>\d+)\.)?\s+(?P<message>.*)
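
One way to sanity-check this pattern outside Splunk is a short Python sketch. It assumes Python's `re` module behaves like Splunk's regex engine for this particular pattern (both accept the `(?P<name>...)` group syntax); the `extract` helper and the `examples` list are made up for illustration.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(
    r"^USER: (?P<username>\S+)\s+(?P<operation>[A-Z]+ [A-Z]+)"
    r"(\s+\-\s+(?P<customerid>\d+)\.)?\s+(?P<message>.*)"
)

def extract(message):
    """Return the named groups as a dict, or None if the pattern does not match."""
    m = PATTERN.match(message)
    return m.groupdict() if m else None

# The two sample messages from the question.
examples = [
    "USER: user1 UPDATED CUSTOMER - 123456. Added new user. New user was added.",
    "USER: user2 ADDED COUNTRY with identifier: Germany",
]

for line in examples:
    print(extract(line))
```

The optional group around the ` - <id>.` part is what lets the second message match: when it is absent, `customerid` is simply left empty and the rest of the line goes to `message`.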

Thanks,
J


nuaraujo
Path Finder

Thanks 🙂


lloydknight
Builder

Hello @nuaraujo

Try something like this.

... | rex field=message "USER:\s(?<username>.+?)\s(?<operation>\w+\s\w+)\s\-?\s(?<id>\d+)\.?\s(?<message>.*)"
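
A quick check of this pattern against the two sample messages, sketched in Python on the assumption that `re` behaves like Splunk's engine here. Python requires `(?P<name>...)` rather than `(?<name>...)`, so only the group syntax is adapted; `fields` is a hypothetical helper. Because the `id` group is mandatory, the pattern as written matches the first message but not the second. `OPTIONAL_ID` is one possible adjustment that wraps the separator and id in an optional group; it is not part of the original suggestion.

```python
import re

# Original suggestion; only the named-group syntax is changed for Python.
ORIGINAL = re.compile(
    r"USER:\s(?P<username>.+?)\s(?P<operation>\w+\s\w+)"
    r"\s\-?\s(?P<id>\d+)\.?\s(?P<message>.*)"
)

# Hypothetical variant: the " - <id>." part becomes optional.
OPTIONAL_ID = re.compile(
    r"USER:\s(?P<username>.+?)\s(?P<operation>\w+\s\w+)"
    r"(?:\s-\s(?P<id>\d+)\.)?\s(?P<message>.*)"
)

EXAMPLES = [
    "USER: user1 UPDATED CUSTOMER - 123456. Added new user. New user was added.",
    "USER: user2 ADDED COUNTRY with identifier: Germany",
]

def fields(pattern, message):
    """Return named groups as a dict, or None when the pattern does not match."""
    m = pattern.search(message)
    return m.groupdict() if m else None

for msg in EXAMPLES:
    print("original:   ", fields(ORIGINAL, msg))
    print("optional id:", fields(OPTIONAL_ID, msg))
```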

Hope it helps!


p_gurav
Champion

Can you try:

| rex field=message    "^USER: (?P<username>.+?) (?P<operation>\w+) (?P<comment>.*)" | rex field=comment "CUSTOMER - (?P<id>[^\.]+)"

nuaraujo
Path Finder

Thanks @p_gurav.

Your suggestion is already a good solution. However, can you help me get two words for "operation"? With your suggestion I only get one. Even so, BIG THANK YOU for your quick reply.


gcusello
SplunkTrust

Hi nuaraujo,
try this:

| rex field=message    "^USER: (?P<username>.+?) (?P<operation>[A-Z]+ [A-Z]+) (?<comment>.*)"
| rex field=comment "- (?<id>\d+)\. (?<message>.*)"
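
The two-step idea can be sketched in Python as follows, assuming `re` behaves like Splunk's engine for these patterns. The group syntax is adapted to `(?P<name>...)`, and `two_step` is a hypothetical helper that mimics running the second `rex` on the `comment` field produced by the first.

```python
import re

# Step 1: username, two-word uppercase operation, and everything else as "comment".
STEP1 = re.compile(
    r"^USER: (?P<username>.+?) (?P<operation>[A-Z]+ [A-Z]+) (?P<comment>.*)"
)
# Step 2: applied to "comment" only; splits off the id and the remaining text.
STEP2 = re.compile(r"- (?P<id>\d+)\. (?P<message>.*)")

def two_step(message):
    """Apply STEP1, then STEP2 on the extracted comment; return the found fields."""
    found = {}
    m1 = STEP1.search(message)
    if m1:
        found.update(m1.groupdict())
        m2 = STEP2.search(found["comment"])
        if m2:
            found.update(m2.groupdict())
    return found

print(two_step("USER: user1 UPDATED CUSTOMER - 123456. Added new user. New user was added."))
print(two_step("USER: user2 ADDED COUNTRY with identifier: Germany"))
```

In this sketch, when the second pattern does not match (as in the second message), no `id` is produced and the free text stays in `comment`.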

This way you get all the fields.

Bye.
Giuseppe
