Splunk Search

Regex to extract fields with different format

nuaraujo
Path Finder

Hello all,

I need your help in order to get a regex that may extract fields from some messages.

Example 1
USER: user1 UPDATED CUSTOMER - 123456. Added new user. New user was added.

What I am looking at this message:
username: user1
operation: UPDATED CUSTOMER (always two words in uppercase)
customer_id: 123456 (always preceded by "-" and ending with ".") (not available in all messages)
comment: Added new user. New user was added

Example2
USER: user2 ADDED COUNTRY with identifier: Germany

What I am looking at this message:
username:user2
operation: ADDED COUNTRY (always two words in uppercase)
comment: with identifier: Germany (in this message I do not have customer_id field)

I am using the following REGEX that I far from being accurate. It works for the first use case but not for the second

 ... | rex field=message    "^USER: (?P<username>.+?) (?P<operation>[A-Z].+?) - (?P<id>.+?)\. (?P<message>.*)"

What I am looking for a final result:
|username.... | operation...........................| id...............| message................................................|
|user1............|UPDATED CUSTOMER.....| 123456 ....| Added new user. New user was added |
|user2............|ADDED COUNTRY............| ..................| with identifier: Germany.........................|

Can someone help me, building a general regex, please?

Tags (2)
0 Karma
1 Solution

javiergn
Super Champion

Hi @nuaraujo,

Try the following regex instead. I've tested it on my lab with you two examples and it seems to be working fine. Note I am assuming your customer ID is a number so you might need to tweak that if that's not the case.

^USER: (?P<username>\S+)\s+(?P<operation>[A-Z]+ [A-Z]+)(\s+\-\s+(?P<customerid>\d+)\.)?\s+(?P<message>.*)

Thanks,
J

View solution in original post

0 Karma

javiergn
Super Champion

Hi @nuaraujo,

Try the following regex instead. I've tested it on my lab with you two examples and it seems to be working fine. Note I am assuming your customer ID is a number so you might need to tweak that if that's not the case.

^USER: (?P<username>\S+)\s+(?P<operation>[A-Z]+ [A-Z]+)(\s+\-\s+(?P<customerid>\d+)\.)?\s+(?P<message>.*)

Thanks,
J

0 Karma

nuaraujo
Path Finder

Thanks
Thanks
Thanks 🙂

0 Karma

lloydknight
Builder

Hello @nuaraujo

try something like this.

... | rex field=message "USER:\s(?<username>.+?)\s(?<operation>\w+\s\w+)\s\-?\s(?<id>\d+)\.?\s(?<message>.*)"

Hope it helps!

0 Karma

p_gurav
Champion

Can you try :

| rex field=message    "^USER: (?P<username>.+?) (?P<operation>\w+) (?P<comment>.*)" | rex field=comment "CUSTOMER - (?P<id>[^\.]+)"
0 Karma

nuaraujo
Path Finder

Thank @p_gurav.

Your suggestion would already be a good solution. However, can you just help me getting 2 words for "operation"? In your suggestion, I am only getting one. Even so, BIG THANK YOU for your quick reply.

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi nuaraujo,
try this:

| rex field=message    "^USER: (?P<username>.+?) (?P<operation>[A-Z]+ [A-Z]+) (?<comment>.*)"
| rex field=comment "- (?<id>\d+)\. (?<message>.*)"

In this way you have all fields.

Bye.
Giuseppe

0 Karma
Get Updates on the Splunk Community!

Why You Can't Miss .conf25: Unleashing the Power of Agentic AI with Splunk & Cisco

The Defining Technology Movement of Our Lifetime The advent of agentic AI is arguably the defining technology ...

Deep Dive into Federated Analytics: Unlocking the Full Power of Your Security Data

In today’s complex digital landscape, security teams face increasing pressure to protect sprawling data across ...

Your summer travels continue with new course releases

Summer in the Northern hemisphere is in full swing, and is often a time to travel and explore. If your summer ...