Splunk Search

Regex to extract fields with different format

nuaraujo
Path Finder

Hello all,

I need your help in order to get a regex that may extract fields from some messages.

Example 1
USER: user1 UPDATED CUSTOMER - 123456. Added new user. New user was added.

What I am looking at this message:
username: user1
operation: UPDATED CUSTOMER (always two words in uppercase)
customer_id: 123456 (always preceded by "-" and ending with ".") (not available in all messages)
comment: Added new user. New user was added

Example2
USER: user2 ADDED COUNTRY with identifier: Germany

What I am looking at this message:
username:user2
operation: ADDED COUNTRY (always two words in uppercase)
comment: with identifier: Germany (in this message I do not have customer_id field)

I am using the following REGEX that I far from being accurate. It works for the first use case but not for the second

 ... | rex field=message    "^USER: (?P<username>.+?) (?P<operation>[A-Z].+?) - (?P<id>.+?)\. (?P<message>.*)"

What I am looking for a final result:
|username.... | operation...........................| id...............| message................................................|
|user1............|UPDATED CUSTOMER.....| 123456 ....| Added new user. New user was added |
|user2............|ADDED COUNTRY............| ..................| with identifier: Germany.........................|

Can someone help me, building a general regex, please?

Tags (2)
0 Karma
1 Solution

javiergn
Super Champion

Hi @nuaraujo,

Try the following regex instead. I've tested it on my lab with you two examples and it seems to be working fine. Note I am assuming your customer ID is a number so you might need to tweak that if that's not the case.

^USER: (?P<username>\S+)\s+(?P<operation>[A-Z]+ [A-Z]+)(\s+\-\s+(?P<customerid>\d+)\.)?\s+(?P<message>.*)

Thanks,
J

View solution in original post

0 Karma

javiergn
Super Champion

Hi @nuaraujo,

Try the following regex instead. I've tested it on my lab with you two examples and it seems to be working fine. Note I am assuming your customer ID is a number so you might need to tweak that if that's not the case.

^USER: (?P<username>\S+)\s+(?P<operation>[A-Z]+ [A-Z]+)(\s+\-\s+(?P<customerid>\d+)\.)?\s+(?P<message>.*)

Thanks,
J

0 Karma

nuaraujo
Path Finder

Thanks
Thanks
Thanks 🙂

0 Karma

lloydknight
Builder

Hello @nuaraujo

try something like this.

... | rex field=message "USER:\s(?<username>.+?)\s(?<operation>\w+\s\w+)\s\-?\s(?<id>\d+)\.?\s(?<message>.*)"

Hope it helps!

0 Karma

p_gurav
Champion

Can you try :

| rex field=message    "^USER: (?P<username>.+?) (?P<operation>\w+) (?P<comment>.*)" | rex field=comment "CUSTOMER - (?P<id>[^\.]+)"
0 Karma

nuaraujo
Path Finder

Thank @p_gurav.

Your suggestion would already be a good solution. However, can you just help me getting 2 words for "operation"? In your suggestion, I am only getting one. Even so, BIG THANK YOU for your quick reply.

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi nuaraujo,
try this:

| rex field=message    "^USER: (?P<username>.+?) (?P<operation>[A-Z]+ [A-Z]+) (?<comment>.*)"
| rex field=comment "- (?<id>\d+)\. (?<message>.*)"

In this way you have all fields.

Bye.
Giuseppe

0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...