Splunk Search

Search query to remove the first occurrence of a word and replace the second occurrence with a comma

Kitteh
Path Finder

How do I use regex or replace to remove the first occurrence of a word and replace the second and later occurrences with a comma?

For example, the raw data is:
ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root

I want it to be:
CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0),CRON[2907]: pam_unix(cron:session): session closed for user root


cpetterborg
SplunkTrust

If the leading string occurs only one more time after the start of the event, this will work:

| makeresults 
| eval _raw="ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root by (uid=0)" 
| rex mode=sed "s/^(\S+)(.*?)\s(\1)/\2, /"

The process for multiple occurrences is more complex. Is the data in that case similar to the example you provided? If not, can you provide an example? Is there a maximum number of occurrences?
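For the multiple-occurrence case raised above, one possible approach is to capture the leading word into a field and then reuse it in eval's replace(), which replaces every match. This is only a sketch under assumptions: the field name first_word and the three-event sample string are made up for illustration, and it assumes the leading word contains no regex metacharacters (otherwise it would need escaping before being used in the pattern).

```
| makeresults
| eval _raw="ubuntu CRON[2907]: session opened ubuntu CRON[2907]: session closed ubuntu CRON[2908]: session opened"
``` capture the leading word (first_word is a hypothetical field name) ```
| rex field=_raw "^(?<first_word>\S+)\s"
``` drop the leading word and the whitespace after it ```
| eval _raw=replace(_raw, "^\S+\s+", "")
``` replace every later whole-word occurrence, with its surrounding whitespace, by a comma ```
| eval _raw=replace(_raw, "\s+".first_word."\s+", ",")
| fields - first_word
| table _raw
```

The inline comments use the triple-backtick comment syntax, which is only available in newer Splunk versions; remove those lines if your version does not support it.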


inventsekar
SplunkTrust

You can run rex twice: the first pass replaces the first "ubuntu" with nothing, and the second pass replaces the second "ubuntu" with a comma.

(If the string "ubuntu" is not known beforehand, please share more details, such as where it appears in the event, so that the rex can be adjusted.)
(rex mode=sed cannot be tested on the regex101 website. I tested it directly in Splunk and it works fine; please check the screenshot.)

| makeresults
| eval _raw = "ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root"
| rex mode=sed field=_raw "s#(^ubuntu\s)##"
| rex mode=sed field=_raw "s#ubuntu#,#"
| table _raw

[Screenshot]
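If the word is known in advance and may repeat more than once, a small variation of the two-pass approach might look like the sketch below. It assumes the literal word is "ubuntu" and that each repeat is surrounded by whitespace. The second expression also consumes that whitespace and adds the sed g flag, so that every remaining occurrence becomes a bare comma as in the requested output.

```
| makeresults
| eval _raw="ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root"
| rex mode=sed field=_raw "s#^ubuntu\s+##"
| rex mode=sed field=_raw "s#\s+ubuntu\s+#,#g"
| table _raw
```

Without the g flag, only the first match of the second expression is replaced.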


inventsekar
SplunkTrust

Hi @cpetterborg, great rex command... Great learning!

To other rex beginners, let me explain it:
"s/^(\S+)(.*?)\s(\1)/\2, /"
^(\S+) --- captures the first word as "\1"
(.*?) --- captures the rest of the line as "\2", up to the second "ubuntu"
\s(\1) --- matches a space followed by the word "ubuntu" again
The part before the middle "/" is the match pattern; the part after it is the replacement.
\2, --- in the replacement, drop the "\1" matches, write the "\2" match, and then a comma ",". That's it.
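Following the breakdown above, the expression as written keeps the space after the first word inside "\2" and emits ", " before the space that already follows the second "ubuntu", so the result has a leading space and extra spaces after the comma. A possible tightening, offered as a sketch rather than the original poster's method, moves the whitespace outside the capture group and drops the space from the replacement:

```
| makeresults
| eval _raw="ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root"
| rex mode=sed "s/^(\S+)\s+(.*?)\s+\1\s+/\2,/"
| table _raw
```

Here "\2" holds only the text between the two occurrences, so the replacement "\2," joins the two halves with a bare comma, which matches the output requested in the question.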

cpetterborg
SplunkTrust

Thank you. I saw your original post in email. I'm glad you figured it all out. Congratulations! 🙂 I've upvoted your comment for the fine explanation!
