
Regex/sed replacements and issues with digits following a backreference

alekksi
Communicator

Hi all,

I'm having issues with a rex sed-mode replacement that doesn't work cleanly. I'm trying to anonymise session IDs so that, in the few places where they haven't been updated yet, they will join with the other session IDs in the logs.

Assuming the session key is abc12345678xyz, where any character can be a letter or a digit, the working regex replacement I currently have is:

rex field=session_id mode=sed "s/^([\d\w]{3})[\d\w]{8}/\1.00000000/" | rex field=session_id mode=sed "s/\.//"
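For anyone who wants to test this outside Splunk, the same two-pass trick can be reproduced with Python's re module. This is only a rough analogue: Python's regex and replacement syntax is close to, but not identical with, Splunk's rex mode=sed. The function name anonymise_two_pass is hypothetical, and [\d\w] is shortened to \w because \w already matches digits.

```python
import re


def anonymise_two_pass(session_id: str) -> str:
    # Pass 1: keep the 3-character prefix (group 1), then insert a "." so the
    # backreference is not immediately followed by a digit, then the zeros.
    marked = re.sub(r"^(\w{3})\w{8}", r"\1.00000000", session_id)
    # Pass 2: remove only the first "." (count=1 mirrors sed without the g flag).
    return re.sub(r"\.", "", marked, count=1)


print(anonymise_two_pass("abc12345678xyz"))  # abc00000000xyz
```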

Obviously that's not particularly succinct or efficient, but the more direct alternative gives the wrong result:

rex field=session_id mode=sed "s/^([\d\w]{3})[\d\w]{8}/\100000000/"

Instead of "abc00000000xyz", I get "\100000000xyz" as the result.

Is there an easier way to do this? Or is there a way to terminate the backreference, so that it refers to the first capture group rather than the hundred-millionth?
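In Python's re module, the replacement syntax \g<1> ends a group reference explicitly; the Python documentation notes that \g<2>0 is unambiguous, whereas \20 is read as a reference to group 20. The sketch below shows that Python-specific fix. It is not a claim that Splunk's rex mode=sed accepts \g<1>; the function name anonymise_named_ref is hypothetical.

```python
import re

# Same pattern as the question, with [\d\w] shortened to \w.
PATTERN = r"^(\w{3})\w{8}"


def anonymise_named_ref(session_id: str) -> str:
    # \g<1> closes the group reference, so the zeros after it are literal
    # text rather than extra digits of the group number.
    return re.sub(PATTERN, r"\g<1>00000000", session_id)


print(anonymise_named_ref("abc12345678xyz"))  # abc00000000xyz
```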

Thanks in advance!
Best regards,
Alex


DEAD_BEEF
Builder

To be clear, is this what you want? Your post is a bit confusing.

EXISTING
session_id abch573jfuixyz

DESIRED
session_id abc00000000xyz

Does the following meet your criteria?

rex field=session_id mode=sed "s/(?<=^.{3}).{8}/00000000/"
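The lookbehind idea sidesteps the backreference entirely: (?<=^.{3}) checks that exactly three characters precede the match without consuming them, so only the eight middle characters are replaced and the replacement needs to be just the eight zeros. A minimal Python sketch, assuming Python's re handles this fixed-width lookbehind the same way as Splunk's PCRE-based rex. The function name and its prefix/width parameters are hypothetical, added so the 12-character prefix mentioned later in the thread can be handled as well.

```python
import re


def anonymise_lookbehind(session_id: str, prefix: int = 3, width: int = 8) -> str:
    # (?<=^.{prefix}) is a fixed-width lookbehind anchored at the start of the
    # string; .{width} is the section that gets replaced.
    pattern = r"(?<=^.{%d}).{%d}" % (prefix, width)
    # Replace only the matched middle section with the same number of zeros.
    return re.sub(pattern, "0" * width, session_id, count=1)


print(anonymise_lookbehind("abch573jfuixyz"))  # abc00000000xyz
```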

gcusello
SplunkTrust

Hi alekksi,
You have probably already seen https://docs.splunk.com/Documentation/Splunk/6.5.2/Data/Anonymizedata
Either way, I'd use something like this:

rex field=session_id mode=sed "s/^(.{11})/10000000000/"

which turns "abc12345678xyz" into "10000000000xyz".

Bye.
Giuseppe

alekksi
Communicator

I have seen that; thank you for the link. We are already using it in some places. That said, these IDs are anonymised at the application level. That will be fixed in a later version, but I still need to use the data in the meantime.

Sorry I wasn't clear enough earlier:

This is the string I start with: "abch573jfuixyz"
This is the string I want: "abc00000000xyz"
This is the regex I am currently using: "s/^([\d\w]{3})[\d\w]{8}/\1.00000000/" | "s/\.//"

Note that the real prefix is actually 12 characters long ({12}); I'm using 3 in this example because it's less hassle.


gcusello
SplunkTrust

If the length of your session_id is fixed, you could also use the eval command:

| eval session_id=substr(session_id,1,3)."00000000".substr(session_id,12)
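When the ID length is fixed, the same slicing logic can be checked in Python. Splunk's substr() counts characters from 1, while Python slices count from 0, so substr(session_id,1,3) corresponds to session_id[0:3] and substr(session_id,12) corresponds to session_id[11:]. The function name and its prefix/width parameters are hypothetical.

```python
def anonymise_substr(session_id: str, prefix: int = 3, width: int = 8) -> str:
    # substr(session_id,1,3): 1-based characters 1..3 -> Python slice [0:3].
    head = session_id[:prefix]
    # substr(session_id,12): character 12 onwards -> Python slice [11:].
    tail = session_id[prefix + width:]
    # Rebuild the ID with the middle section replaced by zeros.
    return head + "0" * width + tail


print(anonymise_substr("abch573jfuixyz"))  # abc00000000xyz
```

This only works if every session_id has the same layout; the regex versions above tolerate variable-length suffixes, while this one relies purely on character positions.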

Bye.
Giuseppe

alekksi
Communicator

Of course, that's the obvious solution; I should've thought of that. Many thanks!
