Help with transforms regex needed

damucka
Builder

Hello,

I need help with properly hashing user IDs and IP addresses using transforms.conf.
I have the following configuration in my props.conf and transforms.conf:

props.conf

[(?::){0}*hanatraces]
TRUNCATE = 1000
MAX_EVENTS = 50
TRANSFORMS-BWP_parameterChangelog_clone
TRANSFORMS-eliminatedebug = debugsetnull
TRANSFORMS-LogReplayCoordinator = LogReplaysetnull
TRANSFORMS-anon = anonymize-ip, anonymize-user

transforms.conf:

#****************************** Mask the D/C/I-user names and the IP-Addresses
[anonymize-user]
REGEX = ([=,>'\\":;|\s])([ICDicd]\d{3,})([,<:;|'&\\"\s])
FORMAT = $1(D/C/I)######$3
DEST_KEY = _raw
REPEAT_MATCH = true

[anonymize-ip]
REGEX = ([=,\s])(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})([,:\s])
FORMAT = $1##.##.##.##$3
DEST_KEY = _raw
REPEAT_MATCH = true
#******************************

The data is now hashed, but I get an additional event that contains only the hashed string, and the rest of the original event is truncated. Here is an example of a proper, unhashed event (the user BWP_BWP_PRIVATE_SCHEDULER does not match the user ID hashing REGEX):

[270388]{451119}[1093/28122613728] 2019-05-22 10:05:44.496402 i TraceContext     TraceContext.cpp(01111) : UserName=BWP_BWP_PRIVATE_SCHEDULER, ApplicationUserName=BW_CORE, ApplicationName=ABAP:BWP, ApplicationSource=CL_SQL_STATEMENT==============CP:304, Client=001, StatementHash=4dfd52017afa6b8cdb8274ff8b619ca9, EppRootContextId=901B0E97D4881ED99CA30CD8D54D3BDC, EppTransactionId=4883981CD2FE04A0E005CE5008B855CB, EppConnectionId=00000000000000000000000000000000, EppConnectionCounter=0, EppComponentName=BWP/ls5903_BWP_21, EppAction=ZPT_HANA_PROC_001, StatementExecutionID=844442116702720

And here is an example of the outcome where the transforms hashing has been applied:

Event 1:
5/22/19
10:03:51.774 AM 
[288421]{-1}[-1/-1] 2019-05-22 10:03:51.774361 i TraceContext     TraceContext.cpp(01111) : UserName=

Event 2:
5/22/19
10:03:51.774 AM 
 (D/C/I)######

So first, the splitting is something I do not want. Second, the original event is truncated after UserName=, although it should also contain the details of the executed statement and so on.
The original regex, which I used with SEDCMD directly in props.conf, was:

[BWP_hanatraces]
SEDCMD-UserNameMask = s/([=,>'\\":;|\s])([ICDicd]\d{3,})([,<:;|'&\\"\s])/\1(D\/C\/I)######\3/g
SEDCMD-IPAddressMask = s/([=,\s])(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})([,:\s])/ \1##.##.##.##\3/g

and it worked fine.
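
For readers who want to check what the two SEDCMD expressions do to a line, here is a minimal sketch that emulates them with Python's `re` module. It is not Splunk itself: the sample event, the user ID `D012345`, the `ClientIP` field and the function name `mask` are all hypothetical, and the leading space in the quoted IP replacement is left out.

```python
import re

# Python translations of the two sed expressions quoted above.
USER_RE = re.compile(r"""([=,>'\\":;|\s])([ICDicd]\d{3,})([,<:;|'&\\"\s])""")
IP_RE = re.compile(r"([=,\s])(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})([,:\s])")


def mask(event: str) -> str:
    """Mask D/C/I user IDs and IPv4 addresses in place, like the s/.../.../g commands."""
    event = USER_RE.sub(r"\1(D/C/I)######\3", event)
    return IP_RE.sub(r"\1##.##.##.##\3", event)


# Hypothetical sample line; the field values are made up for illustration.
sample = "TraceContext.cpp(01111) : UserName=D012345, ClientIP=10.20.30.40, Client=001"
print(mask(sample))
```

Only the matched ID and address are rewritten; the delimiters captured in groups 1 and 3 are put back, so the rest of the line is left as it was.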
Could you please advise?

Kind Regards,
Kamil

FrankVl
Ultra Champion

A couple of things here:

1: I'm not sure the multiple results are caused by your transforms (I don't see how). More likely they are a result of your line breaking/truncation settings. MAX_EVENTS may well be having some strange effect here, especially since you don't seem to specify properly how events need to be broken.
2: I don't think you can do a replace on _raw like this with multiple matches. I'm not entirely sure what the exact effect would be, but it doesn't make much sense to me.
3: Your regex doesn't make sense for this either. It matches only a single character in the first capture group, a single character in the last capture group, and an I, C or D followed by 3 or more digits in the middle capture group. You then replace your entire _raw message with $1(D/C/I)######$3, which effectively means you destroy most of your raw event.

Can you please explain conceptually (with some examples) what you want to achieve exactly? Then we can see what would be the correct approach.

And why not stick with SEDCMD? The solution you mention does seem to make sense, and it is probably how I would do it.
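
The difference described in point 3 can be sketched in Python. This only emulates the reasoning (a new string built from the capture groups alone versus a substitution inside the existing string); it does not reproduce Splunk's actual index-time behavior, and the sample line, the user ID `D012345` and both function names are hypothetical.

```python
import re

USER_RE = re.compile(r"""([=,>'\\":;|\s])([ICDicd]\d{3,})([,<:;|'&\\"\s])""")


def format_only(event: str) -> str:
    """Build a new value from the first match alone, in the spirit of
    FORMAT = $1(D/C/I)######$3 written to _raw."""
    m = USER_RE.search(event)
    if m is None:
        return event
    return f"{m.group(1)}(D/C/I)######{m.group(3)}"


def substitute_in_place(event: str) -> str:
    """Replace each match inside the event, in the spirit of SEDCMD s/.../.../g."""
    return USER_RE.sub(r"\1(D/C/I)######\3", event)


# Hypothetical sample line for illustration only.
sample = "UserName=D012345, ApplicationUserName=BW_CORE, Client=001"
print(repr(format_only(sample)))
print(repr(substitute_in_place(sample)))
```

In this emulation the first function keeps only the two delimiter characters around the mask, while the second keeps the whole line, which is the behavior the SEDCMD approach is being recommended for.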

damucka
Builder

Hello @FrankVl

Thank you.
I switched back to the SEDCMD and the hashing looks good now. The props.conf looks as follows:

[(?::){0}*hanatraces]
TRUNCATE = 1000
MAX_EVENTS = 50
TRANSFORMS-BWP_parameterChangelog_clone
TRANSFORMS-eliminatedebug = debugsetnull
TRANSFORMS-LogReplayCoordinator = LogReplaysetnull

TRANSFORMS-anon = anonymize-ip, anonymize-user

SEDCMD-UserNameMask = s/([=,>'\":;|\s])([ICDicd]\d{3,})([,<:;|'&\"\s])/\1(D\/C\/I)######\3/g
SEDCMD-IPAddressMask = s/([=,\s])(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})([,:\s])/ \1##.##.##.##\3/g

What I would like to achieve with TRUNCATE / MAX_EVENTS is to lower the license usage :-). Quite often the events have hundreds of lines, and the lines themselves are long. Such events are barely readable, so I thought I would limit the line size to 1000 characters and keep at most 50 lines per event. The rest should be skipped.
That was the idea. Unfortunately, what I see is that each line is truncated to 1000 characters, but the event is not truncated after 50 lines; it is split instead.
Is there any way to limit an event to 50 lines of at most 1000 characters each and discard the rest of the event, without splitting it and creating further events from it?
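
To make the requirement concrete, here is a small Python sketch of the desired result. It is not a Splunk setting and says nothing about how Splunk would implement it; the function name `cap_event` and its parameters are hypothetical.

```python
def cap_event(event: str, max_lines: int = 50, max_chars: int = 1000) -> str:
    """Keep at most max_lines lines of at most max_chars characters each.

    Everything beyond those limits is discarded, so one input event
    always yields exactly one (possibly shorter) output event.
    """
    lines = event.split("\n")
    kept = [line[:max_chars] for line in lines[:max_lines]]
    return "\n".join(kept)


# Hypothetical oversized event: 120 lines of 1500 characters each.
big_event = "\n".join("x" * 1500 for _ in range(120))
capped = cap_event(big_event)
print(len(capped.split("\n")), max(len(line) for line in capped.split("\n")))
```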

Regards,
Kamil

FrankVl
Ultra Champion

Yeah, I saw your other question on that. See the SEDCMD solution I posted there and give that a try instead of the MAX_EVENTS approach.

I converted my comment to an answer, please accept it if it helped. Let's continue the truncation discussion in your earlier post 🙂
