Getting Data In

Removing all white spaces from event at Index time

Tim_1
Path Finder

Hi all,

I want to remove whitespace from only the account value, not from the whole event, at index time. Is this possible?

Given the events look like this:

{"account": "Account", "justification": "TEST 1", "value": "50"}

{"account": "dev 1", "justification": "TEST 2", "value": "50"}

{"account": "uat test acc", "justification": "TEST 3", "value": "50"}

{"account": "a .. x .. y .. z .. etc", "justification": "TEST 4", "value": "50"}

I want it to look like this:

{"account": "Account", "justification": "TEST 1", "value": "50"}

{"account": "dev1", "justification": "TEST 2", "value": "50"}

{"account": "uattestacc", "justification": "TEST 3", "value": "50"}

{"account": "axyzetc", "justification": "TEST 4", "value": "50"}
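
To pin down the target output outside Splunk, here is a minimal Python sketch. It assumes each event is a single well-formed JSON object, and the function name `strip_account_whitespace` is made up for illustration. It only describes the desired result and is not an index-time configuration:

```python
import json
import re


def strip_account_whitespace(raw_event: str) -> str:
    """Return the event with all whitespace removed from the "account" value only."""
    event = json.loads(raw_event)
    # \s+ matches runs of spaces, tabs, and newlines; the other fields are untouched.
    event["account"] = re.sub(r"\s+", "", event["account"])
    # json.dumps uses ", " and ": " separators by default, like the sample events.
    return json.dumps(event)


if __name__ == "__main__":
    print(strip_account_whitespace(
        '{"account": "uat test acc", "justification": "TEST 3", "value": "50"}'
    ))
```

The justification value keeps its space ("TEST 3"), which is the behaviour I am after.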

cpetterborg
SplunkTrust

This assumes that your data stream really contains values that look like 1 .. n, rather than something like 1 2 3 4 5 6 7 8 9 0. If you only have the latter, a simpler regex would do, but this one should work either way.

You could probably do something like this in props.conf:

SEDCMD-pass1 = s/("account": "[^"\s]*)(?:\s+([^"\s]*))?(?:\s+([^"\s]*))?(?:\s+([^"\s]*))?(?:\s+([^"\s]*))?/\1\2\3\4\5/

This should remove up to 4 runs of whitespace from the account value. If you need to remove more, then add a second pass, or third pass, with the same expression:

SEDCMD-pass2 = s/("account": "[^"\s]*)(?:\s+([^"\s]*))?(?:\s+([^"\s]*))?(?:\s+([^"\s]*))?(?:\s+([^"\s]*))?/\1\2\3\4\5/

I haven't completely tested this, but I believe it to be fairly correct. If your event data differs much from this example, then it could make things more difficult.
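
A multi-pass substitution of this kind can be sanity-checked outside Splunk before it goes into props.conf. The sketch below emulates the passes with Python's re.sub. The pattern is an illustrative assumption anchored on the "account" key, the helper names are hypothetical, and Python's regex engine is assumed to behave like Splunk's for this case:

```python
import re

# Hypothetical pattern for illustration: group 1 keeps the key and the first token,
# then up to four further tokens are captured without their leading whitespace.
TOKEN = r'(?:\s+([^"\s]*))?'
PATTERN = re.compile(r'("account": "[^"\s]*)' + TOKEN * 4)


def one_pass(raw_event: str) -> str:
    """Emulate a single sed-style substitution without the g flag (first match only)."""
    def join_groups(match: re.Match) -> str:
        # Optional groups that did not take part in the match are None.
        return "".join(group or "" for group in match.groups())
    return PATTERN.sub(join_groups, raw_event, count=1)


def run_passes(raw_event: str, passes: int) -> str:
    """Apply the same substitution several times, like pass1, pass2, and so on."""
    for _ in range(passes):
        raw_event = one_pass(raw_event)
    return raw_event


if __name__ == "__main__":
    event = '{"account": "a b c d e f g", "justification": "TEST 4", "value": "50"}'
    print(run_passes(event, 1))
    print(run_passes(event, 2))
```

Each pass removes at most four whitespace runs, so a value with six spaces needs two passes. The justification field is never touched because the match cannot continue past the closing quote of the account value.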


lfedak_splunk
Splunk Employee

Hey @Tim_1 if they solved your problem, please don't forget to accept an answer! You can upvote posts as well. (Karma points will be awarded for either action.) Happy Splunking!


Tim_1
Path Finder

Hi @Ifedak, will do so once I've found a solution. Thanks 🙂


DalJeanis
SplunkTrust

I assume that you mean you want to eliminate all spaces, or all white space, from the account field at index time?

Try something like this in transforms.conf

[stanzaname]
SOURCE_KEY = account
REGEX = ^([^\s]+)(\s+)*([^\s]*)(\s+)*([^\s]*)(\s+)*([^\s]*)(\s+)*([^\s]*)(\s+)*(.*)$
DEST_KEY = account
FORMAT = $1$3$5$7$9$11

You can repeat the phrase ([^\s]*)(\s+)* once for each additional space you want to eliminate, and add the next odd-numbered group to the FORMAT. I'm not sure what the highest possible group number is.
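
To see how a fixed number of capture groups limits this approach, the REGEX and FORMAT can be emulated in Python. This is only a sketch: it assumes Python's re module treats the pattern the same way Splunk does, and the function name `apply_format` is made up for illustration:

```python
import re

# REGEX as given in the stanza; groups 2, 4, 6, 8, 10 hold whitespace and are dropped.
REGEX = re.compile(
    r"^([^\s]+)(\s+)*([^\s]*)(\s+)*([^\s]*)(\s+)*([^\s]*)(\s+)*([^\s]*)(\s+)*(.*)$"
)


def apply_format(value: str) -> str:
    """Emulate FORMAT = $1$3$5$7$9$11 by joining the odd-numbered groups."""
    match = REGEX.match(value)
    if match is None:
        # For example an empty value or one that starts with whitespace.
        return value
    # Groups that did not take part in the match are None, so treat them as empty.
    return "".join(match.group(i) or "" for i in (1, 3, 5, 7, 9, 11))


if __name__ == "__main__":
    print(apply_format("uat test acc"))
    print(apply_format("a b c d e f g"))
```

With this many groups the first five whitespace runs are removed. Anything after that lands in the final (.*) group with its spaces intact, which is why the pattern has to grow with the number of spaces.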


Tim_1
Path Finder

Hi @DalJeanis,

Thanks for the answer. Is there a way to do it without having to change it depending on the number of spaces? I would prefer not to have to create a separate stanza for each possible number of spaces.

Also, my question wasn't 100% clear about the data I want to reformat. I've updated the question to be more in line with what the data should be.


Tim_1
Path Finder

Hi @cpetterborg,

Thanks for the answer. My question wasn't 100% clear with the examples, so I've updated the question to be more in line with what the data should be.

The data won't be integers, but strings.


cpetterborg
SplunkTrust

This should still work with strings of multiple characters.

Tim_1
Path Finder

Yes, got it half working so far.
Thanks for the help. 🙂
Will accept when fully complete.

sbbadri
Motivator

Try this:

| makeresults
| eval test="{\"account\": \"Account 1 2\", \"justification\": \"TEST 1\", \"value\": \"50\"}"
| rex field=test "(?P<t1>{\"account\":\s+)(?P<t2>\"Account\s+\S+.*\")(?P<t3>\,\s+\"justification\":\s+\"TEST\s+\d+\"\,\s+\"value\":\s+\"\d+\"})"
| rex field=t2 mode=sed "s/ //g"
| eval t4=t1+t2
| eval t5=t4+t3
| rename t5 as test
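
The same split, strip, and rejoin idea can be sketched in Python for quick experiments away from a search head. It assumes the account value contains no escaped quotes, and the names `ACCOUNT_VALUE` and `squeeze_account` are hypothetical:

```python
import re

# Three groups: the key prefix, the value between the quotes, and the closing quote.
ACCOUNT_VALUE = re.compile(r'("account":\s*")([^"]*)(")')


def squeeze_account(raw_event: str) -> str:
    """Remove whitespace from the account value and leave the rest of the event as is."""
    def rebuild(match: re.Match) -> str:
        # Groups 1 and 3 are kept verbatim; only group 2 loses its whitespace.
        return match.group(1) + re.sub(r"\s+", "", match.group(2)) + match.group(3)
    return ACCOUNT_VALUE.sub(rebuild, raw_event, count=1)


if __name__ == "__main__":
    print(squeeze_account(
        '{"account": "Account 1 2", "justification": "TEST 1", "value": "50"}'
    ))
```

Unlike a fixed list of capture groups, this handles any number of spaces in one step, because the inner substitution runs over the whole captured value.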


Tim_1
Path Finder

Hi @sbbadri,

Thanks for the answer, but I am looking at doing this at index time and not at search time.
