How to normalize user values that are logged in two different formats at search-time?

Explorer

For reasons I can't explain, our SiteMinder-protected web site is logging user in two different formats, one that just has the simple user name, and another that has the domain prefix; so for a given user, we have web server access logs that contain both "myname" and "MyDom//MyName".

My goal is to "normalize" these so that when I perform stats against user, I get both of these variants aggregated into one count for the one user.

Might seem like a job for a simple RegEx, BUT... notice that the cases are different! "myname" gets logged in lower case, but "MyDom//MyName" gets logged in mixed case. Through RegEx and use of Upper(), I've been able to get the two variants to display the same in a report... but they are still getting reported distinctly with separate counts. I tried to dedup based on the "normalized" value, but then it only returned one of the two variants, with only the counts for that variant (not both of them aggregated.)

Any ideas?

Re: How to normalize user values that are logged in two different formats at search-time?

Path Finder

Have you considered field aliases?

http://docs.splunk.com/Documentation/Splunk/6.2.1/Knowledge/Addaliasestofields
Also here:
http://answers.splunk.com/answers/110019/using-field-aliases.html

With this, you could create a new custom "combined_user" that you could run reports on with the same counts.

Re: How to normalize user values that are logged in two different formats at search-time?

Explorer

Thanks, gwalford. But as a newbie, I couldn't figure out how this would help; seems like just giving it a new name but I was still tripped up about modifying the value.

Re: How to normalize user values that are logged in two different formats at search-time?

SplunkTrust

What you need is a calculated field that normalizes all variations of the user field, in both format and case.
See: http://docs.splunk.com/Documentation/Splunk/6.2.0/Knowledge/definecalcfields

e.g.

props.conf on the search head (assuming the existing field name is 'User'; field names are case-sensitive, so adjust it to match yours)

[YourSourceType]
EVAL-User = mvindex(split(upper(User),"/"),-1)

Re: How to normalize user values that are logged in two different formats at search-time?

Explorer

Actually, this worked...
| eval user_clean=mvindex(split(upper(user),"//"),-1)
And then I can do stats based on "user_clean" and don't even need to dedup.

I just made it part of the search, so that I don't need to change any .conf files. (I don't have the necessary permissions.)
THANKS!
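
For anyone who wants to try this without real data, here is a minimal sketch that fakes four events with makeresults and then applies the same eval. The sample values, including "MyDom//OtherUser", are made up for illustration; only the user_clean expression comes from the post above.

```spl
| makeresults count=4
| streamstats count AS n
| eval user=case(n==1, "myname", n==2, "MyDom//MyName", n==3, "myname", n==4, "MyDom//OtherUser")
| eval user_clean=mvindex(split(upper(user), "//"), -1)
| stats count BY user_clean
```

With these sample values, both "myname" and "MyDom//MyName" end up as MYNAME, so stats reports a single row for that user with a count of 3, plus one row for OTHERUSER. The key point is that the eval runs before stats, so the aggregation groups on the normalized value rather than the raw one.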

Re: How to normalize user values that are logged in two different formats at search-time?

Splunk Employee

Actually, you don't need to edit config files to do this. In the Splunk UI, under Settings > Fields, you will see an entry for "Calculated fields". If you create a calculated field there using the same eval expression you put into your search, assign it to the sourcetype in question, and share it globally, the field will be calculated automatically whenever you search that sourcetype.

Re: How to normalize user values that are logged in two different formats at search-time?

Explorer

Thanks, somesoni2. Due to the federated roles our large org has regarding Splunk, I am essentially limited to creating searches and stuff within my own app, and can't tweak any of the .conf files. (I do understand why that would be a more strategic method, and maybe sometime I'll navigate the Change Process to make it happen!) But I was able to apply your basic solution within a search, so that's great. Thanks!
