Getting Data In

Dealing with key/value pairs with inconsistent key case

Communicator

Our setup is as follows.

On all of our servers, we collect a "contacts" file we use to keep track of what function each server performs. These are all collected and stored in a central location.

The file looks like this:

sysadmin: firstname.lastname
backupadmin: firstname.lastname
application: Apache Webserver
function: production

Fairly easy to parse. I have this set up:

inputs.conf

[monitor:///opt/servers/.../contacts]
disabled = false
followTail = 0
host_segment = 3
index = firewalker
sourcetype = etc-contacts

props.conf

[etc-contacts]
REPORT-contactfields = transform_contacts

transforms.conf

[transform_contacts]
DELIMS = "\n", ":"
CLEAN_KEYS = true

The problem is that, over many years of following this procedure, some people wrote the keys in uppercase and others in lowercase. The scripts we used previously converted all keys to lowercase, so it was a non-issue.

Splunk, however, treats field names as case-sensitive. Files like this cause issues:

SYSADMIN: firstname.lastname
BACKUPADMIN: firstname.lastname
application: Apache Webserver
FUNCTION: production

And we end up with both "SYSADMIN" and "sysadmin" as fields in Splunk search.

Is there a way to convert "SYSADMIN" into "sysadmin" automatically? I'd like to avoid doing it on a field-by-field basis, since there are about 50 other fields we use, and they can come and go at any time.
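
For reference, the behaviour of the older scripts mentioned above (lowercasing every key before use) can be sketched in a few lines of Python. This is only an illustration of the desired result, not something Splunk runs; the function name `parse_contacts` and the sample text are assumptions made up for this example.

```python
def parse_contacts(text):
    """Parse 'key: value' lines into a dict whose keys are lowercased."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)  # split on the first colon only
        fields[key.strip().lower()] = value.strip()
    return fields


# Sample input mirroring the mixed-case file shown above.
sample = """SYSADMIN: firstname.lastname
BACKUPADMIN: firstname.lastname
application: Apache Webserver
FUNCTION: production"""

for key, value in parse_contacts(sample).items():
    print(key, "=", value)
```

With this normalization, "SYSADMIN" and "sysadmin" collapse into a single key while the values keep their original case, which is the outcome the question is asking for.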

1 Solution

Champion

I haven't done this yet, but you should be able to use the SEDCMD feature in your props.conf.

http://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedatausingconfigurationfiles


[source::///opt/servers/.../contacts]
SEDCMD-contacts = s/\([A-Za-z0-9]*\)/\1\L\2\g


Communicator

@bmacias84 Oh, I see.
I'm extracting my keys dynamically from my file, so I can't decide the case.
Do you know of any way to make the keys lowercase without changing my files? I don't have any problem with re-indexing my files.

Champion

@gelica, my example only applies to the _raw field at index time. I recommend using lowercase as much as possible. If the data is already indexed, use lookup tables or field aliasing to normalize your data.

Communicator

Should this change the key to lowercase? I tried adding it to my props.conf, but with no luck.
Would this be applied at index time or at search time? I restarted Splunk and re-indexed my files as well but it didn't work for me 😕

Motivator

Part of the issue is where you add the props.conf: it needs to be on the instance where the input parsing happens. The other part is the SEDCMD expression itself, which requires three "/" delimiters (four sections in total). I think the author meant:

SEDCMD-contacts = s/([A-Za-z0-9]*)/\1\L\2/g

One author notes that this case conversion only works in GNU sed and not in POSIX sed. My bet is that Splunk uses GNU.

Another author clarifies:

Upper- and lowercase conversion is handled by:
\U Makes all text to the right uppercase.
\u Makes only the first character to the right uppercase.
\L Makes all text to the right lowercase.
\l Makes only the first character to the right lowercase. (Note that it's a lowercase letter L.)
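
To make the four escapes concrete, here is a rough Python emulation of what each one does to a piece of captured text. It is a sketch only: Python's `re` module has no such escapes, the helper name `apply_case` is made up for this example, and the emulation ignores sed details such as ending a conversion partway through the replacement.

```python
def apply_case(escape, text):
    """Mimic the effect of one sed-style case escape on a whole string."""
    if escape == r"\U":
        return text.upper()                  # all text uppercase
    if escape == r"\u":
        return text[:1].upper() + text[1:]   # first character uppercase only
    if escape == r"\L":
        return text.lower()                  # all text lowercase
    if escape == r"\l":
        return text[:1].lower() + text[1:]   # first character lowercase only
    raise ValueError("unknown escape: " + escape)


# Show each escape applied to the same mixed-case key.
for esc in (r"\U", r"\u", r"\L", r"\l"):
    print(esc, apply_case(esc, "SysAdmin"))
```

For normalizing keys, `\L` is the one that matters, since the keys may be fully uppercase rather than just capitalized.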

In that case, I would craft it:

SEDCMD-contacts = s/([A-Za-z]*)/\L\1/g
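
One thing to watch with that expression: the pattern is not anchored to the key, so with the g flag it would lowercase every run of letters on the line, values included (assuming the \L escape is honored at all). The Python sketch below emulates both behaviours for comparison. The function names and sample text are made up for this example, and the "keys only" pattern is a suggestion that has not been tested in Splunk.

```python
import re

# Hypothetical sample with mixed case in both keys and values.
sample = "SYSADMIN: Firstname.Lastname\napplication: Apache Webserver"


def lower_all_letters(text):
    # Emulates the effect of s/([A-Za-z]*)/\L\1/g: every run of letters is
    # lowercased. "+" is used instead of "*" to skip empty matches; the
    # visible result is the same.
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).lower(), text)


def lower_keys_only(text):
    # Lowercases only the text before the first colon on each line,
    # leaving the value untouched.
    return re.sub(r"(?m)^([^:\n]+):", lambda m: m.group(1).lower() + ":", text)


print(lower_all_letters(sample))
print(lower_keys_only(sample))
```

If values such as "Apache Webserver" need to keep their case, anchoring the match to the start of the line and stopping at the first colon, as in `lower_keys_only`, is the safer shape for the pattern.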