Force index-time host extraction to lower-case

Is there a way to extract the hostname from an event, but force it to lower-case in the process?

Extracting the hostname is easy enough (DEST_KEY in transforms.conf, etc.), but that does nothing about letter case.
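
The gap can be modeled in plain Python. This is not Splunk configuration. A regex capture returns the hostname with whatever case it had in the event, so three events from the same machine yield three different host values. Lower-casing is a separate string operation that the capture alone does not perform. The `host=(\S+)` pattern and the sample events are hypothetical.

```python
import re
from typing import Optional

# Hypothetical extraction regex, standing in for the REGEX of a
# transforms.conf stanza that writes to DEST_KEY = MetaData:Host.
HOST_RE = re.compile(r"host=(\S+)")

def extract_host(raw: str) -> Optional[str]:
    """Return the captured hostname exactly as it appears in the event."""
    m = HOST_RE.search(raw)
    return m.group(1) if m else None

def normalized_host(raw: str) -> Optional[str]:
    """What the question asks for: the same capture, folded to lower case."""
    host = extract_host(raw)
    return host.lower() if host is not None else None

events = ["host=WebServer01 msg=a", "host=webserver01 msg=b", "host=WEBSERVER01 msg=c"]
print([extract_host(e) for e in events])     # ['WebServer01', 'webserver01', 'WEBSERVER01']
print({normalized_host(e) for e in events})  # {'webserver01'}
```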

The SEDCMD setting in props.conf looks like a candidate, but it's not clear whether 'y/[A-Z]/[a-z]/'-style replacements are supported. Even if they are, SEDCMD would modify the original event text, which is undesirable.
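
A plain-Python model shows why a transliteration is a poor fit here. It uses `str.translate` to stand in for a `y///` mapping of A-Z to a-z and is not Splunk's implementation. The raw event is hypothetical. Because the mapping runs over the whole string, every upper-case letter in the event changes, not just the hostname.

```python
import string

# Model of a sed-style y/ABC.../abc.../ transliteration: map each
# upper-case ASCII letter to its lower-case counterpart.
Y_UPPER_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def y_lowercase(raw: str) -> str:
    """Apply the transliteration to the entire event text (as SEDCMD does to _raw)."""
    return raw.translate(Y_UPPER_TO_LOWER)

# Hypothetical raw event; only the host value was meant to change.
raw_event = "2011-03-01 12:00:00 host=WebServer01 user=Alice action=Login"
print(y_lowercase(raw_event))
# prints: 2011-03-01 12:00:00 host=webserver01 user=alice action=login
```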

The goal is to normalize hostnames so that they are consistent for all events from a given machine, without modifying the actual event text.

Splunk Employee

I don't believe this is possible. There is certainly a case for allowing simple transforms that cannot be accomplished with PCRE (e.g., simple string operations like yours, or basic arithmetic). However, that would have to be a product enhancement, and it would have other repercussions for searching on such transformed fields.

I suppose in your particular case it isn't necessary for searching (search is case-insensitive), and for reporting and display you can still use the eval lower() function. It does throw off the metadata results a bit, but you could resolve that by, e.g., changing the metadata search on your dashboards from

| metadata type=hosts

to

| metadata type=hosts 
| eval host=lower(host) 
| stats 
    sum(totalCount) as totalCount
    min(firstTime) as firstTime
    max(lastTime) as lastTime
    max(recentTime) as recentTime
    first(type) as type
  by host

(This might get recentTime wrong, but I doubt that's a problem in practice.)
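
A plain-Python model shows what that stats step does to rows whose host differs only in case. Counts are summed, the earliest firstTime is kept, and the latest lastTime and recentTime are kept. The sample rows and epoch timestamps are made up. This illustrates the aggregation semantics and is not Splunk code.

```python
from typing import Dict, List

# Hypothetical rows shaped like `| metadata type=hosts` output (epoch seconds).
rows: List[Dict] = [
    {"host": "WebServer01", "totalCount": 100, "firstTime": 1000, "lastTime": 5000, "recentTime": 5100, "type": "hosts"},
    {"host": "webserver01", "totalCount": 50, "firstTime": 800, "lastTime": 4000, "recentTime": 6000, "type": "hosts"},
    {"host": "db01", "totalCount": 7, "firstTime": 2000, "lastTime": 3000, "recentTime": 3100, "type": "hosts"},
]

def merge_by_lower_host(rows: List[Dict]) -> Dict[str, Dict]:
    """Mirror `eval host=lower(host) | stats sum/min/max/first ... by host`."""
    merged: Dict[str, Dict] = {}
    for r in rows:
        key = r["host"].lower()
        m = merged.get(key)
        if m is None:
            # First row for this key: copy it (first(type) keeps this row's type).
            merged[key] = {**r, "host": key}
            continue
        m["totalCount"] += r["totalCount"]                  # sum(totalCount)
        m["firstTime"] = min(m["firstTime"], r["firstTime"])  # min(firstTime)
        m["lastTime"] = max(m["lastTime"], r["lastTime"])     # max(lastTime)
        m["recentTime"] = max(m["recentTime"], r["recentTime"])  # max(recentTime)
    return merged

for host, row in sorted(merge_by_lower_host(rows).items()):
    print(host, row["totalCount"], row["firstTime"], row["lastTime"], row["recentTime"])
# prints:
# db01 7 2000 3000 3100
# webserver01 150 800 5000 6000
```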

If only a few specific hosts with known capitalizations are involved, you could also build a lookup table and combine an automatic FIELDALIAS with an automatic LOOKUP to overwrite the original host field. If the mapping is more complicated than that, I guess a scripted lookup would work too. This approach seems a little wrong to me, though.
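
The lookup idea can be modeled in plain Python. First, copy host to an alias field. Then replace host with the table's canonical value when the original capitalization is listed, and leave it unchanged otherwise. The table contents and the `orig_host` field name are hypothetical, and the actual FIELDALIAS/LOOKUP stanza syntax is not shown here. Only capitalizations listed in the table are normalized, which is why this suits a small, known set of hosts.

```python
from typing import Dict

# Hypothetical lookup table: each known capitalization -> canonical host name
# (in Splunk this would be a lookup file, not a Python dict).
HOST_CASE_LOOKUP: Dict[str, str] = {
    "WebServer01": "webserver01",
    "WEBSERVER01": "webserver01",
    "DB01": "db01",
}

def apply_host_lookup(event: Dict[str, str]) -> Dict[str, str]:
    """Copy host to an alias field, then overwrite host with the lookup
    output when there is a match; unlisted hosts keep their original value."""
    out = dict(event)
    out["orig_host"] = event["host"]  # stands in for the FIELDALIAS step
    canonical = HOST_CASE_LOOKUP.get(out["orig_host"])
    if canonical is not None:         # stands in for the LOOKUP step
        out["host"] = canonical
    return out

print(apply_host_lookup({"host": "WEBSERVER01"}))  # {'host': 'webserver01', 'orig_host': 'WEBSERVER01'}
print(apply_host_lookup({"host": "mail01"}))       # {'host': 'mail01', 'orig_host': 'mail01'}
```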

Splunk Employee

gkanapathy is correct here. Although SEDCMD can perform y/// (character transliteration) substitutions, they apply only to _raw, not to any other field.