This must be done on the forwarder, because the parsing will be complete when it leaves the forwarder (the indexer cannot change it).
You need to be sure that field extractions are not done by a Splunk "easy button" (i.e., sourcetype=iis) or by field delimiters in props. Field extractions must be done by REPORT-, or possibly EXTRACT-. When field extractions are done by field delimiters in props or by Splunk's built-in code, they happen at index time and conflict with the anonymization functions of SEDCMD in props and regex in transforms.
FYI, for anyone doing anonymization this is a major problem: while _raw and the events in the search results show the 'anonymized' value, the event Information and Interesting Fields both show the non-anonymized data. In other words, the index contains both the anonymized and the non-anonymized data, and both are easily found by even a novice user.
So, for IIS logs you cannot use the Splunk sourcetype=iis. This sourcetype invokes INDEXED_EXTRACTIONS = w3c in system/default/props.conf, which is described very nicely in this Splunk blog post: http://blogs.splunk.com/2013/10/18/iis-logs-and-splunk-6/. The blog post gives insight into the problems of indexing IIS logs and details the props.conf stanza required to duplicate the automated sourcetype=iis feature.
These are the config files used for testing:
NOTE: The configs for anonymizing sourcetype=iis data are not included because they do not work (see above), but you can modify the stanzas below to see for yourself.
For sourcetype=iis License Volume Testing:
inputs.conf
[monitor://C:\temp\Splunk\test\FilterFields\FFtest.log]
disabled = false
host = Test
index = test
sourcetype = iis
For Custom IIS License Volume Testing:
inputs.conf
[monitor://C:\temp\Splunk\test\FilterFields\LicenseTest5.log]
disabled = false
host = Test
index = test2
sourcetype = iistest2
props.conf
[iistest2]
FIELD_HEADER_REGEX = ^#Fields:\s*(.*)
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = GMT
SEDCMD-dropcsmethod = s/(.*\s\d+\s\w+\s)(\/.*)(\s.*\s\d\d\d\s.*)/\1F\3/g
REPORT-iisFields = REPORT-iisFields2
transforms.conf
[REPORT-iisFields2]
DELIMS = " "
FIELDS = "date","time","c_ip","cs_username","s_ip","s_port","cs_method","cs_uri_stem","cs_uri_querie","cs_status","cs_userAgent"
The test file was a large file built by repeating the event from the IIS log example posted in the question, with the cs_uri_stem field increased to 120 characters; total size = 12 MB.
I checked the license volume with this search:
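For anyone who wants to reproduce a test file like this, here is a minimal sketch in Python. The event line is a hypothetical stand-in (the original event from the question is not reproduced here), and the function names and the 12 MB target are only illustrative.

```python
import os
import tempfile


def build_event(stem_length=120):
    """Return one hypothetical IIS-style event with a cs_uri_stem of the given length."""
    stem = "/" + "a" * (stem_length - 1)
    # Field order follows the FIELDS list in transforms.conf; the values are made up.
    return f"2014-01-15 10:22:31 10.0.0.5 - 192.168.1.10 80 GET {stem} - 200 Mozilla/5.0\n"


def write_test_log(path, target_bytes):
    """Repeat the event until at least target_bytes are written; return the bytes written."""
    event = build_event().encode("ascii")
    written = 0
    with open(path, "wb") as fh:
        while written < target_bytes:
            fh.write(event)
            written += len(event)
    return written


if __name__ == "__main__":
    # Write a file of roughly 12 MB into the system temp directory.
    out_path = os.path.join(tempfile.gettempdir(), "LicenseTest.log")
    print(write_test_log(out_path, 12 * 1024 * 1024), "bytes written to", out_path)
```

Point the monitor stanza in inputs.conf at the generated file to index it.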
index=_internal source=*license_usage.log type="Usage" splunk_server=* earliest=-1w@d | eval Date=strftime(_time, "%Y/%m/%d") | eventstats sum(b) as volume by idx, Date | eval MB=round(volume/1024/1024,5)| timechart first(MB) AS volume by idx
The results were:
License Volume with custom sourcetype and anonymization of cs-uri-stem (120 characters replaced with a single F character) = 5.813 MB
License Volume with sourcetype=iis = 11.683 MB
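To put those two figures in perspective, here is a quick calculation of the relative saving. It is only arithmetic on the numbers reported above and assumes both test files held the same events; the function name is illustrative.

```python
def percent_reduction(baseline_mb, reduced_mb):
    """Percentage by which reduced_mb is smaller than baseline_mb."""
    return (baseline_mb - reduced_mb) / baseline_mb * 100.0


iis_mb = 11.683     # license volume reported for sourcetype=iis
custom_mb = 5.813   # license volume reported for the custom sourcetype with SEDCMD

print(f"{percent_reduction(iis_mb, custom_mb):.1f}% less license volume")
```

So in this test, replacing the 120-character cs_uri_stem with a single character cut the measured license volume roughly in half.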
Regarding the SEDCMD, the example included above isolates the cs_uri_stem field with capture groups and changes the value of the field to F. The following example isolates the cs_method field and changes its value to F:
SEDCMD-dropcsmethod = s/(.*\s\d+\s+)(\w+)(\s+\/.*)/\1F\3/g
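If you want to try these expressions before touching props.conf, the sketch below runs both regexes through Python's re module. This is an approximation, not Splunk itself: Python's regex engine is close to, but not identical with, what SEDCMD uses, and the sample event is a hypothetical line laid out in the field order from transforms.conf.

```python
import re

# The two patterns are copied from the SEDCMD settings shown in this post.
STEM_PATTERN = r"(.*\s\d+\s\w+\s)(\/.*)(\s.*\s\d\d\d\s.*)"
METHOD_PATTERN = r"(.*\s\d+\s+)(\w+)(\s+\/.*)"

# Equivalent of sed's \1F\3; \g<1> keeps the group reference unambiguous in Python.
REPLACEMENT = r"\g<1>F\g<3>"

# Hypothetical event, not taken from the original question.
SAMPLE_EVENT = (
    "2014-01-15 10:22:31 10.0.0.5 - 192.168.1.10 80 GET "
    "/app/reports/index.aspx - 200 Mozilla/5.0"
)


def anonymize(pattern, event):
    """Apply one substitution to an event, like s/pattern/\\1F\\3/g."""
    return re.sub(pattern, REPLACEMENT, event)


print(anonymize(STEM_PATTERN, SAMPLE_EVENT))    # cs_uri_stem replaced by F
print(anonymize(METHOD_PATTERN, SAMPLE_EVENT))  # cs_method replaced by F
```

Both patterns rely on the URI stem being the only field that starts with a slash right after whitespace, so test them against your own events before relying on them.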