Hi all,
I need to strip cookie values from IIS events. The sourcetype is correctly set as "iis", and the following config strips the cookie value from the example log below as expected (the event shows 'XXXXX's when searched for in Splunk). However, the cs_cookie field from the event still shows the original cookie value at search time:
[note: The answers.splunk.com forum has stripped some of the forward-slashes from the regex below]
[iis]
SEDCMD-maskcookie = s/[^\s\;-+]+$/XXXXXXXXXX/g
### Example logs
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2014-09-18 01:47:12
#Fields: date time s-ip cs-method cs-url cs-args cs-port cs-user cs-ip x cs-status x x x cs-cookie
2014-09-18 01:27:23 10.11.0.148 GET /example/loadEvents.aspx start=1409493600&end=1413122400&_=1411003642320 443 username 10.0.0.216 Mozilla/5.0+(X11;+Linux+x86_64;+rv:32.0)+Gecko/20100101+Firefox/32.0 200 0 0 1468
2014-09-18 01:27:23 10.11.0.148 POST /example/TimesheetServices.svc - 4443 - 10.13.0.14 - 200 0 0 0 DCjoic2m3kASOk2092cmDSKO02
I believe that since Splunk 6.1 the default props.conf uses INDEXED_EXTRACTIONS = w3c and my hunch is that the field is being extracted from the log, at index time, before the SEDCMD is running.
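To make that hunch concrete, here is a toy Python model of the ordering problem. It is only a sketch and not how Splunk is implemented: `extract_fields` and `mask_raw` are hypothetical helpers, and the `\S+$` pattern is a stand-in for the forum-mangled regex above. It shows that a field pulled out of the raw event before masking keeps the original cookie, while one extracted after masking does not.

```python
import re

# Field names from the "#Fields:" header of the sample log.
FIELDS = ("date time s-ip cs-method cs-url cs-args cs-port cs-user "
          "cs-ip x cs-status x x x cs-cookie").split()

# Second sample event from the post (the one carrying a cookie value).
EVENT = ("2014-09-18 01:27:23 10.11.0.148 POST /example/TimesheetServices.svc "
         "- 4443 - 10.13.0.14 - 200 0 0 0 DCjoic2m3kASOk2092cmDSKO02")


def extract_fields(event: str) -> dict:
    """Hypothetical stand-in for a W3C-style extraction: split on spaces.

    Repeated "x" columns collapse into one key, which is fine for this demo.
    """
    return dict(zip(FIELDS, event.split()))


def mask_raw(event: str) -> str:
    """Hypothetical stand-in for the SEDCMD: mask the last token of the event."""
    return re.sub(r"\S+$", "XXXXXXXXXX", event)


# Extraction happens first, masking second: the field keeps the real cookie.
fields_before = extract_fields(EVENT)
raw_masked = mask_raw(EVENT)

# Masking first, extraction second: the field is masked too.
fields_after = extract_fields(raw_masked)

print(fields_before["cs-cookie"])
print(fields_after["cs-cookie"])
```

If the real pipeline behaves like the first ordering, that would explain a masked _raw alongside an unmasked cs_cookie field.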
I'm at a bit of a loss as to what I can do (other than creating a new sourcetype) to resolve this issue. Has anyone else successfully stripped sensitive information from events with sourcetype=iis?
Note: I would attach an image of what shows up at search time but apparently I don't have enough karma points.
I found the best solution to be: Create a new sourcetype for the IIS logs needing masking.
E.g. in inputs.conf on the host:
[monitor://C:\inetpub\logs\LogFiles\W3SVC*\u_ex*]
sourcetype=iis:restricted
disabled = false
Then on the same host or in the same 'app' on the deployment server add a props.conf with the following:
[iis:restricted]
TZ = GMT
SEDCMD-AnonymizeSessionID = s/[sS][eE][sS][sS][iI][oO][nN][iI][dD]=[\w\-]{25}/sessionId=XXXXXXXX/g
INDEXED_EXTRACTIONS = W3C
This way both the obfuscation and the 'automatic' extraction that Splunk 6 brings to the party for IIS logs still work!
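Before deploying, the SEDCMD pattern can be sanity-checked offline. The sketch below applies the same regex with Python's `re` module, on the assumption that it behaves the same way there as in Splunk's regex engine for this simple pattern. The sample strings are made up for illustration and `anonymize` is a hypothetical helper.

```python
import re

# Same pattern as the SEDCMD above: case-insensitive "sessionid=" spelled out
# with character classes, followed by exactly 25 word characters or hyphens.
SESSION_ID = re.compile(r"[sS][eE][sS][sS][iI][oO][nN][iI][dD]=[\w\-]{25}")


def anonymize(event: str) -> str:
    """Replace every matching session ID with a fixed mask."""
    return SESSION_ID.sub("sessionId=XXXXXXXX", event)


# Made-up sample values (not from a real log).
exact = "GET /app/page.aspx SessionID=" + "a" * 12 + "-" + "b" * 12 + " 200"
short = "GET /app/page.aspx sessionid=abc123 200"
longer = "sessionid=" + "a" * 25 + "b" * 5

print(anonymize(exact))
print(anonymize(short))
print(anonymize(longer))
```

Two things stand out from running it: IDs shorter than 25 characters are left untouched, and for IDs longer than 25 characters only the first 25 are replaced, so the tail stays in the event. It is worth confirming the ID length in your own logs before relying on the `{25}` quantifier.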
I tested this and there is no need to change the sourcetype. Just configure your SEDCMD on the universal forwarder and it works just fine.
Note that this only works for the pretrained indexed extraction types such as w3c. For any other sourcetype, you do need to use a heavy forwarder.
I am having the same issue now with Bluecoat logs and the Splunk add-on using indexed extraction type w3c. Unfortunately I cannot apply the proposed solution on the forwarder as I am receiving the logs directly on the indexer. Are there any other options to make the SEDCMD work BEFORE the file gets indexed?
If you manipulate event data in conjunction with INDEXED_EXTRACTIONS, be careful not to change the event length. Indexed extraction field locations are calculated before the transforms run.
What are the repercussions of this?
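One way to follow that advice is to mask with a replacement that is exactly as long as the text it replaces. The Python sketch below illustrates the idea only: `mask_same_length` is a hypothetical helper, and `\S+$` (the last whitespace-delimited token) is an assumed stand-in for whatever pattern selects the sensitive value. A fixed-string SEDCMD replacement would only preserve length if the masked values always have the same length as the mask.

```python
import re

# Sample event from the original question, ending in the cookie value.
EVENT = ("2014-09-18 01:27:23 10.11.0.148 POST /example/TimesheetServices.svc "
         "- 4443 - 10.13.0.14 - 200 0 0 0 DCjoic2m3kASOk2092cmDSKO02")


def mask_same_length(event: str) -> str:
    """Overwrite the last token with the same number of 'X' characters."""
    return re.sub(r"\S+$", lambda m: "X" * len(m.group(0)), event)


masked = mask_same_length(EVENT)

# The event length, and therefore every character offset before the cookie,
# is unchanged by the masking.
print(len(EVENT) == len(masked))
print(masked)
```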
I too am having the same issue.
One more piece of information... if I use the same logs but change the sourcetype to something like "iisweb", the obfuscation works perfectly. So I'm sure it's something to do with the default props.conf for the "iis" sourcetype and not my regex.
I did also try using the transforms.conf method of obfuscating data like this:
TRANSFORMS-anonymize-raw = session-anonymizer-raw
TRANSFORMS-anonymize-field = session-anonymizer-field
### transforms.conf
[session-anonymizer-raw]
REGEX = (.*)[\w\d]+$
FORMAT = $1XXXXXXXXXX
DEST_KEY = _raw

[session-anonymizer-field]
REGEX = (.*)[\w\d]+$
FORMAT = cs_cookie::"XXXXXXXXXX"
WRITE_META = true
[Note: Looks like answers.splunk.com is messing with some of the characters in the regex above. Hopefully you get the idea]
But alas, while it will obfuscate the raw log, instead of overwriting the cs_cookie field, it simply adds to it. So I see both the 'XXXXXX's and the original cookie value at search time.