I've been asked to provide Apache logs to a third party for analysis, but the IP addresses must be replaced to hide the visitors' identities.
It sounds like a simple Splunk job: select the events, pipe them through scrub, then export.
But scrub anonymizes more than just the IPs; it also scrubs the URLs.
I'm hunting for alternatives that will only scrub IPs.
Alternatively, I'm considering hacking scrub to only affect IPs.
Any thoughts?
Thanks.
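After a lot of reading and too many attempts, I think renaming the fields is the best option. The trick relies on scrub ignoring fields whose names begin with an underscore. In the example below, src is the field that holds the IP address. This behavior is undocumented.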
| rename * AS _*
| rename _src AS src
| scrub
| rename _* AS *
(It would be nice if scrub took a list of fields as an option. It appears you can do this through config files, but getting that done on Splunk Cloud would be $#%^py. Please upvote the idea.)
You can use a rename within your search to temporarily hide what you don't want scrubbed, then rename it again after the scrub but before the results are presented. The example below is something we've come up with to scrub a firewall IPS log. The search looks for the device (FG is a FortiGate) and message type (ips). The "ref" is a reference to a real URL from the vendor website. We rename the ref to _ref which gets ignored by scrub, then rename _ref to ref and build the report table.
device_id="FG*" type="ips" | stats count by msg devname ref | rename ref as _ref | scrub | sort 10 -count | rename _ref as ref | table msg devname ref
Have you tried applying it to the IP field only? | scrub ipfieldname
One way of getting a result although it is not very elegant:
$SPLUNK_HOME/bin/splunk anonymize file -source /tmp/events.txt -private-terms /opt/splunk/etc/anonymizer/ip-list.txt -public-terms /tmp/events.txt
The public-terms won't get replaced (which covers everything, since the events file itself is used as the public-terms list). The private-terms do get replaced, and they seem to take priority over the public-terms.
I got the idea from here:
http://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/AnonymizedatasamplestosendtoSuppo...
Not very nice, not very efficient I know, but it worked ...
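The anonymize command above needs a private-terms file that lists the IPs to replace. Below is a rough sketch of one way to build such a list from the exported events. It is not part of the original answer: the helper names collect_ips and write_private_terms are made up for illustration, and I'm assuming the private-terms file takes one term per line, so compare against the sample files shipped under etc/anonymizer before relying on it.

```python
import re
from pathlib import Path

# Loose IPv4 matcher: four dot-separated groups of 1-3 digits.
# It does not check that each octet is <= 255.
IPV4 = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def collect_ips(text):
    """Return the unique IPv4-looking strings found in text, sorted."""
    return sorted(set(IPV4.findall(text)))


def write_private_terms(events_path, terms_path):
    """Write one IP per line to terms_path (assumed format) and return the count."""
    text = Path(events_path).read_text(encoding="utf-8", errors="replace")
    ips = collect_ips(text)
    Path(terms_path).write_text("".join(ip + "\n" for ip in ips), encoding="utf-8")
    return len(ips)


if __name__ == "__main__":
    # Paths taken from the command above; adjust to your environment.
    n = write_private_terms("/tmp/events.txt", "/opt/splunk/etc/anonymizer/ip-list.txt")
    print(f"wrote {n} IPs")
```

Because the matcher is loose, it will also pick up dotted version strings that look like addresses, so it is worth eyeballing the generated list before running anonymize.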
A better choice may be to use rex in sed mode. This is a rough stab at a sed expression to do some (not optimal) scrubbing.
... | rex mode=sed "s/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.([0-9]{1,3})/xxx.yyy.zzz.\2/g"
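To see what that substitution does before running it against real data, here is a small Python sketch that applies the same pattern to a made-up Apache log line. The function name mask_ips and the sample line are mine, not from the thread, and Python's re module is only standing in for rex here, so treat it as an approximation of the behavior rather than a guarantee.

```python
import re

# Same pattern as the sed expression above: group 1 is the first three
# octets, group 2 is the last octet, which the replacement keeps.
IP_PATTERN = re.compile(r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.([0-9]{1,3})")


def mask_ips(line):
    """Replace the first three octets of every IPv4-looking string in line."""
    return IP_PATTERN.sub(r"xxx.yyy.zzz.\2", line)


# Hypothetical log line using a documentation-range address.
sample = '203.0.113.42 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(mask_ips(sample))

# The pattern is not anchored to the client-IP position, so any dotted
# quad elsewhere in the event (for example in a URL) is masked as well.
print(mask_ips("GET /download/v1.2.3.4/setup.exe"))
```

Keeping the last octet means hosts on the same /24 stay distinguishable, which may or may not be acceptable for your anonymization requirements. The second call shows why the expression is "not optimal": it rewrites anything shaped like an address, including version numbers in URLs.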