
How to anonymize the client IP in ms:iis log files?

manuelostertag
Path Finder

Hello,

I have to anonymize the client IP in ms:iis log files at index time, so that it is not possible to determine the original client IP address.

It's not that simple. I can't just replace the last octet of the client IP address with xxx, because evaluations (session duration, etc.) are based on the client IP address. To make sure no conclusions about the address are possible, the address should instead be hashed (MD5, SHA-256, etc.).
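To illustrate the requirement, here is a minimal Python sketch (`hash_ip` is a hypothetical helper, not part of Splunk). Hashing is deterministic, so the same client IP always maps to the same token and per-client evaluations such as session duration keep working, while the token itself does not show the address. The unkeyed branch mirrors the truncated MD5 (`substr(md5(c_ip),1,15)`) used in the INGEST_EVAL attempts in this thread. One caveat: an unkeyed hash of an IPv4 address can be reversed by hashing all 2^32 possible addresses, so the sketch also shows a keyed variant (HMAC-SHA256 with a secret key, a swapped-in technique) for when the original address must really be unrecoverable.

```python
import hashlib
import hmac
from typing import Optional


def hash_ip(ip: str, key: Optional[bytes] = None, length: int = 15) -> str:
    """Return a deterministic, truncated hex token for a client IP.

    Without a key this is the first `length` hex chars of MD5(ip), like
    substr(md5(c_ip),1,15). With a key it uses HMAC-SHA256 (swapped-in
    technique), so the token cannot be reversed by brute-forcing all IPs.
    """
    data = ip.encode("utf-8")
    if key is None:
        digest = hashlib.md5(data).hexdigest()
    else:
        digest = hmac.new(key, data, hashlib.sha256).hexdigest()
    return digest[:length]
```

Because the same input always yields the same token, grouping by the hashed value gives the same sessions as grouping by the original IP.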

Example of the log file before the anonymization:

2019-09-08 00:17:21 127.0.0.1 GET /media/our_company/luanda.jpg - 443 - 111.22.33.44 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/75.0.3770.142+Safari/537.36 https://www.google.com/ 200 0 0 1109

Example of the log file after the anonymization:

2019-09-08 00:17:21 127.0.0.1 GET /media/our_company/luanda.jpg - 443 - 5F3999E8BDD23B Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/75.0.3770.142+Safari/537.36 https://www.google.com/ 200 0 0 1109

I tried to anonymize the data both on the UF, which is installed on the MS Windows IIS server, and on the indexer (running RHEL 7), but did not succeed.

Steps I have tried on the UF and Indexer
On the UF, I tried to anonymize the IP with INGEST_EVAL in transforms.conf, but this does not modify the _raw data:

[hash_client-ip]
INGEST_EVAL = c_ip_hash=substr(md5(c_ip),1,15)

Next, I tried to use unarchive_cmd in the source stanza of props.conf to call a PowerShell or Python script, but nothing happened:

[source:://C:\Splunk\Test_iis_data\u_ex*.log]
invalid_cause = archive
unarchive_cmd = $SPLUNK_HOME/bin/StringMD5_Hash.py
NO_BINARY_CHECK = true
priority = 10002
sourcetype = test_spezial
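For reference, unarchive_cmd is only used when invalid_cause = archive is set, and the command it names must read the file on stdin and write the data to be indexed on stdout. Below is a minimal sketch of what a script like StringMD5_Hash.py could contain. It assumes c-ip is the 9th space-separated field (check the #Fields header of your own files), and it assumes a Python interpreter is available on the forwarder host, since as far as I know the Universal Forwarder does not ship with one.

```python
import hashlib
import sys
from typing import Iterable, Iterator

# Assumption: c-ip is the 9th space-separated field (index 8), as in the
# example log line; verify against the #Fields header of your files.
C_IP_INDEX = 8


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each log line with its c-ip field replaced by a truncated MD5 token."""
    for line in lines:
        if line.startswith("#"):
            # W3C directives (#Software, #Fields, ...) pass through unchanged
            yield line
            continue
        fields = line.rstrip("\r\n").split(" ")
        if len(fields) > C_IP_INDEX:
            ip = fields[C_IP_INDEX]
            fields[C_IP_INDEX] = hashlib.md5(ip.encode("utf-8")).hexdigest()[:15]
        yield " ".join(fields) + "\n"


if __name__ == "__main__":
    # unarchive_cmd feeds the source file on stdin and indexes whatever is written to stdout
    sys.stdout.writelines(filter_lines(sys.stdin))
```

Only field 9 is touched, so the server IP (127.0.0.1 in the example) and all other fields are passed through as-is.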

I also tried SEDCMD in props.conf, but I don't know how to calculate the MD5 hash inside the command.
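SEDCMD applies a sed-style substitution whose replacement is a fixed string (backreferences are allowed, but no function calls), which is why an MD5 cannot be computed there. What is missing is a substitution whose replacement is computed from the match. In Python, re.sub accepts a function for exactly that. A minimal sketch, with a hypothetical pattern that skips the first eight space-separated fields and captures the ninth (c-ip), so the server IP is left alone:

```python
import hashlib
import re

# Hypothetical pattern: group 1 = the first 8 fields plus their trailing
# spaces, group 2 = the 9th field (c-ip in the example log line).
C_IP_PATTERN = re.compile(r"^((?:\S+ ){8})(\S+)")


def hash_c_ip(line: str) -> str:
    """Replace the c-ip field with a truncated MD5 token using a computed replacement."""
    if line.startswith("#"):
        return line  # leave W3C header lines such as #Fields untouched

    def repl(m):
        token = hashlib.md5(m.group(2).encode("utf-8")).hexdigest()[:15]
        return m.group(1) + token

    return C_IP_PATTERN.sub(repl, line, count=1)
```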

Now my question:
How can I anonymize the data (with hashing) before indexing?


Sukisen1981
Champion

If you do have a c_ip field, have you tried something like this?
INGEST_EVAL = c_ip=substr(md5(c_ip),1,15)

Then reference the transform from the relevant stanza in props.conf, and probably add an entry in fields.conf as well for searching.
https://docs.splunk.com/Documentation/Splunk/7.3.1/Data/IngestEval


manuelostertag
Path Finder

Hello,

As described above, I have tried INGEST_EVAL. I forgot to mention that I also created the entry in fields.conf:

[c_ip_hash]
INDEXED=true