strip sensitive data before indexing

zhatsispgx
Path Finder

Hi all,

We run several network inspection tools in our environment, and their logs capture sensitive data such as NTLM credentials and HTTP Basic Auth headers. We'd like to strip this data out before indexing, or at least sanitize it so that we don't index usernames and passwords. Can anyone point me in the right direction?

Example data:

{
"timestamp":"2018-04-04T09:00:08.085563-0600",
"flow_id":151014950299099,
"in_iface":"asdfasdf",
"event_type":"alert",
"vlan":10,
"src_ip":"x.x.x.x",
"src_port":60130,
"dest_ip":"166.70.63.169",
"dest_port":443,
"proto":"TCP",
"tx_id":0,
"alert":{
"action":"allowed",
"gid":1,
"signature_id":2013928,
"rev":4,
"signature":"ET POLICY HTTP traffic on port 443 (PROPFIND)",
"category":"Potentially Bad Traffic",
"severity":2
},
"http":{
"hostname":"www.somesite.org",
"url":"\/things\/remote.php\/webdav\/",
"http_user_agent":"Mozilla\/5.0 (Linux) mirall\/2.3.3",
"http_content_type":"application\/xml",
"http_method":"PROPFIND",
"protocol":"HTTP\/1.1",
"status":207,
"length":382
},
"payload_printable":"PROPFIND \/owncloud\/remote.php\/webdav\/ HTTP\/1.1\r\n
Depth: 0\r\n
Authorization: Basic REDACTEDBASE64PASSWORDHERE==\r\n
User-Agent: Mozilla\/5.0 (Linux) mirall\/2.3.3\r\n
Accept: \/\r\n
Content-Type: text\/xml; charset=utf-8\r\nCookie: oc_sessionPassphrase=redacted \r\nContent-Length: 105\r\n
Connection: Keep-Alive\r\n
Accept-Encoding: gzip, deflate\r\nAccept-Language: en-US,*\r\n
Host: www.somesite.org\r\n\r\n\n\n \n \n <\/d:prop>\n<\/d:propfind>\n",
"stream":1
}
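The Authorization header in this sample is the main concern. HTTP Basic authentication (RFC 7617) sends base64("username:password"), which is an encoding rather than encryption, so anyone who can search the indexed event can recover the credentials. The short Python sketch below shows this; "alice:s3cret" is a made-up credential standing in for the redacted value, and decode_basic_auth is a hypothetical helper name.

```python
import base64

# Build a Basic auth header value from a hypothetical credential.
# "alice:s3cret" is made up; it is not taken from the sample event.
header_value = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")


def decode_basic_auth(value: str) -> tuple[str, str]:
    """Return (username, password) from an 'Authorization: Basic ...' value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    # Base64 decoding needs no key, which is why the header must be masked.
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


print(header_value)                     # Basic YWxpY2U6czNjcmV0
print(decode_basic_auth(header_value))  # ('alice', 's3cret')
```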

richgalloway
SplunkTrust

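See the "Anonymize data" topic in the Splunk documentation: http://docs.splunk.com/Documentation/Splunk/7.0.3/Data/Anonymizedata
It describes two ways to mask sensitive data at index time: a sed-style SEDCMD setting in props.conf, or a regex transform defined in transforms.conf and referenced from props.conf.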
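Before adding a SEDCMD to props.conf, it can help to check the substitution against a copy of the sample event. The Python sketch below approximates sed-style s/regex/replacement/g rules with re.sub on the raw payload_printable text. The patterns, the [REDACTED] placeholder, and the names MASK_RULES and mask_payload are hypothetical, and Splunk's regex engine may differ from Python's in edge cases. Note that the sample's slashes are JSON-escaped (\/), so the token pattern accepts both "/" and "\/".

```python
import re

# Hypothetical masking rules. Each (pattern, replacement) pair is written so it
# could be transcribed into a sed-style s/regex/replacement/g expression;
# Splunk's regex flavor may need small adjustments.
# Token characters: base64 letters/digits/+/= plus "/" or its JSON-escaped
# form "\/", so the match stops at the literal \r\n that ends each header.
TOKEN = r"([A-Za-z0-9+=]|\\?/)+"

MASK_RULES = [
    # Keep the "Authorization: Basic " prefix (group 1), mask the token.
    (re.compile(r"(Authorization: Basic )" + TOKEN), r"\1[REDACTED]"),
    # NTLM tokens in request or response auth headers.
    (re.compile(r"((Authorization|WWW-Authenticate): NTLM )" + TOKEN), r"\1[REDACTED]"),
    # Value of the oc_sessionPassphrase cookie seen in the sample event.
    (re.compile(r"(oc_sessionPassphrase=)[^;\s\\]+"), r"\1[REDACTED]"),
]


def mask_payload(raw_event: str) -> str:
    """Apply every masking rule, in order, to a raw event string."""
    for pattern, replacement in MASK_RULES:
        raw_event = pattern.sub(replacement, raw_event)
    return raw_event


# Raw-string literal so the backslashes match the indexed JSON text.
sample = r"Authorization: Basic dXNlcjpwYXNz\/ab==\r\nCookie: oc_sessionPassphrase=abc123 \r\nContent-Length: 105"
print(mask_payload(sample))
# Authorization: Basic [REDACTED]\r\nCookie: oc_sessionPassphrase=[REDACTED] \r\nContent-Length: 105
```

Once a pattern behaves as expected, each rule corresponds to a line of the form SEDCMD-<class> = s/<regex>/<replacement>/g in the props.conf stanza for the event's sourcetype, on the instance that parses the data (an indexer or heavy forwarder). Literal slashes in the regex then need escaping as \/ so they don't collide with the sed delimiter. Because SEDCMD rewrites events before they are written to the index, only data indexed after the change is masked.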
