OK. Several things here.

1. For a question starting with "what is the best way", especially if no boundary conditions are given, the answer is usually "it depends".

2. From my experience: the worse the problem definition, the less reliable the outcome. I've dealt with customers who wanted something just "configured so it works" (we're not necessarily talking about Splunk, just a general idea) and the result was usually less than stellar.

Your problem is rooted in compliance, but it's equally common in DLP: just find something. We don't know what it is, where it is, or whether it's there at all, but we want you to find it. Some types of identifiers you can pick out because they have a particular format _and_ some internal integrity which you can check (IBANs, for example, have check digits). Others have no such thing, and there is a high chance of either false positives or false negatives, depending on how creative you are with covering, for example, all the possible ways of writing a phone number. And don't even get me started on trying to find names or addresses.

Of course, you can try to "use AI" to guess what and where constitutes sensitive data, but this will only add another layer to an already excruciating headache. Even a human with a relatively good understanding of the context could make mistakes here now and then.

So even without getting into the gory technical details of how to implement such stuff with/around Splunk, I'd say that if you want to do something like that without proper data classification and well-defined things to filter/mask, you're in for a treat: a never-ending project of tweaking your detection engine and dealing with stakeholders' complaints about false positives and negatives.
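
To make the "format _and_ internal integrity" point a bit more concrete, here is a rough Python sketch of what checking an IBAN candidate could look like. It is only an illustration, not a Splunk recipe: the function names and the loose candidate regex are my own assumptions, it does not check country-specific lengths, and a passing checksum only means the string is well-formed, not that the account exists. A phone number gives you nothing comparable to verify, which is exactly why that kind of detection stays noisy.

```python
import re

# Loose candidate pattern (an assumption for this sketch): two letters,
# two check digits, then 11-30 alphanumerics. Real IBAN lengths are
# country-specific, which this deliberately ignores.
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def iban_checksum_ok(candidate: str) -> bool:
    """Return True if the string passes the IBAN mod-97 check."""
    s = candidate.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    # Move the country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # Map letters to numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_ibans(text: str) -> list[str]:
    """Return regex candidates in text that also pass the checksum."""
    return [
        m.group(0)
        for m in IBAN_CANDIDATE.finditer(text)
        if iban_checksum_ok(m.group(0))
    ]
```

The regex alone would happily match any similar-looking alphanumeric token; the checksum is what throws most of those false positives away. For identifiers without such a built-in check you only have the regex half, and you are back to trading false positives against false negatives.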