We have a Splunk system collecting data from various sources (network, OS, application logs, etc.).
Unfortunately, some of these systems send PAN-related data with unmasked credit card details, but we don't know which ones.
Is there a way to tackle this? We need to track which systems are sending PAN-related data, but we don't want to store that data (or we only want to store it in hashed form).
My only idea so far is:
- Create an index pci_secure_index with permissions granted only to restricted users.
- Index all data normally, but run a scheduled search to detect PAN information, then collect the matching events and summary-index them into "pci_secure_index".
- Delete the matching events from the original index (with the delete command).
Is there a better approach?
(PS: We tried the data anonymization approach, searching for the credit card pattern in the first 5000 characters of each event, but it almost brought the system to its knees.)
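The detection step in the plan above (the scheduled search for PANs) comes down to a pattern match plus a checksum. Below is a minimal Python sketch of that logic only, not SPL and not a Splunk API. It assumes events arrive as hypothetical (source, raw_text) pairs. A candidate regex finds runs of 13 to 19 digits, optionally separated by spaces or dashes. A Luhn check then drops most random digit strings, and a per-source count shows which systems send card numbers without keeping the numbers. The regex, CANDIDATE, find_pans and pan_hits_by_source are illustrative names, not anything from Splunk.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical candidate pattern: 13-19 digits, optionally separated by
# single spaces or dashes. Real PAN layouts vary; tune for your data.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pans(text: str) -> List[str]:
    """Return Luhn-valid candidate PANs (separators stripped) found in text."""
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            hits.append(digits)
    return hits


def pan_hits_by_source(events: Iterable[Tuple[str, str]]) -> Counter:
    """Count, per source, the events that contain at least one likely PAN."""
    counts: Counter = Counter()
    for source, raw in events:
        if find_pans(raw):
            counts[source] += 1
    return counts
```

Only the per-source counts need to be kept (for example in the restricted index), which covers "which systems send PANs" without storing the PANs. Expect some false positives: any Luhn-valid 13 to 19 digit number will match.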
If you don't need real-time data, you could pre-parse the data with a script and index it in Splunk afterwards.
We did this for a customer who wanted to encrypt one field without losing it.
Bye.
Giuseppe
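A minimal sketch of the pre-parsing idea, assuming the script sits between the log source and the forwarder and handles one line per event. It replaces every Luhn-valid candidate PAN with a keyed-hash token (HMAC-SHA256), so the raw number never reaches Splunk. The same card still maps to the same token, which keeps it trackable. This is not the encryption approach described above: it is a swapped-in hashing variant matching the "store in hashed format" requirement. The names mask_line and filter_lines, the PAN_HMAC: prefix and the 16-hex-character truncation are all hypothetical choices.

```python
import hashlib
import hmac
import re
from typing import Iterable, Iterator

# Hypothetical candidate pattern: 13-19 digits, optionally separated by
# spaces or dashes.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # doubled digits above 9 lose 9
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def mask_line(line: str, key: bytes) -> str:
    """Replace each Luhn-valid candidate PAN with a keyed-hash token."""

    def repl(m: re.Match) -> str:
        digits = re.sub(r"[ -]", "", m.group())
        if not luhn_ok(digits):
            return m.group()  # leave ordinary numbers untouched
        tag = hmac.new(key, digits.encode(), hashlib.sha256).hexdigest()[:16]
        return f"PAN_HMAC:{tag}"

    return CANDIDATE.sub(repl, line)


def filter_lines(lines: Iterable[str], key: bytes) -> Iterator[str]:
    """Yield each input line with any PANs masked."""
    for line in lines:
        yield mask_line(line, key)
```

To use it as a stream filter, a small wrapper could write every line of filter_lines(sys.stdin, key) to sys.stdout, with the key loaded from somewhere outside Splunk (for example a hypothetical PAN_HMAC_KEY environment variable). A keyed hash is used instead of a plain SHA-256 because the PAN space is small enough that unsalted hashes of card numbers can be brute-forced.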
If that's the case, deploy more indexers. I don't see any other way.
See this:
http://docs.splunk.com/Documentation/Splunk/6.4.1/Data/Anonymizedata
Bye.
Giuseppe
We tried the data anonymization approach, searching for the credit card pattern in the first 5000 characters of each event, but it almost brought the system to its knees. The link above is useful if we are 100% sure which field the PAN is coming from. But with terabytes of incoming data, scanning every whole event is a performance killer.
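To put "terabytes of incoming data" in perspective, here is a rough back-of-the-envelope figure. It assumes a hypothetical 1 TB/day of ingest in decimal units, spread evenly over N indexers:

```latex
\frac{1\,\text{TB/day}}{86\,400\,\text{s/day}}
  = \frac{10^{12}\,\text{B}}{86\,400\,\text{s}}
  \approx 1.16 \times 10^{7}\,\text{B/s}
  \approx 11.6\,\text{MB/s},
\qquad
\text{per indexer} \approx \frac{11.6}{N}\,\text{MB/s}
```

Every one of those bytes has to pass through the anonymization regex, so the cost of a whole-event scan grows with total ingest volume, not with how many PANs are actually present. Increasing N reduces the per-indexer share, which is the reasoning behind adding indexers.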