How can I mask data - both at index time and search time?

tadreeves
Engager

I found out that one of the web logs Splunk has been indexing contains data that I need to mask. So, I've got two problems to solve:

(a) Removing the sensitive data (though not the WHOLE event) from already-indexed data, and

(b) Making it so newly-indexed data has this same data masked.

What's my best way to approach this? It's data like this that I'm trying to mask:

173.103.16.2 - - [10/Jun/2011:16:09:27 -0500] "GET /admin/load-scripts.jsp?c=1&failedPassword=FAILEDPASSWORDIWANTTOMASK&otheroptions=3

marcoscala
Builder

This is a simple trick to mask data at search time. Extract the part of the event to mask with the "rex" command, then overwrite the "_raw" field with the masked data.
For example, to mask the last 5 digits of accountNumber, start from this original event:

2016-04-06 12:24:06,Event [Event=UpdateBillingProvQuote, timestamp=1337891259, properties={JMSCorrelationID=NA, JMSMessageID=ID:ESP-PD.F4CB3B4B9EF87:AA49A1BD, orderType=FeatureChange, quotePriority=NORMAL, conversationId=ESB~16214F4A71D1DA77:E35B0544:0F2958EEF3F0:B580, credits=NA, JMSReplyTo=pub.esb.genericasync.response, timeToLive=-1, serviceName=UpdateBillingProvisioning, esn=7F758AD4A3B86F, accountNumber=900013479, MethodName=InternalEvent, AdapterName=UpdateBillingProvQuote, meid=NA, orderNumber=19256698, quoteNumber=75909847, ReplyTo=NA, userName=temordia, EventConversationID=NA, mdn=5789374447, accountType=PrePaid, marketCity="ARVADA", marketState=CO, marketZip=80006, billingCycle=27, autoBillPayment=T, phoneCode=HE4G, phoneType=Android, phoneName="HTC Evo 4G", planCode=ULPRE50, planType=PrePaid, planPrice=50.00, planName="Unlimited Prepaid", planDescription="Nationwide Prepaid Unlimited Minutes", networkProviderName=Splunktel}]

New search:

index=oidemo sourcetype=business_event | rex "^(?<head>.*accountNumber=\d+)\d{5}(?<tail>,.*)$" | eval _raw=head."XXXXX".tail

The new event now looks like this:

2016-04-06 12:24:06,Event [Event=UpdateBillingProvQuote, timestamp=1337891259, properties={JMSCorrelationID=NA, JMSMessageID=ID:ESP-PD.F4CB3B4B9EF87:AA49A1BD, orderType=FeatureChange, quotePriority=NORMAL, conversationId=ESB~16214F4A71D1DA77:E35B0544:0F2958EEF3F0:B580, credits=NA, JMSReplyTo=pub.esb.genericasync.response, timeToLive=-1, serviceName=UpdateBillingProvisioning, esn=7F758AD4A3B86F, accountNumber=9000XXXXX, MethodName=InternalEvent, AdapterName=UpdateBillingProvQuote, meid=NA, orderNumber=19256698, quoteNumber=75909847, ReplyTo=NA, userName=temordia, EventConversationID=NA, mdn=5789374447, accountType=PrePaid, marketCity="ARVADA", marketState=CO, marketZip=80006, billingCycle=27, autoBillPayment=T, phoneCode=HE4G, phoneType=Android, phoneName="HTC Evo 4G", planCode=ULPRE50, planType=PrePaid, planPrice=50.00, planName="Unlimited Prepaid", planDescription="Nationwide Prepaid Unlimited Minutes", networkProviderName=Splunktel}]

rajbir1
Explorer

This works perfectly!!

lguinn2
Legend

I downvoted this post because it only works if you write every search for the users.

rajbir1
Explorer

You are right, there is no way to mask the data at search time. This solution is only for hiding the data in a specific dashboard panel or report.

marcoscala
Builder

Lisa, I agree with your comments, but we also had users who needed this kind of visibility, with access to the original raw data limited to certain roles. Masking at search time with "rex", combined with disabling drilldown, was the solution we adopted.

Do you know of any other search time solution?

Regards,

Marco

lguinn2
Legend

There is no way to mask the data for only a subset of users at search time - unless you are going to write every search for that subset of users, and prevent those users from accessing the search bar in any way.

One alternative could be to route only the sensitive data to a special index. Most of the data could then go to indexes that are widely visible and that users can search. The sensitive data would go to a special index that only some roles can access. For others to access the special index, they could be required to use dashboards, etc. that limit/mask their access. You would still need to be careful with those dashboards, etc. to make sure that techniques like drill-down do not compromise the security of the data.
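
As a rough sketch of that routing idea, the fragments below assume a sourcetype named access_combined, an already-created index named secure_web, and a role named security_analyst. All three names are hypothetical; substitute your own. The props.conf and transforms.conf stanzas belong on the instance that parses the data (indexer or heavy forwarder), and they only affect events indexed after the change.

```
# props.conf
[access_combined]
TRANSFORMS-route_sensitive = route_sensitive_events

# transforms.conf
[route_sensitive_events]
# Events that contain a failedPassword parameter go to the restricted index.
REGEX = failedPassword=
DEST_KEY = _MetaData:Index
FORMAT = secure_web

# authorize.conf
[role_security_analyst]
# Only this role is allowed to search the restricted index.
srchIndexesAllowed = secure_web
```

For this to restrict anything, the other roles must not list secure_web (or a wildcard that matches it) in their own srchIndexesAllowed.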

lguinn2
Legend

This solution does not meet my definition of "masking"

This hides the data for just this search alone.

So this solution will work only in a dashboard and only if you have also disabled drill-down and disabled "open in search." A user who drills down - or who uses the magnifying glass to "open in search" - will be able to circumvent the masking.

Thus my earlier answer.

MicroAlpha
Explorer

If you’re willing to use a third-party tool (Eclipse GUI) for masking, you can mask it next time (before you re-index it) with this one:

http://www.iri.com/blog/data-protection/secure-then-splunk-a-format-preserving-encryption-and-pseduo...

lguinn2
Legend

You can mask sensitive data at index time. (Ask more questions if that's not sufficient information!)
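
A minimal sketch of index-time masking for the log line in the question, assuming the web log is assigned a sourcetype named access_combined (an assumption; use whatever sourcetype your input actually has). It uses the SEDCMD setting in props.conf on the instance that parses the data (indexer or heavy forwarder), and takes effect only for data indexed after the change:

```
# props.conf
[access_combined]
# Replace the value of the failedPassword parameter with a fixed mask
# before the event is written to the index.
SEDCMD-mask_failed_password = s/failedPassword=[^&\s"]+/failedPassword=########/g
```

The character class stops at the next "&", whitespace, or double quote, so the rest of the query string stays intact. Test the expression against sample events before relying on it.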

However, once the data has been indexed, there is no way to change it. Not possible.

All you can do is delete the data and re-index it. You can't mask it at search time.

I know that isn't the answer that you wanted... sorry!

rajbir1
Explorer

I downvoted this post because there is now a way to mask sensitive data at search time as well. Please see the last answer below.

lguinn2
Legend

I disagree with your down-vote. See my comment below.

Being able to hide the data in a single search does not mask it. For the "trick" to work, users cannot be allowed to access the search bar.
