Is there a way to shrink or hide large data fields before indexing?

mala_splunk_91
Explorer

Hello

Could someone please help with the questions below?

  1. I have JSON logs with "image" and "sound" fields that hold huge values (nearly 100 KB). I want to drop only those fields and index the rest of each event.
    Is there any way to do so?

  2. Will masking work on large data? If yes, is it possible to reduce the data size before indexing by masking the large values with dummy values?

  3. Basically, I need to remove those fields from an event and then index it. Please share any ideas.

I tried hiding the fields using props and transforms, but it did not work.

Sample log:

"captcha" : {
"image" : "xxxxxxxxxxx\r\nxxxxxxxxx\r\n(50kb values)",
"sound" : "xxxxxxxxxxxxxxx(50kb values)",
"answer" : "xxxxxx",
},

Thanks,
Mala


kmorris_splunk
Splunk Employee

You should be able to achieve this by using data anonymizing techniques that you would use for masking of data. But instead of adding the masking, you could just exclude that portion of the event. You can use either regex or sed to anonymize data. With sed, you could do a substitution and substitute that section with nothing.

https://docs.splunk.com/Documentation/Splunk/6.5.3/Data/Anonymizedata
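
As a rough illustration of the "substitute with nothing" idea, here is a small Python sketch that strips the two large keys out of an event shaped like the sample log. Python's `re.sub` stands in for the sed-style substitution; the event text, the helper name `drop_fields`, and the pattern are assumptions made for the demo, not a tested Splunk configuration.

```python
import re

# Hypothetical event shaped like the sample log; the long values are shortened.
# The \r\n inside the image value is literal text, as in the raw JSON.
event = (
    '"captcha" : {\n'
    '"image" : "AAAA\\r\\nBBBB\\r\\n",\n'
    '"sound" : "CCCCDDDD",\n'
    '"answer" : "xxxxxx",\n'
    '},'
)

# Match a whole '"image" : "...",' or '"sound" : "...",' member plus its newline.
# [^"]* is enough here because base64 text never contains a double quote.
FIELD = re.compile(r'"(?:image|sound)"\s*:\s*"[^"]*",?\n?')


def drop_fields(raw: str) -> str:
    """Return the event with the image and sound members removed."""
    return FIELD.sub("", raw)


cleaned = drop_fields(event)
print(cleaned)
```

A sed-style expression would have the same shape, with the pattern on the left and an empty replacement on the right.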

mala_splunk_91
Explorer

Hi,

I tried the configuration below:

props.conf

[sourcetype::image]
TRANSFORMS-set = setnull

transforms.conf

[setnull]
REGEX = (?m)^(.)"image" : "(.)",(.*)$
FORMAT = $1"image" :"xxx",$3
DST_KEY = _raw

Thanks
Mala


richgalloway
SplunkTrust

Your REGEX pattern doesn't match your sample data. That would prevent Splunk from reducing the data.
This regex string works in regex101.com: ("image" : ")(.*?)(",\n)("sound" : ")(.*?)(",). It's not necessary for the regex string to match the entire event - only the part you wish to change needs to match.
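
One way to see the mismatch outside regex101 is a quick Python check. This is only a sketch: Splunk uses PCRE, and Python's `re` module is assumed to behave the same way for the constructs used here. The shortened sample lines are an assumption based on the sample log.

```python
import re

# Two lines from the sample log (values shortened). The lines are separated
# by a real newline, while \r\n inside the value is literal text.
sample = (
    '"image" : "xxxx\\r\\nxxxx\\r\\n",\n'
    '"sound" : "xxxxxxxx",\n'
)

# Pattern from the original transforms.conf attempt.
original = re.compile(r'(?m)^(.)"image" : "(.)",(.*)$')
# Pattern suggested in the reply.
suggested = re.compile(r'("image" : ")(.*?)(",\n)("sound" : ")(.*?)(",)')

original_hit = original.search(sample)
suggested_hit = suggested.search(sample)

print("original matches:", original_hit is not None)
print("suggested matches:", suggested_hit is not None)
```

The original pattern needs exactly one character before `"image"` and exactly one character as the value, which the sample lines do not provide.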

---
If this reply helps you, Karma would be appreciated.

mala_splunk_91
Explorer

I just tried the transforms.conf below to hide the "image" value, but it is not working for me.

[setnull]
REGEX = ("image" : ")(.*?)(",\n)
FORMAT = $1xxx$2
DST_KEY = _raw

I applied the same pattern with SED in props.conf, and that is not working either. Is there anything wrong with the line below?
SEDCMD-set = s/("image" : ")(.*?)(",\n)/xxx\2/g


richgalloway
SplunkTrust

Try FORMAT= $1xxx$3 or SEDCMD-set = s/("image" : ")(.*?)(",\n)/\1xxx\3/g
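
The difference between the second and third capture group can be sketched in Python, again assuming `re` behaves like Splunk's PCRE for this pattern. The one-line input is a shortened, hypothetical stand-in for the real event.

```python
import re

line = '"image" : "AAAABBBBCCCC",\n'
pattern = re.compile(r'("image" : ")(.*?)(",\n)')

# Group 2 is the value itself, so reusing it puts the payload back
# and drops the closing quote, comma, and newline.
with_group_2 = pattern.sub(r'\1xxx\2', line)

# Group 3 is the closing '",' plus newline, so only the value is replaced.
with_group_3 = pattern.sub(r'\1xxx\3', line)

print(repr(with_group_2))
print(repr(with_group_3))
```

Separately from the backreference, it may be worth checking two things in the posted configuration against the Splunk spec files: transforms.conf documents the destination setting as DEST_KEY, and a props.conf sourcetype stanza is written as the bare sourcetype name in brackets.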


mala_splunk_91
Explorer

Below is the sample data:

"captcha" : {
"image" : "iVBORw0KGgoAAAANSUhEUgAAAJYAAAAoCAIAAACTo5SwAAAeIUlEQVR42u2cd1hTV/z/60Ac1Var\r\/\r\noZAEBJ9v///Xze5/M55+Srr/7v+v/gAtm/gOzxIHsC4P8G+L/W4vschX\r\ndIo7k+bDjgvic5GacsiPL2st7tUCU3sUBbIPHRBq1NU0hRC6II7QBBQuluUvbss2FXNMBHGL+KgR\r\nN9yQE2aE20JO2CJOOLTFnHBTbiSEt0KQsFacDuMn5LcLlO3BEe7XPMorK7dqzjlcFL8nnbqdg2xP\r\np+4qen2wOe+ErEKJ8BJECKqvyiqvtJVeFudeFHAgSGsudpRDs+LQjnFoxznYCWjc6NP8uPOC9Ovi\r\ngvttFQ4fq582F9sXpdxKZ1xRbRmFdk21ZfQmJ+YWN/YOP/GhIPO5uMS9rZb4URjYLCAUcZ0hRQ7z\r\nESf2MSf2CSf2KSfWkcN6xmE957CcuPGu/BQPQW6gWBDZKqRLmliyN/GSpvj3QmZ1ESUnLYibSOS8\r\n9uW8hq9+nAR/TkIgbsFcdiifSxEUssQ1nLbmnC7qaj0Rwn80EXYVRQ3aoyh0QSVCUGgqL1gqzV/W\r\nmr20ibtElLJEmLJUmLJcmLJCmLJSmGomTF0tTF0rSjNv4m1ozd0sLdouL4X8cBf8hPCg2lFeueDY\r\nhxJI0VKUcViYcVjEO9qca/2h7Iy88oIK4RWIEFRfk1ddl1Zeby291pR/RcS/JMy+LMy+Isy+JuRf\r\nF/JviHJuNhXebRXYS2ueyOsc5cJnH6ocm4sdRHn2ql2jT/Bdo46fdo0WOIsKXJpK3VurfaT1/nJx\r\nkFwc/EEY1CzwExX5CAt9hUUEYRFRWOSvsOJAYXGQsDhEVEpqqoxoFdKkTUz52zjw7rXiNPb7JNnb\r\nxLaG103VDJEgWlgOLUZYzhQKYoWCOKEgXih4LapMbKpLbW3Mkrbky9uKvmCBqT2KAvlHTYSd6mr/\r\nRtFPCBVRFOcHbQkoXAqKlssLV8oKVkjzzSQFqyQFayQF5gorXC8p3CAp3CQt2iIr3iov2Q5Kd4Cy\r\ndhdsF0KLjkd5QcVxucBaVm4tLTshKTspLT8tE5yVV55XCmFHhPhp+pvy6luyqtvSqtuSyjuSyruS\r\nyvuSKjtJ1QNptb2s1kFe9wQInylM5CQXOsnqXKR1rpJaaC8Vu0br3JW7RiV1XhKht1REkNX7yRv8\r\nQWMQaAwGjaHyxlCZmCxtIEvqyZKGcElDhKQhSiKmSMSIREyViCG5aFkzQ/6GCRT84vHT9AqE4H2y\r\n/F2y7G2K9G2q5A20NMkbruRthuQtT/I2U/IuW/qeL2vJU2yW6eXGbU0hVEPYlRAadBbCTy6II+zw\r\nhQiK09hrPp3GVn0hAijZDEq3gtJtoHQ7KNvZGaG6ECoRKr8QAVSeApWnQaUil8DtkxCqEF5XfiEC\r\nqLkNamxB7V3llkOA71cDdY9AnQMQQn5PcX7PIUIgcgEiV1D/EtS/AvVuQLHf0FO55RA0+AIxEYj9\r\ngDhAYQqE2r8QATSj4A0VvMHAGzp4EwPeMsBbyI8F2l2w4xci9OIob68XmNqF8F+E/3f9P339D+Dp\r\nIgw5+CH1AAAAAElFTkSuQmCC\r\n",
"sound" : "UklGRhQ2AgBXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YfA1AgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAD+/wAAAgAAAAIAAQAAAAAAAgAABAAEACQAAAAAABAAAAAIAAgABAAAAAwAIAAUACAAIAAEABQD+/wUABADy/wAA+//4//P/8v/o/+n/7P/X/+b/4P/h/+P/3f/m/+v/5P/j/+j/3f/i/+T/3P/g/+j/4f/q/+T/4f/o/+b/7v/k/+j/8P/l/+n/6f/w/+//8f/x/+//9f/z//X/9f/2//f/9/8AAAQA/f/6//7/9f8AAAAA+v/+//3//P/9/////P///wAA/P////b/AAAAAPz////9//7//v8AAP3///8AAP7/AAD3/wAAAAD9/wAA/f////f/+/8AAP3////9//3//f////7/9v8BAP7//P8AAPz//v///wAA/f8AAP///v8AAP7///8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA+P8CAAAA/v8AAPb/+//5//n/+P////3/+v////r//f/9//3//v/+//j/+P/6//b/AAD8//v/BAABAAIAAwD+/wcAEAAMABYADwARABUAFQAXABcAGwARAA8AGAALAA4AFwARABEAEQAQAA4AEQAPABcAFgAMABIAEAAQAAkACwATAAYACwAQAAwABAAHAAcA/P8CAAYAAQACAAIA8f/9/wMA+////wUA/P/9/wEAAwD9/wAAAQD///r/AQAAAP3/AQD//wcABAD+/wEAAAABAAEAAgD5/wMAAgD+//r/AAD5//j/BAD7//7///8FAPz//f8BAPz/AgD3/wMAAAD+/wEA/f8AAP3/+v8AAAAA/////wAA/v8AAAAA//8AAAAAAAAAAAcA/v8AAAIA//8BAAgABwAFAAkAAAACAP3//f8HAAAAAgAAAAEAAAAAAAAA//8BAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD4/wIAAAD+/wAA9//7//n/+f/4/////f/6////+v/9//3//f/+//7/9//4//r/9v8AAPz//P8EAAEAAgADAP7/BwAQAAwAFgAPABEAFQAVABcAFwAbABEADwAYAAsADgAXABEAEQARABAADgARAA8AFwAWAAwAEgAQABAACQALABMABgALABAADAAEAAcABwD8/wIABgABAAIAAgDx//3/AwD7////BQD8//3/AQADAP3/AAABAP//+v8BAAAA/f8BAP//BwAEAP7/AQAAAAEAAQACAPn/AwACAP7/+v8AAPn/+P8EAPv//v///wUA/P/9/wEA/P8CAPf/AwAAAP7/AQD9/wAA/f/6/wAAAAD/////AAD+/wAAAAD//wAAAAAAAAAABwD+/wAAAgD//wEACAAHAAUACQAAAAIA/f/9/wcAAAACAAAAAQAAAAAAAAD//wEAAAAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAA+f/6//z/+P/5//f/5//h/+X/3//k/9//2//b/+H/4P/d/+L/3P/k/9r/5P/v/+f/6//j/+r/6P/o//P/7v/v/+//8//0//X/7v/3/+//7P/0//b/8P/0//v/+/////3/AQABAPv//v8HAP3//P8BAP7//////wEAAAAAAAAA//8BAP//AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgA/v8AAAMAAAABAAAAAgAAAAAAAAD5/wMAAAD9/wAAAAAAAAYA/////wMAAAAAAPn/BAAIAAIA//8AAP7/AAADAP//AQD///j/AQACAP7/BwAFAAsABQAHAA0AAAAGAP3/BgAEAAYACwAFAAwABwAVABIACgAPABYADQAUACMAGgAXABkAHwAUABgAGgANABEADgAPABYAEwARABMAFAASABQADAAMAA0ACgAKAAoACgAHAAgABwAH
AAcABQAGAAUABQAEAAQABAADAAMAAwADAAIA/P/9//7/+v/6//n/6f/i/+b/4P/l/+D/3f/c/+L/4f/e/+P/3f/l/9v/5f/v/+f/7P/j/+v/6f/p//P/7v/w/+//8//0//X/7v/4/+//7P/0//b/8P/0//v/+/////3/AQABAPv//v8HAP3//P8BAP7/AAD//wEAAAAAAAAA//8BAP//AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgA/v8AAAMAAAABAAAAAgAAAAAAAAD5/wMAAAD9/wAAAAAAAAYA/////wMAAAAAAPn/BAAIAAIA//8AAP7/AAADAP//AQD///j/AQACAPAAA",
"answer" : "xxxxxx",
"clear_screen" : false
},


mala_splunk_91
Explorer

Thanks for the link.

But SED substitution will not reduce the data size, right?

Is it possible to replace some number of characters with a few "x" characters?

Like "iVBORw0KGgoAAAANSUhEUgAAAJYAAAAoCAIAAACTo5SwAAAeIUlEQVR42u2cd1hTV/z/60Ac1Var\r\/\r\noZAEBJ9v" To "xxxx"

Please share any ideas for reducing the data size before indexing.


richgalloway
SplunkTrust

Yes, data masking using SED substitution will reduce the data size. It will allow you to replace 50kb of data with a few characters (like "xxxx").
There are examples in the link provided by kmorris.
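
To get a feel for the size difference, here is a Python sketch that masks two 50,000-character values and compares lengths. The event, the padding characters, and the pattern are assumptions for the demo. It only measures string sizes in Python and does not show how Splunk accounts for indexed volume.

```python
import re

# Hypothetical event with two 50,000-character values standing in for the
# base64 image and sound payloads.
event = (
    '"captcha" : {\n'
    '"image" : "' + "A" * 50_000 + '",\n'
    '"sound" : "' + "B" * 50_000 + '",\n'
    '"answer" : "xxxxxx",\n'
    '},'
)

# Keep the key and the quotes, replace only the value with a short dummy string.
masked = re.sub(r'("(?:image|sound)" : ")[^"]*(")', r'\1xxxx\2', event)

print(len(event.encode("utf-8")), "bytes before")
print(len(masked.encode("utf-8")), "bytes after")
```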


MuS
SplunkTrust

Can you post your props.conf and transforms.conf, please? This is how it is done in Splunk, so either something is wrong in the configs or they were not deployed on the parsing instance. See the docs at http://docs.splunk.com/Documentation/Splunk/latest/Admin/Configurationparametersandthedatapipeline#H... for more details.

cheers, MuS


richgalloway
SplunkTrust

Data masking using props.conf and transforms.conf seems like the best way to go. What have you tried already? Perhaps we can help you get that to work.
