Data anonymizing and index routing

Communicator

Hi!

I am considering setting up two separate indexes: one containing the
non-anonymized data and the other containing the anonymized data.

The input data looks like the following:

a,b,c
1,2,3
4,5,6
7,8,9

I have configured props.conf and transforms.conf as follows:

props.conf

[hoge]
SHOULD_LINEMERGE = False
REPORT-1 = searchext
TRANSFORMS-1 = setnull
TRANSFORMS-2 = indexrouting1
TRANSFORMS-3 = anonymize
TRANSFORMS-4 = indexrouting2

transforms.conf

[searchext]
DELIMS = ","
FIELDS = "a","b","c"

[setnull]
REGEX = a
DEST_KEY = queue
FORMAT = nullQueue

[anonymize]
REGEX = (\d),(\d),(\d)
FORMAT = $1,###,$2
DEST_KEY = _raw

[indexrouting2]
REGEX = ###
DEST_KEY = _MetaData:Index
FORMAT = indexB

It seems that the data is only going into indexB.
However, I want it to go into both indexA and indexB.

I would appreciate it if someone could verify this.

Thanks,
Yu

Re: Data anonymizing and index routing

Ultra Champion

Normally, you'd write your transforms like this:

TRANSFORMS-blah = transform1, transform2, transform3

This means that each event to be transformed will go through all three transforms before returning to the pipeline for further processing (i.e. indexing). However, that will not let you create multiple copies of the events into different indexes.

One way of doing it is to index the events normally into indexA, and then have a scheduled search that changes the events and populates another index (indexB):

index=indexA earliest=-1h@h latest=@h | replace "SecretStuff" with #### in _raw | collect index=indexB

or

index=indexA earliest=-1h@h latest=@h | rex field=_raw mode=sed "s/<some_regex>/###/" | collect index=indexB

Because the search window covers exactly the previous full hour (-1h@h to @h), setting this to run 5 minutes past every hour will ensure that all events are collected only once into indexB. Restrict the access to indexA. Allow general access to indexB. Adjust time ranges and scheduling to your needs.
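
As a rough sketch, the schedule described above could be expressed in savedsearches.conf roughly like this. The stanza name anonymize_indexA_to_indexB is made up for illustration, and <some_regex> is still a placeholder you need to fill in yourself; treat this as a starting point rather than a tested configuration.

```
# savedsearches.conf
# Stanza name is hypothetical; pick whatever suits you.
[anonymize_indexA_to_indexB]
search = index=indexA | rex field=_raw mode=sed "s/<some_regex>/###/" | collect index=indexB
# Search window: the previous full hour
dispatch.earliest_time = -1h@h
dispatch.latest_time = @h
# Run at 5 minutes past every hour
cron_schedule = 5 * * * *
enableSched = 1
```

The time range is set through the dispatch settings here instead of earliest/latest in the search string; either should work, as long as the window and the cron schedule line up.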

Hope this helps,

k

Re: Data anonymizing and index routing

Communicator

Hello Kristian.

Thank you for the reply.

This sounds good!

I will give it a try.

Thanks,
Yu

Re: Data anonymizing and index routing

Communicator

Hello Kristian.

I got it working, but it seems that the field extraction for the stash sourcetype extracts the raw data itself into a field.

I would like to disable the stash sourcetype extraction. Do you know of any way to do this?

Thanks,
YU

Re: Data anonymizing and index routing

Ultra Champion

The collect command allows you to specify an alternate location of where to write the file. Thus you should be able to set up a [monitor] stanza in inputs.conf with the correct source/sourcetype (so that the field extractions are automatically applied). I'm sorry, but I haven't played with this feature a lot. You'll need to do a bit of your own testing.

Re: Data anonymizing and index routing

Ultra Champion

...| collect index=dummy file=apa spool=f

Then set up the [monitor] to watch this file in SPLUNK_HOME/var/run/splunk

Set the source and sourcetype like the original data in inputs.conf
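
Something along these lines in inputs.conf might do it. This is only a sketch: it assumes the file really ends up as apa directly under $SPLUNK_HOME/var/run/splunk, that the original sourcetype is hoge (from the props.conf in the question) and that the anonymized copy should land in indexB. The source value is a placeholder for whatever the original data uses.

```
# inputs.conf
# Path assumes collect was run with file=apa spool=f
[monitor://$SPLUNK_HOME/var/run/splunk/apa]
index = indexB
sourcetype = hoge
# Placeholder: set this to the source of the original data
source = <original_source>
```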

At the head of the file that is created there will be a line *** SPLUNK *** and some info regarding the index you set in the collect command. This line will be indexed as a separate event, but you can probably remove it through a nullQueue transform.
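
A nullQueue transform for that header line could look roughly like the following. Again, this is an untested sketch: it assumes the monitored file gets the sourcetype hoge and that the header line starts with *** SPLUNK *** (the regex tolerates the spaces being absent). The names dropheader and drop_collect_header are made up for illustration.

```
# props.conf
[hoge]
TRANSFORMS-dropheader = drop_collect_header

# transforms.conf
[drop_collect_header]
# Match the header line written at the top of the collect file
REGEX = ^\*\*\*\s*SPLUNK\s*\*\*\*
DEST_KEY = queue
FORMAT = nullQueue
```

Check the first line of the generated file before relying on the regex, and adjust it if the header looks different in your version.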
