Hi!
I am considering implementing two separate indexes: one containing non-anonymized data and the other containing anonymized data.
The input data looks like the following:
a,b,c
1,2,3
4,5,6
7,8,9
I have configured props.conf and transforms.conf as follows:
# props.conf
[hoge]
SHOULD_LINEMERGE = false
REPORT-1 = searchext
TRANSFORMS-1 = setnull
TRANSFORMS-2 = indexrouting1
TRANSFORMS-3 = anonymize
TRANSFORMS-4 = indexrouting2

# transforms.conf
# search-time extraction of the CSV columns
[searchext]
DELIMS = ","
FIELDS = "a","b","c"

# drop the header line (a,b,c) by sending it to the nullQueue
[setnull]
REGEX = a
DEST_KEY = queue
FORMAT = nullQueue

# mask the second column in _raw
[anonymize]
REGEX = (\d),(\d),(\d)
FORMAT = $1,###,$3
DEST_KEY = _raw

# route masked events to indexB
[indexrouting2]
REGEX = ###
DEST_KEY = _MetaData:Index
FORMAT = indexB
It seems that the data is only going into indexB.
However, I want the events to go to both indexA and indexB.
I would appreciate it if someone could verify this configuration.
Thanks,
Yu
Normally, you'd write your transforms like:
TRANSFORMS-blah = transform1, transform2, transform3
This means that each event to be transformed will go through all three transforms before returning to the pipeline for further processing (i.e. indexing). However, that will not let you create multiple copies of the events in different indexes.
One way of doing it is to index the events normally into indexA, and then have a scheduled search that changes the events and populates another index (indexB):
index=indexA earliest=-1h@h latest=@h | replace "SecretStuff" with "####" in _raw | collect index=indexB
or
index=indexA earliest=-1h@h latest=@h | rex field=_raw mode=sed "s/<some_regex>/###/" | collect index=indexB
Setting this to run 5 minutes past every hour will ensure that all events are collected exactly once into indexB. Restrict access to indexA and allow general access to indexB. Adjust the time ranges and scheduling to your needs.
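For reference, a minimal savedsearches.conf sketch for such a schedule could look like the following (the stanza name populate_indexB is made up, and <some_regex> remains a placeholder; cron_schedule = 5 * * * * fires at five minutes past every hour):
# savedsearches.conf: a sketch of the hourly anonymize-and-collect search
[populate_indexB]
search = index=indexA earliest=-1h@h latest=@h | rex field=_raw mode=sed "s/<some_regex>/###/" | collect index=indexB
enable_sched = 1
cron_schedule = 5 * * * *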
Hope this helps,
k
...| collect index=dummy file=apa spool=f
Then set up a [monitor] stanza in inputs.conf to watch this file in $SPLUNK_HOME/var/run/splunk, and set the source and sourcetype to match the original data.
At the head of the file that is created there will be a line *** SPLUNK *** followed by some info regarding the index you set in the collect command.
This line will be indexed as a separate event, but you can probably remove it with a nullQueue transform.
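As a rough sketch (the file name apa comes from the collect command above; reusing the original sourcetype hoge follows the advice above but is still an assumption, as are the stanza names):
# inputs.conf: watch the file written by collect (spool=f writes it under $SPLUNK_HOME/var/run/splunk)
[monitor://$SPLUNK_HOME/var/run/splunk/apa]
index = indexB
sourcetype = hoge

# props.conf: add one more transform to the existing sourcetype stanza
[hoge]
TRANSFORMS-dropheader = drop_collect_header

# transforms.conf: send the *** SPLUNK *** header line to the nullQueue
[drop_collect_header]
REGEX = ^\*\*\*\s*SPLUNK
DEST_KEY = queue
FORMAT = nullQueue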
The collect command allows you to specify an alternate location of where to write the file. Thus you should be able to set up a [monitor] stanza in inputs.conf with the correct source/sourcetype (so that the field extractions are automatically applied). I'm sorry, but I haven't played with this feature a lot; you'll need to do a bit of your own testing.
Hello Kristian.
I got it working, but it seems that the field extraction for the stash sourcetype extracts the raw data itself into a field.
I would like to disable the stash sourcetype extraction; do you know of any way to do this?
Thanks,
Yu
Hello Kristian.
Thank you for the reply.
This sounds good!
I will give it a try.
Thanks,
Yu