How do I index the same set of logs into two different indexes, but apply transforms to only one of them to filter out sensitive data?

Explorer

Hello fellow-splunkers!

Problem Statement
- My logs have INFO, WARNING and DEBUG log entries. The DEBUG log entries have customer-specific information which I wouldn't want to expose to a wider audience.
- I want some specific users in the team to have access to the logs with these DEBUG log entries. Others shouldn't be able to access them.

My Solution
- Create two indexes: 'index-normal' and 'index-debug'.
- Create roles and users so that access to these indexes is granted accordingly. Easy. Can be managed!
- At the forwarder, I have two stanzas in inputs.conf, each indexing the same log to a different index. Note that I am attempting to bypass props.conf and transforms.conf at the indexer by using queue = indexQueue in one of the stanzas.

[monitor:///mypath/abc.log]
disabled = false
index = index-normal
sourcetype = mysourcetype

[monitor:///mypath/abc.log]
disabled = false
index = index-debug
sourcetype = mysourcetype
queue = indexQueue
- With the above configuration, I am attempting to index the same file twice and send the events to two separate indexes: one copy (index-normal) going through the props.conf and transforms.conf configs at the indexer, and the other (index-debug) bypassing them.
- At the indexer, I am stripping out the log entries that contain the string DEBUG.

props.conf:

[mysourcetype]
TRANSFORMS-null= setnull
NO_BINARY_CHECK = 1
pulldown_type = 1

transforms.conf:

[setnull]
REGEX = DEBUG
DEST_KEY = queue
FORMAT = nullQueue

Needless to say, this isn't working.

Questions
- Is this the best way to handle this situation? I am trying to index the same log twice (and maybe that's not happening). Is there a better approach that uses some logic at the indexer end?
- If this is the approach which is to be used, where am I going wrong?

Thanks!

Communicator

Just found CLONE_SOURCETYPE today in transforms.conf.spec:
http://docs.splunk.com/Documentation/Splunk/latest/admin/Transformsconf

Sounds like it might be what you need (see excerpts below):

CLONE_SOURCETYPE = <string>

* If CLONE_SOURCETYPE is used as part of a transform, the transform will
  create a modified duplicate event, for all events that the transform is
  applied to via normal props.conf rules.
* Use this feature if you need to store both the original and a modified
  form of the data in your system, or if you want to send the original and a
  modified form to different outbound systems.
  * A typical example would be to retain sensitive information according to
    one policy and a version with the sensitive information removed
    according to another policy.  For example, some events may have data
    that you must retain for 30 days (such as personally identifying
    information) and only 30 days with restricted access, but you need that
    event retained without the sensitive data for a longer time with wider
    access.

Then in the examples:

[hide-ip-address]
# Make a clone of an event with the sourcetype masked_ip_address.  The clone
# will be modified; its text changed to mask the ip address.
# The cloned event will be further processed by index-time transforms and
# SEDCMD expressions according to its new sourcetype.
# In most scenarios an additional transform would be used to direct the
# masked_ip_address event to a different index than the original data.
REGEX = ^(.*?)src=\d+\.\d+\.\d+\.\d+(.*)$
FORMAT = $1src=XXXXX$2
DEST_KEY = _raw
CLONE_SOURCETYPE = masked_ip_addresses
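
Applied to the configuration in the question, this might look roughly like the sketch below. It is untested and rests on several assumptions: a single monitor stanza sends the file to index-normal, the sourcetype name mysourcetype_full and the transform names clone_for_debug and route_to_debug_index are made up for illustration, and the clone transform is listed before setnull so that the copy is made before DEBUG events are dropped from the original.

```ini
# props.conf (on the indexer)

[mysourcetype]
# Clone every event first, then drop DEBUG events from the original.
TRANSFORMS-null = clone_for_debug, setnull

# Hypothetical sourcetype assigned to the cloned (unfiltered) events.
[mysourcetype_full]
TRANSFORMS-route = route_to_debug_index

# transforms.conf (on the indexer)

[clone_for_debug]
# Intended to match every event; the clone gets the new sourcetype.
REGEX = .
CLONE_SOURCETYPE = mysourcetype_full

[setnull]
# Unchanged from the question: discard events containing DEBUG.
REGEX = DEBUG
DEST_KEY = queue
FORMAT = nullQueue

[route_to_debug_index]
# Send every cloned event to the restricted index.
REGEX = .
DEST_KEY = _MetaData:Index
FORMAT = index-debug
```

If this works as intended, index-normal would hold the filtered events and index-debug the full log. The clones carry a different sourcetype, so searches against index-debug would need to use mysourcetype_full.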

Builder
  1. You can't index the same file twice. Splunk keeps track of what it has already indexed in something known as the fishbucket, so it will know it has already read that file. Inputs that are referenced more than once are also collapsed, so Splunk will really only look at one input stanza.
  2. You should be able to use an index-time transform to change the index for specific data. See the following post, where you set DEST_KEY to _MetaData:Index and FORMAT to the new index: https://answers.splunk.com/answers/100609/redirection-to-different-index-using-transforms-conf.html

#2 will work. However, how are you receiving this data? Is it coming in via syslog-ng and written to a file, or some other way? Is there any way to break the data into separate files? That way you could just have two input stanzas, each watching a separate file. If not, then you might have to do a transform on a subset of the data.
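
As a rough sketch of #2, assuming the single monitor stanza sends everything to index-normal and that the DEBUG entries can be identified by the string DEBUG, the index override could look something like this. The transform name route_debug_to_index is made up for illustration, and this is untested.

```ini
# props.conf (on the indexer)

[mysourcetype]
TRANSFORMS-route = route_debug_to_index

# transforms.conf (on the indexer)

[route_debug_to_index]
# Events containing DEBUG are redirected; everything else stays
# in the index set by inputs.conf.
REGEX = DEBUG
DEST_KEY = _MetaData:Index
FORMAT = index-debug
```

Note that this moves the DEBUG events rather than copying them, so each event ends up in exactly one of the two indexes.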

Explorer

Thanks for your help, but unfortunately it doesn't quite help my situation.

The solution you outlined in #2 would basically redirect the log entries identified by a REGEX to a different index. However, in my case, I need one index (here, index-debug) to be populated with not only the DEBUG log entries but also the INFO and WARNING log entries: basically, the unfiltered log.

I would also need the filtered log (without the DEBUG entries) to go to a different index (index-normal).

To your other point, again it's a good suggestion. However, I don't think we would be able to get the DEBUG log entries into a separate file. Technically we could, but I don't think the team would be receptive to this approach.

Builder

I'm not quite understanding why you need the same data in multiple indexes. Why not just control the permissions in such a way that everyone has access to the general info, and then grant a few access to the debug logs?
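
For illustration, and assuming the DEBUG entries are routed to index-debug while everything else stays in index-normal, the permissions side could be sketched in authorize.conf along these lines. The role names log_reader and debug_reader are hypothetical.

```ini
# authorize.conf

# Everyone on the team: general logs only.
[role_log_reader]
srchIndexesAllowed = index-normal
srchIndexesDefault = index-normal

# The few users who may also see the DEBUG entries.
[role_debug_reader]
importRoles = log_reader
srchIndexesAllowed = index-normal;index-debug
srchIndexesDefault = index-normal
```

Users with debug_reader could then search both indexes together (index=index-normal OR index=index-debug) to see the full log, without the data being stored twice.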
