Data filtering location

splunklearner · ‎11-16-2024

Hello, let me explain my architecture.

Multisite cluster (3 sites).

In each site: 2 indexers, 1 SH, and 2 syslog servers (UF installed).

Overall: 1 deployment server, 1 deployer, and 2 cluster managers (1 on standby).

As of now, network logs are sent to our syslog servers, and the UFs forward the data to the indexers.

We want to route logs to indexes based on the FQDN.

For example, we have an application X whose logs may or may not contain an FQDN. If an event contains an FQDN, it goes to that application's index; otherwise it goes to a different index. (I wrote these props and transforms on the cluster manager.)

In inputs.conf on the deployment server we just gave the log path along with the different index (the one referred to in the transforms on the cluster manager). So all the logs flow to the cluster manager, where we wrote props and transforms to filter the data.

Is there any other way to write these configurations other than this?

Here are the props and transforms from the cluster manager:

cat props.conf

[f5_waf]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %b %d %H:%M:%S
SEDCMD-newline_remove = s/\\r\\n/\n/g
LINE_BREAKER = ([\r\n]+)[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s
SHOULD_LINEMERGE = False
TRUNCATE = 10000

# Leaving PUNCT enabled can impact indexing performance. Customers can
# comment this line if they need to use PUNCT (e.g. security use cases)
ANNOTATE_PUNCT = false

TRANSFORMS-0_fix_hostname = syslog-host
TRANSFORMS-1_extract_fqdn = f5_waf-extract_fqdn
TRANSFORMS-2_fix_index = f5_waf-route_to_index

cat transforms.conf

# FIELD EXTRACTION USING A REGEX
[f5_waf-extract_fqdn]
SOURCE_KEY = _raw
REGEX = Host:\s(.+)\n
FORMAT = fqdn::$1
WRITE_META = true

# Routes the data to a different index-- This must be listed in a TRANSFORMS-<name> entry.

[f5_waf-route_to_index]
INGEST_EVAL = indexname=json_extract(lookup("fqdn_indexname_mapping.csv", json_object("fqdn", fqdn), json_array("indexname")), "indexname"), index=if(isnotnull(indexname), indexname, index), fqdn:=null(), indexname:=null()

cat fqdn_indexname_mapping.csv

fqdn,indexname
selenium.systems.us.fed,xxx_app_selenium1
v-testlab-service1.systems.us.fed,xxx_app_testlab_service1

I have gone through the documentation, but I am asking whether there are any better alternatives.

PickleRick · ‎11-19-2024

Typically most of the fields Splunk uses are so-called "search-time fields" - Splunk parses them out during the search. Here you're extracting the fqdn as an indexed field, which means you're parsing it out during indexing and writing an additional field into the metadata files alongside the data itself. This has its cons (additional space usage, and immutability - once you've extracted it, you cannot fix it if something went wrong) and usually does not have many pros. The most obvious advantage of an indexed field is speed if you're summarizing on that field - then you can use the tstats command, which is lightning fast compared to a normal event search summarized with the stats command. Other than that, there are few cases where indexed fields are called for. But that's really an advanced topic.

4. Yes, if you plan to restrict access to data, multiple indexes is indeed the way to go.

5. When you're using INGEST_EVAL to set a value with a normal = assignment, if that field already has a value, the new value is _added_ to that field, creating a multivalued field. If you use the := assignment, the old value - if present - is overwritten. I'm not 100% sure how Splunk would treat a multivalued index field (well, index is not technically a field stored in metadata along with the event itself, it's just where the event is written). So just to be on the safe side I'd use :=

6. You're doing a lot of JSON parsing and rendering, plus lookups on a CSV (which are done linearly), so that might have a noticeable performance impact on your Splunk instance if you have a lot of data. You might be able to both write it in a more maintainable way and have it perform better if you implemented this logic one step earlier - in your syslog daemon.
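
To illustrate point 5, here is a minimal sketch of the original transforms.conf stanza with only the index assignment switched from = to :=. Everything else, including the lookup file name and the fqdn field, is taken unchanged from the question; this is not a tested configuration.

```ini
# transforms.conf (sketch, untested): same stanza as in the question,
# but index is overwritten with := instead of appended to with =
[f5_waf-route_to_index]
INGEST_EVAL = indexname=json_extract(lookup("fqdn_indexname_mapping.csv", json_object("fqdn", fqdn), json_array("indexname")), "indexname"), index:=if(isnotnull(indexname), indexname, index), fqdn:=null(), indexname:=null()
```

With := the event keeps a single index value whether or not the lookup finds a match.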

splunklearner · ‎11-20-2024

@PickleRick I appreciate your detailed response.

On point 6: where can I implement this on the syslog server? Can I write props and transforms on the syslog server? We will be installing a UF on the syslog server to forward the data to Splunk.

Can you please specify the location and process?

PickleRick · ‎11-20-2024

You wrote "2 syslog servers (UF installed)". I thought you meant - as is often done - that you have two servers which have some form of an external syslog daemon writing to local files and UF which picks up data from those files. Those syslog servers are completely external to Splunk.
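
As a rough sketch of that setup: assuming the external syslog daemon is configured to write each FQDN's events into its own directory, the UF's inputs.conf could set the index per path, and no index-time lookup would be needed. The directory layout and the catch-all index name xxx_waf_default below are hypothetical; the sourcetype, FQDNs and application index names come from the question.

```ini
# inputs.conf on the UF (sketch). All paths are hypothetical and assume the
# syslog daemon already splits events into one directory per FQDN.
[monitor:///var/log/remote/f5_waf/selenium.systems.us.fed/*.log]
sourcetype = f5_waf
index = xxx_app_selenium1

[monitor:///var/log/remote/f5_waf/v-testlab-service1.systems.us.fed/*.log]
sourcetype = f5_waf
index = xxx_app_testlab_service1

# Events without a recognised FQDN; the index name is a placeholder.
[monitor:///var/log/remote/f5_waf/unmatched/*.log]
sourcetype = f5_waf
index = xxx_waf_default
```

The routing decision then lives in the syslog daemon's own configuration, and Splunk only sees files that are already sorted.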

PickleRick · ‎11-18-2024

Hold up there.

You're mixing different things.

1. Deployment server is a component used to distribute apps to forwarders, sometimes standalone indexers or standalone search heads. It is _not_ used for managing clustered indexers!

2. You don't send data to the CM! The CM manages the configuration and state of the indexers but isn't involved in indexing and/or processing the incoming data.

3. I have no idea why you're extracting the fqdn as indexed field. (true, if you're often doing tstats over it, it can make sense but you also probably normalize your data to CIM so you can do tstats over the dataset).

4. Are you sure you need so many indexes? (Just asking - maybe you indeed do, but people tend to be "trigger-happy" with creating too many indexes.)

5. I think you should overwrite the index field with := rather than simply assign a new value with =

6. You know it will be slow, right? Why not do it one step earlier - on your syslog daemon?

splunklearner · ‎11-18-2024

Can you please elaborate on points 3, 4, 5, and 6? I am very new to Splunk admin and still learning things. Thanks.

splunklearner · ‎11-18-2024

For point 4...

We will create separate AD groups for the different application teams, assign each group an index, and restrict each team's access to its own index only. This is the idea.

That is the reason we create indexes per application. Is this a good approach, or is there another way to restrict access other than by index? For example, is it possible to keep the data of 10 applications in one index so that one team cannot see another team's data? Please let me know.
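
For reference, the per-index restriction described above is usually expressed per role in authorize.conf. A minimal sketch follows, assuming one role per application team; the role names are hypothetical, and capabilities and the AD group-to-role mapping are left out.

```ini
# authorize.conf (sketch). Role names are hypothetical; role names must be lowercase.
# No importRoles here on purpose: an imported role's index access is inherited,
# so importing a role with broad index access would widen what the team can search.
[role_app_selenium1]
srchIndexesAllowed = xxx_app_selenium1
srchIndexesDefault = xxx_app_selenium1

[role_app_testlab_service1]
srchIndexesAllowed = xxx_app_testlab_service1
srchIndexesDefault = xxx_app_testlab_service1
```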

gcusello · ‎11-17-2024

Hi @splunklearner ,

in general, you have to place your props.conf and transforms.conf files on:

1. your Search Heads, for the search-time transformations;
2. the first full Splunk instance (indexers or Heavy Forwarders, not Universal Forwarders) that the data passes through, for the index-time transformations.

In your case, that means on the SHs and on the IDXs, because you don't have HFs.

You could also put them on the UFs, but that isn't mandatory.

Ciao.

Giuseppe
