Splunk Cloud Platform

How to use a heavy forwarder to parse data before sending it to Splunk Cloud

pcnascimento
Loves-to-Learn Lots

This is my first time using Splunk Cloud, and I'm trying to perform field extraction directly on the heavy forwarder before the data is indexed.

I created REPORT and TRANSFORMS entries in props.conf, with transforms.conf configured using a regular expression that I tested and found working in Splunk Cloud through the field extractor, but it does not work when I try to use it on the HF.

Are there any limitations on data extraction when using a heavy forwarder with Splunk Cloud?

0 Karma

pcnascimento
Loves-to-Learn Lots

Sorry for not being clear; here is a description of what was done:

I want to extract fields on the HF before sending the data to Splunk Cloud.

transforms.conf
[field_extract_username]
SOURCE_KEY = _raw
REGEX = (\susername\s\[(?P<user>.+?)\]\s)
FORMAT = user::$1

props.conf
[keycloak]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
disabled = false
SHOULD_LINEMERGE = true
REPORT-field_extract = field_username
EXTRACT-username = \susername\s\[(.+?)\]\s
EXTRACT-user = (\susername\s\[(?P<user>.+?)\]\s)

I created EXTRACT-username and EXTRACT-user as additional tests, after trying REPORT-field_extract, to extract the user field.

_raw log:

{
"log": "stdout F {\"timestamp\":\"%s\",\"sequence\":%d,\"loggerClassName\":\"org.jboss.logging.Logger\",\"loggerName\":\"br.com.XXXXXX.keycloak.login.pf.clients.CustomerLoginClient\",\"level\":\"INFO\",\"message\":\"CustomerLoginClient.fetchValidateLogin - Processed - username [XX157118577] clientId [https://www.XXXX.com/app] took [104ms]\",\"threadName\":\"executor-thread-3577\",\"threadId\":1XXXXX73,\"mdc\":{\"dt.entity.process_group\":\"PROCESS_GROUP-DXXA014C1XXXX7EC\",\"dt.host_group.id\":\"prd\",\"dt.entity.host_group\":\"HOST_GROUP-46FAFFBA838D4E81\",\"dt.entity.host\":\"HOST-971DXXXXXXX0F72E\",\"dt.entity.process_group_instance\":\"PROCESS_GROUP_INSTANCE-60C0A631DB5AB172\"},\"ndc\":\"\",\"hostName\":\"keycloak-XXXXX-X\",\"processName\":\"QuarkusEntryPoint\",\"processId\":1}",
"source": "/var/log/containers/keycloak-XXXXX-0_XXXXXX_keycloak-814935ba7b1d4XXXXXXXXeb8d4dfc51d27283a257c4a96526eb.log",
"host": "[\"keycloak-XXXXX-0\"]",
"type": "-",
"environment": "prod"
}

0 Karma

PickleRick
SplunkTrust

REPORT and EXTRACT are search-time settings (they define what is done when Splunk fetches indexed data from the indexers), so configuring them on a HF is pointless. If you really, really want to use an index-time extraction to create an indexed field (which might not be the best idea), you should use TRANSFORMS.
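
For illustration, a minimal index-time sketch (assuming the sourcetype really is keycloak and reusing the regex from your post; the class and stanza names here are just examples):

props.conf
[keycloak]
TRANSFORMS-field_username = field_extract_username

transforms.conf
[field_extract_username]
SOURCE_KEY = _raw
REGEX = \susername\s\[(.+?)\]\s
FORMAT = user::$1
WRITE_META = true

WRITE_META = true is what turns the captured value into an indexed field; you would normally also declare the field in fields.conf (INDEXED = true) on the search tier so searches against it behave correctly.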

BTW, you shouldn't use SHOULD_LINEMERGE=true (it's meant for very rare border cases and it incurs a big performance penalty).

And your data looks as if it needs an external preprocessing step to extract the JSON object from within the log field string.

0 Karma

richgalloway
SplunkTrust

Please describe the problem you are having without using the phrase "it does not work" as that tells us nothing about what is wrong.

Heavy forwarders parse data exactly the same way indexers do, so any props and transforms you would use on an indexer should work on a HF.  If the data passes through more than one HF, then only the first one does the parsing.  Also, data sent via HEC to the /events endpoint is not parsed at all.

Make sure the props are in the right stanza (the stanza name matches the incoming sourcetype, or starts with "source::" and matches the source name, or starts with "host::" and matches the sending host's name).  Be sure to test regular expressions (I like to use regex101.com, but it's not perfect) before using them.
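
For illustration, the three stanza forms look like this (the sourcetype, path, and host patterns below are only examples based on the log shown earlier in this thread):

props.conf
# matches events whose sourcetype is "keycloak"
[keycloak]
# matches events by source path
[source::/var/log/containers/keycloak-*.log]
# matches events by sending host name
[host::keycloak-*]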

---
If this reply helps you, Karma would be appreciated.
0 Karma

PickleRick
SplunkTrust

I think you took a little shortcut here. Data received on the /event HEC endpoint is normally parsed and processed; it's just that line breaking is skipped (because we're explicitly receiving data already split into single events) and, by default, time parsing is skipped.

Other than that, normal index-time operations are performed.
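
As an illustration of why line breaking isn't needed there, each HEC /event payload already carries one delimited event (the host, port, and token below are placeholders, not values from this thread):

curl -k https://splunk.example.com:8088/services/collector/event \
  -H "Authorization: Splunk <your-hec-token>" \
  -d '{"sourcetype": "keycloak", "event": {"message": "username [XX157118577] clientId [https://www.XXXX.com/app] took [104ms]"}}'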

0 Karma

pcnascimento
Loves-to-Learn Lots

@PickleRick, how should I do it?

Should the configuration below work?
[keycloak]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
disabled = false
TRANSFORMS-field_extract = field_username

When you say "external preprocessing" do you mean somewhere else before the Heavy Forwarder?

0 Karma

PickleRick
SplunkTrust

Yes. If you had your events sanitized before ingesting (i.e. not having a whole JSON structure inserted as a text member of another JSON), you could have them parsed as normal JSON without having to manually extract each field (and manipulating structured data with regexes is bound to hit some walls sooner or later). Also, I'd advise against doing indexed extractions unless you have a very good use case for them.
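
To illustrate what "sanitized" means here, the event would ideally arrive as just the inner JSON object (trimmed from the sample posted above), which Splunk can then parse as structured data on its own:

{"timestamp":"%s","sequence":%d,"loggerName":"br.com.XXXXXX.keycloak.login.pf.clients.CustomerLoginClient","level":"INFO","message":"CustomerLoginClient.fetchValidateLogin - Processed - username [XX157118577] clientId [https://www.XXXX.com/app] took [104ms]","threadName":"executor-thread-3577","processName":"QuarkusEntryPoint","processId":1}

With events in that shape, a plain KV_MODE = json (search time) or INDEXED_EXTRACTIONS = json (index time) in props.conf would expose the fields without any custom regex.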

0 Karma

pcnascimento
Loves-to-Learn Lots

Thanks for the feedback.

My understanding is that I would gain performance in the future. Am I wrong?

I am currently using field extract in splunk cloud.

0 Karma

richgalloway
SplunkTrust

At one time, parsing on an HF actually made the indexers work *harder*, but I'm not sure that's still the case.

HFs should off-load some SVCs from your Splunk Cloud indexers.

HFs will increase the network traffic to Splunk Cloud, because parsed ("cooked") data carries per-event metadata and is larger on the wire than the raw stream.

---
If this reply helps you, Karma would be appreciated.
0 Karma

pcnascimento
Loves-to-Learn Lots

Forgive me, but I still have doubts.

Is your recommendation not to use a Heavy Forwarder for data normalization?

0 Karma

richgalloway
SplunkTrust

I recommend using a HF only if necessary.  In addition to the factors listed previously, HFs add a layer of complexity, are something else to manage, and introduce another point of failure.

A distinct advantage of HFs in a Splunk Cloud environment is better control over how your data is parsed.  It's much easier to manage the apps on a HF than it is to do so in Splunk Cloud - even with the Victoria experience.

Of course, you should have at least 2 HFs for redundancy.
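
A minimal sketch of that redundancy, assuming the upstream systems run Splunk forwarders (host names and the port are placeholders); the forwarders then auto-load-balance across the two HFs. If the data arrives via HEC instead, a load balancer in front of the HFs plays the same role.

outputs.conf (on the sending forwarders)
[tcpout]
defaultGroup = hf_pool

[tcpout:hf_pool]
server = hf1.example.com:9997, hf2.example.com:9997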

---
If this reply helps you, Karma would be appreciated.
0 Karma