This is my first time using Splunk Cloud, and I'm trying to perform field extraction directly on the heavy forwarder before the data is indexed.
I created REPORT and TRANSFORMS entries in props.conf, with transforms.conf configured using a regular expression that I tested and confirmed working in Splunk Cloud through field extraction, but it does not work when I try it on the HF.
Are there any limitations on field extraction when using a heavy forwarder with Splunk Cloud?
Sorry for not being clearer; here is a description of what was done:
I want to extract fields in HF before sending to Splunk Cloud.
transforms.conf
[field_extract_username]
SOURCE_KEY = _raw
REGEX = (\susername\s\[(?P<user>.+?)\]\s)
FORMAT = user::$1
props.conf
[keycloak]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
disabled = false
SHOULD_LINEMERGE = true
REPORT-field_extract = field_username
EXTRACT-username = \susername\s\[(.+?)\]\s
EXTRACT-user = (\susername\s\[(?P<user>.+?)\]\s)
I created EXTRACT-username and EXTRACT-user as a test, after trying REPORT-field_extract to extract the user field.
_raw log:
{
"log": "stdout F {\"timestamp\":\"%s\",\"sequence\":%d,\"loggerClassName\":\"org.jboss.logging.Logger\",\"loggerName\":\"br.com.XXXXXX. keycloak.login.pf.clients.CustomerLoginClient\",\"level\":\"INFO\",\"message\":\"CustomerLoginClient.fetchValidateLogin - Processed - username [XX157118577] clientId [https://www.XXXX.com/app] took [104ms]\",\"threadName\":\"executor-thread-3577\",\"threadId\":1XXXXX73,\"mdc\":{\"dt.entity.process_group\":\"PROC ESS_GROUP-DXXA014C1XXXX7EC\",\"dt.host_group.id\":\"prd\",\"dt.entity.host_group\":\"HOST_GROUP-46FAFFBA838D4E81\", \"dt.entity.host\":\"HOST-971DXXXXXXX0F72E\",\"dt.entity.process_group_instance\":\"PROCESS_GROUP_INSTANCE-60C0A631 DB5AB172\"},\"ndc\":\"\",\"hostName\":\"keycloak-XXXXX-X\",\"processName\":\"QuarkusEntryPoint\",\"processId\":1}",
"source": "/var/log/containers/keycloak-XXXXX-0_XXXXXX_keycloak-814935ba7b1d4XXXXXXXXeb8d4dfc51d27283a257c4a96526eb.log",
"host": "[\"keycloak-XXXXX-0\"]",
"type": "-",
"environment": "prod"
}
REPORT and EXTRACT are search-time settings (they define what happens when Splunk fetches indexed data from the indexers at search time). Therefore configuring them on a HF is pointless. If you really, really want to use index-time extraction to create an indexed field (which might not be the best idea), you should use TRANSFORMS.
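If you do go the index-time route, a minimal sketch would look roughly like this (the stanza, class, and field names below are only illustrative; adjust the regex and field name to your data):
props.conf
[keycloak]
TRANSFORMS-extract_user = field_extract_username
transforms.conf
[field_extract_username]
SOURCE_KEY = _raw
REGEX = \susername\s\[(.+?)\]\s
FORMAT = user::$1
WRITE_META = true
fields.conf (where searches run, so the indexed field is searched efficiently)
[user]
INDEXED = true
The key differences from a search-time REPORT are the TRANSFORMS- prefix in props.conf and WRITE_META = true in transforms.conf.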
BTW, you shouldn't use SHOULD_LINEMERGE=true (it's meant for very rare border cases and it incurs a big performance penalty).
And your data looks as if it needs an external preprocessing step to extract the JSON object from within the log field string.
Please describe the problem you are having without using the phrase "it does not work" as that tells us nothing about what is wrong.
Heavy forwarders parse data exactly the same way indexers do, so any props and transforms you would use on an indexer should work on a HF. If the data passes through more than one HF, then only the first one does the parsing. Also, data sent via HEC to the /event endpoint is not parsed at all.
Make sure the props are in the right stanza (the stanza name matches the incoming sourcetype, or starts with "source::" and matches the source name, or starts with "host::" and matches the sending host's name). Be sure to test regular expressions (I like to use regex101.com, but it's not perfect) before using them.
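For example, the same class of settings could be scoped in any of these ways (the source and host patterns here are just hypothetical illustrations):
[keycloak]
TRANSFORMS-field_extract = field_extract_username
[source::/var/log/containers/keycloak-*.log]
TRANSFORMS-field_extract = field_extract_username
[host::keycloak-*]
TRANSFORMS-field_extract = field_extract_username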
I think you took a little shortcut here. Data received on the /event HEC endpoint is normally parsed and processed; it's just that line breaking is skipped (because we're explicitly receiving data already split into single events) and, by default, time parsing is skipped.
Other than that, normal index-time operations are performed.
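To illustrate (a made-up payload), an event posted to /services/collector/event already arrives delineated as a single event, which is why line breaking is skipped:
{
  "time": 1712345678,
  "host": "keycloak-XXXXX-0",
  "sourcetype": "keycloak",
  "event": "single event payload goes here"
}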
@PickleRick , How should I do it?
Would the configuration below work?
[keycloak]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
disabled = false
TRANSFORMS-field_extract = field_username
When you say "external preprocessing" do you mean somewhere else before the Heavy Forwarder?
Yes. If you had your event sanitized before ingesting it (not having a whole JSON structure inserted as a text member of another JSON), you could have it parsed as normal JSON without manually having to extract each field (and manipulating structured data with regexes is bound to hit some walls sooner or later). Also - I'd advise against doing indexed extractions unless you have a very good use case for them.
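If sanitizing the source is not an option, one possible search-time workaround (a sketch only; the index name is a placeholder and it assumes the log field and its "stdout F " prefix are consistent) is to pull the embedded JSON out of the log field and let spath parse it:
index=your_index sourcetype=keycloak
| rex field=log "stdout F (?<inner_json>\{.+\})"
| spath input=inner_json
| table message loggerName level threadName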
Thanks for the feedback.
My understanding is that I would gain performance in the future. Am I wrong?
I am currently using field extraction in Splunk Cloud.
At one time, parsing on an HF actually made the indexers work *harder*, but I'm not sure that's still the case.
HFs should off-load some SVCs from your Splunk Cloud indexers.
HFs will increase the network traffic to Splunk Cloud.
Forgive me, but I still have doubts.
Is your recommendation not to use a Heavy Forwarder for normalization of data?
I recommend using a HF only if necessary. In addition to the factors listed previously, HFs add a layer of complexity, are something else to manage, and introduce another point of failure.
A distinct advantage of HFs in a Splunk Cloud environment is better control over how your data is parsed. It's much easier to manage the apps on a HF than it is to do so in Splunk Cloud - even with the Victoria experience.
Of course, you should have at least 2 HFs for redundancy.
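As a sketch, outputs.conf on the upstream forwarders (host names and port are placeholders) can then load-balance across the two HFs:
[tcpout]
defaultGroup = hf_group
[tcpout:hf_group]
server = hf1.example.com:9997, hf2.example.com:9997
autoLBFrequency = 30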