This is my first time using Splunk Cloud, and I'm trying to perform field extraction directly on the heavy forwarder before the data is indexed.
I created REPORT and TRANSFORMS entries in props.conf, with transforms.conf configured using a regular expression that I tested and confirmed working in Splunk Cloud through field extraction, but it does not work when I try it on the HF.
Are there any limitations on field extraction when using a heavy forwarder with Splunk Cloud?
Sorry for not being clearer; here is a description of what was done:
I want to extract fields in HF before sending to Splunk Cloud.
transforms.conf
[field_extract_username]
SOURCE_KEY = _raw
REGEX = (\susername\s\[(?P<user>.+?)\]\s)
FORMAT = user::$1
props.conf
[keycloak]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
disabled = false
SHOULD_LINEMERGE = true
REPORT-field_extract = field_username
EXTRACT-username = \susername\s\[(.+?)\]\s
EXTRACT-user = (\susername\s\[(?P<user>.+?)\]\s)
I created EXTRACT-username and EXTRACT-user as a test, after trying REPORT-field_extract to extract the user field.
_raw log:
{
"log": "stdout F {\"timestamp\":\"%s\",\"sequence\":%d,\"loggerClassName\":\"org.jboss.logging.Logger\",\"loggerName\":\"br.com.XXXXXX. keycloak.login.pf.clients.CustomerLoginClient\",\"level\":\"INFO\",\"message\":\"CustomerLoginClient.fetchValidateLogin - Processed - username [XX157118577] clientId [https://www.XXXX.com/app] took [104ms]\",\"threadName\":\"executor-thread-3577\",\"threadId\":1XXXXX73,\"mdc\":{\"dt.entity.process_group\":\"PROC ESS_GROUP-DXXA014C1XXXX7EC\",\"dt.host_group.id\":\"prd\",\"dt.entity.host_group\":\"HOST_GROUP-46FAFFBA838D4E81\", \"dt.entity.host\":\"HOST-971DXXXXXXX0F72E\",\"dt.entity.process_group_instance\":\"PROCESS_GROUP_INSTANCE-60C0A631 DB5AB172\"},\"ndc\":\"\",\"hostName\":\"keycloak-XXXXX-X\",\"processName\":\"QuarkusEntryPoint\",\"processId\":1}",
"source": "/var/log/containers/keycloak-XXXXX-0_XXXXXX_keycloak-814935ba7b1d4XXXXXXXXeb8d4dfc51d27283a257c4a96526eb.log",
"host": "[\"keycloak-XXXXX-0\"]",
"type": "-",
"environment": "prod"
}
REPORT and EXTRACT are search-time settings (they define what happens when Splunk fetches indexed data from the indexers at search time). Therefore configuring them on a HF is pointless. If you really, really want to use index-time extraction to create an indexed field (which might not be the best idea), you should use TRANSFORMS.
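If you do go the index-time route, a minimal sketch would look roughly like this (the stanza, class, and field names below are only illustrative; adjust the regex and field name to your data):
props.conf
[keycloak]
TRANSFORMS-extract_user = field_extract_username
transforms.conf
[field_extract_username]
SOURCE_KEY = _raw
REGEX = \susername\s\[(.+?)\]\s
FORMAT = user::$1
WRITE_META = true
fields.conf (where searches run, so the indexed field is searched efficiently)
[user]
INDEXED = true
The key differences from a search-time REPORT are the TRANSFORMS- prefix in props.conf and WRITE_META = true in transforms.conf.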
BTW, you shouldn't use SHOULD_LINEMERGE=true (it's meant for very rare border cases and it incurs a big performance penalty).
And your data looks as if it needs an external preprocessing step to extract the JSON object from within the log field string.
Please describe the problem you are having without using the phrase "it does not work" as that tells us nothing about what is wrong.
Heavy forwarders parse data exactly the same way indexers do, so any props and transforms you would use on an indexer should work on a HF. If the data passes through more than one HF, then only the first one does the parsing. Also, data sent via HEC to the /event endpoint is not parsed at all.
Make sure the props are in the right stanza (the stanza name matches the incoming sourcetype, or starts with "source::" and matches the source name, or starts with "host::" and matches the sending host's name). Be sure to test regular expressions (I like to use regex101.com, but it's not perfect) before using them.
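For example, the same class of settings could be scoped in any of these ways (the source and host patterns here are just hypothetical illustrations):
[keycloak]
TRANSFORMS-field_extract = field_extract_username
[source::/var/log/containers/keycloak-*.log]
TRANSFORMS-field_extract = field_extract_username
[host::keycloak-*]
TRANSFORMS-field_extract = field_extract_username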
I think you took a little shortcut here. Data received on the /event HEC endpoint is normally parsed and processed; it's just that line breaking is skipped (because we're explicitly receiving data already split into single events) and, by default, time parsing is skipped.
Other than that, normal index-time operations are performed.
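To illustrate (a made-up payload), an event posted to /services/collector/event already arrives delineated as a single event, which is why line breaking is skipped:
{
  "time": 1712345678,
  "host": "keycloak-XXXXX-0",
  "sourcetype": "keycloak",
  "event": "single event payload goes here"
}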
@PickleRick , How should I do it?
Would the configuration below work?
[keycloak]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
disabled = false
TRANSFORMS-field_extract = field_username
When you say "external preprocessing" do you mean somewhere else before the Heavy Forwarder?
Yes. If you had your event sanitized before ingesting it (not having a whole JSON structure inserted as a text member of another JSON), you could have it parsed as normal JSON without manually having to extract each field (and manipulating structured data with regexes is bound to hit some walls sooner or later). Also - I'd advise against doing indexed extractions unless you have a very good use case for them.
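If sanitizing the source is not an option, one possible search-time workaround (a sketch only; the index name is a placeholder and it assumes the log field and its "stdout F " prefix are consistent) is to pull the embedded JSON out of the log field and let spath parse it:
index=your_index sourcetype=keycloak
| rex field=log "stdout F (?<inner_json>\{.+\})"
| spath input=inner_json
| table message loggerName level threadName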
Thanks for the feedback.
My understanding is that I would gain performance in the future. Am I wrong?
I am currently using field extraction in Splunk Cloud.
At one time, parsing on an HF actually made the indexers work *harder*, but I'm not sure that's still the case.
HFs should off-load some SVCs from your Splunk Cloud indexers.
HFs will increase the network traffic to Splunk Cloud.
Forgive me, but I still have doubts.
Is your recommendation not to use a Heavy Forwarder for normalization of data?
I recommend using a HF only if necessary. In addition to the factors listed previously, HFs add a layer of complexity, are something else to manage, and introduce another point of failure.
A distinct advantage of HFs in a Splunk Cloud environment is better control over how your data is parsed. It's much easier to manage the apps on a HF than it is to do so in Splunk Cloud - even with the Victoria experience.
Of course, you should have at least 2 HFs for redundancy.
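As a sketch, outputs.conf on the upstream forwarders (host names and port are placeholders) can then load-balance across the two HFs:
[tcpout]
defaultGroup = hf_group
[tcpout:hf_group]
server = hf1.example.com:9997, hf2.example.com:9997
autoLBFrequency = 30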