Which properties are available for a Universal Forwarder in Props/Transforms?

ctaf
Contributor

Hi,

I can't find any reference in the docs for Props or Transforms (e.g. http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/Propsconf) to which attributes are available and working on a Universal Forwarder.

Is there any exhaustive documentation about it?

Thank you

1 Solution

martin_mueller
SplunkTrust

CHARSET applies at input, see http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/propsconf (search for "input time" for a fairly exhaustive list)

Line breaking happens at parsing (LINE_BREAKER, TRUNCATE).
Line merging happens at merging (BREAK_ONLY_BEFORE, MUST_BREAK_AFTER, SHOULD_LINEMERGE).
Timestamping happens at typing (DATETIME_CONFIG, TIME_FORMAT, TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD).
See http://wiki.splunk.com/Community:HowIndexingWorks

Where those four stages happen depends on the path the data takes through Splunk. Input usually happens on a UF, while the other three happen wherever the data is cooked. By default that is the indexer. If there is a heavy forwarder along the route, it is usually the first heavy forwarder. For indexed extractions, it is usually the inputting forwarder, even a UF.
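
As a rough illustration of the split described above, here is a minimal sketch for a plain sourcetype that does not use indexed extractions. The sourcetype name my_app_log and all values are hypothetical; only the setting names come from the answer. The first stanza would live on the UF, the second wherever the data is cooked (the indexer, or the first heavy forwarder on the route).

```ini
# props.conf on the universal forwarder (input time)
[my_app_log]
CHARSET = UTF-8

# props.conf on the indexer or first heavy forwarder (where the data is cooked)
[my_app_log]
# Break on newlines and cap the event length
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 10000
# Do not merge lines back together
SHOULD_LINEMERGE = false
# Assumed timestamp at the start of each line, e.g. 2017-03-01 12:00:00
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
```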

woodcock
Esteemed Legend

Given your clarification of why you are asking this question (what you really need to do is split some configurations from a Heavy Forwarder between a new Universal Forwarder and the Heavy Forwarder), the safest thing to do is to copy all the existing settings to both places. As @martin_mueller said, you can safely have "too many" settings on the UF; it knows what to use and what to ignore. Having "extra" settings in both places will not cause any problems (not even "setting ignored" logs).

joesrepsolc
Communicator

Interesting suggestion to copy to both the UF and the indexer layer; I had not thought of that. While that all but guarantees the success of the props, I'm still looking into the link provided and trying to better understand WHERE I need to make some props.conf adjustments. Generally speaking, I keep things in one place, on the indexer tier (slave-apps). That gives me one place to look at and manage the settings for the incoming data.

I would like to understand more about when I should push to the UF instead, potentially saving those precious indexer tier resources. Seeking to understand more...

youngsuh
Contributor

Here are the latest props.conf settings for the universal forwarder as of 9.2.1 (JSON file parsing works great with this option):

EVENT_BREAKER_ENABLE = <boolean>
EVENT_BREAKER = <regular expression>
LB_CHUNK_BREAKER = <regular expression>
force_local_processing = <boolean>  * new

* Forces a universal forwarder to process all data tagged with this sourcetype locally before forwarding it to the indexers.
* Data with this sourcetype is processed by the linebreaker, aggregator, and regexreplacement processors in addition to the existing utf8 processor.
* Note that switching this property on potentially increases the CPU and memory consumption of the forwarder.
* Applicable only on a universal forwarder.
* Default: false
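
For context, here is a minimal sketch of how the event breaker settings might be combined in a UF props.conf stanza. The sourcetype name my_json_events and the regular expression are assumptions for newline-delimited events; they are not values taken from the spec excerpt above.

```ini
# props.conf on the universal forwarder
[my_json_events]
# Let the UF detect event boundaries before it sends data to an indexer
EVENT_BREAKER_ENABLE = true
# Assumed boundary: one or more newline characters (the capture group marks the break)
EVENT_BREAKER = ([\r\n]+)
```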

koshyk
Super Champion

great explanation. better than splunk docs 🙂

martin_mueller
SplunkTrust

You can safely have "too many" settings on the UF; it knows what to use and what to ignore.

ctaf
Contributor

Of course, but I'd rather know what I am doing.

ctaf
Contributor

Thanks for the tip about searching for "input time"; it really helps!

So if I need to apply all the settings you mention, except CHARSET, do I need to do it on the indexer side?

On the other hand, according to woodcock's answer, the TZ attribute works on the UF, but the docs (http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/propsconf) do not mark it as "input time".

woodcock
Esteemed Legend

As for "applying all the settings", that is generally a poor approach. I would change only the settings you are fairly sure need to differ from the defaults and trust the defaults for the rest, until testing proves there is a problem. The VAST majority of the time, very few settings are changed, even in a complex situation.

ctaf
Contributor

The thing is, I know I need to change them because they are settings currently set on a Heavy Forwarder. One of our projects is to convert that Heavy Forwarder into a Universal Forwarder. I am trying to find which attributes should be copied to the UF and which should be copied directly to the indexers.

woodcock
Esteemed Legend

Your title says "for" a UF and your question says "on" a UF. I am going to assume that you literally mean "on a UF". Because the UF does not index the data (with the exception of INDEXED EXTRACTIONS), very little in those files makes any sense to deploy "to" and use "on" the UF. Some that DO include:

props.conf:
TZ, sourcetype, NO_BINARY_CHECK, CHECK_METHOD, priority, and of course INDEXED_EXTRACTIONS (and its associates).

I cannot think of anything in transforms.conf that takes effect on the UF.
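
A minimal sketch of the kind of UF-side props.conf this list implies. The path, sourcetype name, and values are hypothetical; only the setting names come from the list above.

```ini
# props.conf on the universal forwarder
[source::/var/log/my_app/*.log]
# Assign a sourcetype based on the source path
sourcetype = my_app_log
# Process the file even if it looks binary
NO_BINARY_CHECK = true
# Detect file changes by modification time
CHECK_METHOD = modtime
# Precedence among overlapping pattern stanzas (arbitrary example value)
priority = 10

[my_app_log]
# Time zone to assume for timestamps that carry no zone information
TZ = America/New_York
```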

ctaf
Contributor

Thanks. I was hoping for an exhaustive list; it is strange that Splunk does not provide one.

What about the following?

CHARSET

DATETIME_CONFIG
TIME_FORMAT
TIME_PREFIX
MAX_TIMESTAMP_LOOKAHEAD

LINE_BREAKER
BREAK_ONLY_BEFORE
MUST_BREAK_AFTER

TRUNCATE
SHOULD_LINEMERGE

woodcock
Esteemed Legend

The problem is that some of this is "it depends", especially when you use INDEXED_EXTRACTIONS.

martin_mueller
SplunkTrust

The whole props/transforms shebang does take effect for data that is cooked on the UF through INDEXED_EXTRACTIONS.
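
To illustrate that exception, here is a sketch of a UF stanza where the forwarder itself cooks the data. The sourcetype name and the field name are hypothetical. With INDEXED_EXTRACTIONS set, the timestamp settings here would be applied on the UF rather than on the indexer.

```ini
# props.conf on the universal forwarder
[my_csv_report]
# Structured parsing happens on the forwarder itself
INDEXED_EXTRACTIONS = csv
# Hypothetical column that holds the event time
TIMESTAMP_FIELDS = report_time
TIME_FORMAT = %Y-%m-%d %H:%M:%S
```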

woodcock
Esteemed Legend

Yes, that is why I called it out as an exception.

somesoni2
Revered Legend

This should give you information on how data moves from source to Splunk, and on which activities are performed by which node (forwarder, heavy forwarder, or indexer):
http://docs.splunk.com/Documentation/Splunk/6.5.2/Deploy/Componentsofadistributedenvironment
http://docs.splunk.com/Documentation/Splunk/6.5.2/Deploy/Datapipeline

lim2
Communicator

Hi @martin_mueller, all. Great discussion. I am trying to find props stanza rules for a UF that enforce line breaking for 100% of the records in JSON files. I need input on resolving combined records from JSON files of S3 usage metrics (228 out of ~550K JSON files) during ingestion from one universal forwarder (UF). The business users need each line of a JSON file to be ingested as exactly one record so that reports are correct. Currently ~22K out of 5.5 million records, all in those 228 JSON files, are ingested combined.

One record in the JSON file is:

{"ReportDate":"10-31-2021","Bucket":"0123-5678-9999","Prefix":"processingDate%3D2021-10-28\/errors\/source\/error_type%3Dinter\/venuecd\/","StorageClass":"STANDARD","IsLatest":true,"IsDeleteMarker":false,"SizeGB":0.4184613759,"Count":6744}

I used the following props stanza on the UF, based on the information in https://docs.splunk.com/Documentation/Splunk/8.2.2/Admin/Propsconf (I also think that the props that can be applied on the UF side should be documented more clearly):

[source::/path/s3data/*s3usageinfo.json]
EVENT_BREAKER_ENABLE = true
force_local_processing = true
INDEXED_EXTRACTIONS = json
disabled = false
TIMESTAMP_FIELDS = ReportDate
TIME_FORMAT = %m-%d-%Y

 

I am planning to add "timestamp":"1636481800" to each JSON record to resolve the line breaking. Any other suggestions? Thanks.
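
One alternative that may be worth testing, sketched here under the assumption that every record sits on its own line, is to drop structured parsing on the UF and break strictly on newlines where the data is cooked. The sourcetype name s3usage_json is hypothetical, and this is not a confirmed fix for the combined records.

```ini
# props.conf on the universal forwarder
[s3usage_json]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)

# props.conf on the indexer (or first heavy forwarder)
[s3usage_json]
# One event per line; never merge lines back together
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
# Read the date that follows "ReportDate":" (10 characters, e.g. 10-31-2021)
TIME_PREFIX = "ReportDate":"
TIME_FORMAT = %m-%d-%Y
MAX_TIMESTAMP_LOOKAHEAD = 10
```

With this variant the JSON fields are no longer indexed, so they would have to be extracted at search time, for example with KV_MODE = json on the search head.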
