Hi,
I can't find any reference in the docs (i.e. : http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/Propsconf) of Props or Transforms about which attributes are available/working on an Universal Forwarder.
Is there any exhaustive documentation about it?
Thank you
CHARSET applies at input; see http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/propsconf (search for "input time" for a fairly exhaustive list).
Line breaking happens at parsing (LINE_BREAKER, TRUNCATE).
Line merging happens at merging (BREAK_ONLY_BEFORE, MUST_BREAK_AFTER, SHOULD_LINEMERGE).
Timestamping happens at typing (DATETIME_CONFIG, TIME_FORMAT, TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD).
See http://wiki.splunk.com/Community:HowIndexingWorks
Where those four happen depends on what path the data takes through Splunk. Input usually happens on a UF, while the other three happen wherever the data is cooked. By default that's the indexer; if you have a heavy forwarder along the route, it's usually the first heavy forwarder; and for indexed extractions it's usually the inputting forwarder - even a UF.
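For illustration only, a minimal parsing-tier stanza along those lines (the sourcetype name, the breaker regex, and the time format are all made up for the example) would normally be deployed where the data is cooked, i.e. the indexers or the first heavy forwarder:

[my_app:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
TRUNCATE = 10000
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19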
Given your clarification of why you are asking this question - that you need to split some configuration currently on a Heavy Forwarder between a new Universal Forwarder and the Heavy Forwarder - the safest thing to do is to copy all the existing settings to both places. As @martin_mueller said, you can safely have "too many" settings on the UF; it knows what to use and what to ignore. Having "extra" settings in both places will not cause any problems (not even "setting ignored" logs).
Interesting suggestion to copy to both the UF and the indexer layer; I had not thought of that. While that all but guarantees the props will work, I'm still looking into the link provided and trying to better understand WHERE I need to make props.conf adjustments. Generally speaking, I keep things in one place (on the indexer tier, in slave-apps), which gives me one place to look and manage the settings for incoming data.
I would like to understand more about when I should push to the UF instead, potentially saving those precious indexer-tier resources. Seeking to understand more...
Here are the latest props.conf settings as of 9.2.1 on the universal forwarder (JSON file parsing works great with this option):
EVENT_BREAKER_ENABLE = <boolean>
EVENT_BREAKER = <regular expression>
LB_CHUNK_BREAKER = <regular expression>
force_local_processing = <boolean> (new)
* Forces a universal forwarder to process all data tagged with this sourcetype
  locally before forwarding it to the indexers.
* Data with this sourcetype is processed by the linebreaker, aggregator, and
  regex replacement processors in addition to the existing utf8 processor.
* Note that switching this property potentially increases the cpu and memory
  consumption of the forwarder.
* Applicable only on a universal forwarder.
* Default: false
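As a sketch, those settings could be combined on a 9.x UF in a stanza like the following (the sourcetype name and the breaker regex are invented for the example):

[my_json:events]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)
force_local_processing = true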
Great explanation, better than the Splunk docs 🙂
You can safely have "too many" settings on the UF, it knows what to use and to ignore.
Of course, but I'd rather know what I am doing
Thanks for the tip about searching "input time", it is really helping!
So if I need to apply all the settings you're mentioning, except CHARSET, I need to do it on the indexer side?
On the other hand, according to woodcock's answer, the TZ attribute works on the UF, but in the docs (http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/propsconf) it is not marked as an "input time" setting.
As far as "applying all the settings", that is generally a poor approach. I would try using the settings you are pretty sure that you need to change from the default and trust the defaults for the rest, until testing proves there is a problem. The VAST majority of the time, very few settings (changes) are used, even in a complex situation.
The thing is, I know I need to change them because these settings are currently set on a Heavy Forwarder. One project we have is to convert the Heavy Forwarder into a Universal Forwarder. I am trying to find which attributes should be copied to the UF and which should be copied to the indexers directly.
Your title says "for" a UF and your question says "on" a UF; I am going to assume that you literally mean "on a UF". Because the UF does not parse the data (with the exception of INDEXED_EXTRACTIONS), very little in those files makes any sense to deploy "to" and use "on" the UF. Some that DO include:
props.conf:
TZ, sourcetype, NO_BINARY_CHECK, CHECK_METHOD, priority, and of course INDEXED_EXTRACTIONS (and its associates).
I cannot think of anything in transforms.conf that takes effect on the UF.
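To make that concrete, a minimal sketch of a UF-side stanza using only those attributes (the monitored path and sourcetype name are invented for the example):

[source::/var/log/myapp/*.log]
sourcetype = my_app:log
TZ = UTC
NO_BINARY_CHECK = true
CHECK_METHOD = modtime
priority = 10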
Thanks. I was hoping for an exhaustive list; it is strange that Splunk does not provide one.
What about the following: CHARSET, DATETIME_CONFIG, TIME_FORMAT, TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD, LINE_BREAKER, BREAK_ONLY_BEFORE, MUST_BREAK_AFTER, TRUNCATE, SHOULD_LINEMERGE?
The problem is that some of this is "it depends", especially when you use INDEXED_EXTRACTIONS.
The whole props/transforms shebang does take effect for data that is cooked on the UF through INDEXED_EXTRACTIONS.
Yes, that is why I called it out as an exception.
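As one illustration of that exception, a sketch of a UF-side stanza for a structured file (the sourcetype name, delimiter, and timestamp field are hypothetical):

[my_csv:metrics]
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = ,
HEADER_FIELD_LINE_NUMBER = 1
TIMESTAMP_FIELDS = report_date
TIME_FORMAT = %Y-%m-%d

With INDEXED_EXTRACTIONS set, the UF cooks the events itself, so the timestamp and field settings in that stanza take effect on the forwarder rather than on the indexer.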
This should give you information on how data moves from its source into Splunk, which activities are performed, and by which node (forwarder/heavy forwarder/indexer):
http://docs.splunk.com/Documentation/Splunk/6.5.2/Deploy/Componentsofadistributedenvironment
http://docs.splunk.com/Documentation/Splunk/6.5.2/Deploy/Datapipeline
This might help: http://wiki.splunk.com/Community:HowIndexingWorks
Hi @martin_mueller, all. Great discussion. I am trying to find props stanza rules for a UF to enforce line breaking for 100% of the records in JSON files. I need input on resolving combined records from JSON files containing S3 usage metrics (228 out of ~550K JSON files) during ingestion from one universal forwarder (UF). The business users need each line of the JSON files to be ingested as one record for report correctness. Currently ~22K out of 5.5 million records are ingested combined in those 228 JSON files.
One record in the JSON file is:
{"ReportDate":"10-31-2021","Bucket":"0123-5678-9999","Prefix":"processingDate%3D2021-10-28\/errors\/source\/error_type%3Dinter\/venuecd\/","StorageClass":"STANDARD","IsLatest":true,"IsDeleteMarker":false,"SizeGB":0.4184613759,"Count":6744}
I used the following props stanza on the UF, based on the information in https://docs.splunk.com/Documentation/Splunk/8.2.2/Admin/Propsconf (I also think the props that can be applied on the UF side should be more clearly documented):
[source::/path/s3data/*s3usageinfo.json]
EVENT_BREAKER_ENABLE = true
force_local_processing = true
INDEXED_EXTRACTIONS = json
disabled = false
TIMESTAMP_FIELDS = ReportDate
TIME_FORMAT = %m-%d-%Y
I am planning to add "timestamp":"1636481800" to the JSON records to resolve the line breaking. Any other suggestions? Thanks.
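For what it's worth, here is a sketch of how the stanza above is sometimes extended for newline-delimited JSON, giving the UF an explicit event-breaking regex. This is only an illustration; the regex assumes every record starts with { at the beginning of a line, which matches the sample record shown:

[source::/path/s3data/*s3usageinfo.json]
INDEXED_EXTRACTIONS = json
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)(?=\{)
force_local_processing = true
TIMESTAMP_FIELDS = ReportDate
TIME_FORMAT = %m-%d-%Y

Whether EVENT_BREAKER is needed alongside INDEXED_EXTRACTIONS depends on how the files are written, so it is worth testing on a copy of one of the problem files first.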