We are pulling data such as Red Hat logs, Apigee, and Ansible from AWS through a fluentd plugin, which forwards the data to our Heavy Forwarder (HF) in AWS; from that HF the data goes to another HF in a DMZ, and from there to a third HF outside the DMZ.
The data is passing through and getting indexed, so the firewall rules and ports are set up correctly. However, when we try to transform the data so we can split it into several sourcetypes, the override does not take effect; the events keep the original sourcetype set by the fluentd plugin.
In the fluentd plugin we define the index name and sourcetype, and the default format is JSON. We are trying to override this index and sourcetype at the destination, to differentiate the types of data with different sourcetypes, by defining inputs.conf, props.conf, and transforms.conf. The values we define at the destination are not applied; only the values defined at the source in the fluentd plugin config file take effect.
So the question is: can we add props and transforms configuration to the fluentd plugin in AWS to differentiate the logs with sourcetypes? Can anyone suggest a possible solution for this kind of problem?
Fluentd plugin: k24d/fluent-plugin-splunkapi
We are using Splunk 6.2.2 on all indexers, forwarders, etc.
Here are the configs that we defined at the destination.
Please help us.
inputs.conf
[splunktcp://1600]
connection_host = ip
sourcetype = journald
index = aws_fluentd_index
props.conf
[source::poc.aws.system.journald]
KV_MODE = json
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %T %z
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 30
NO_BINARY_CHECK = 1
pulldown_type = 1
TRANSFORMS-override = override_ST_journald,override_IDX_journald
transforms.conf
[override_ST_journald]
SOURCE_KEY=_raw
REGEX=.*
FORMAT = sourcetype::journald
DEST_KEY = MetaData:Sourcetype
[override_IDX_journald]
SOURCE_KEY=_raw
REGEX=.*
FORMAT = aws_fluentd_index
DEST_KEY = _MetaData:Index
Hi vya998,
Thanks for using Fluentd!
The first thing to note is that the Splunk API plugin you referenced is deprecated, and you should switch to sending messages over TCP or through the Splunk HTTP Event Collector. Additionally, I see that your configuration for translating and parsing data is being done on the Splunk indexer side. I would recommend moving those configurations over to Fluentd to distribute that compute to the endpoints, so Splunk can focus on search. Fluentd can do most of the common translation on the node side, including nginx, apache2, syslog (RFC 3164 and 5424), etc.
Additionally, if you are interested in the Fluentd Enterprise Splunk TCP and HTTP Event Collector plugins, and in help optimizing parsing and transformation logic, you can email me at A at TreasureData dot com. More info at https://fluentd.treasuredata.com
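As a minimal sketch of what node-side parsing could look like, here is a tail input that parses JSON and extracts the timestamp in Fluentd instead of in props.conf. The path, pos_file, and tag are illustrative placeholders, the time_key assumes your JSON carries a "time" field, and the time_format simply mirrors the one from your props.conf, so adjust all of these to your data:
<source>
  @type tail
  # path, pos_file, and tag below are illustrative placeholders
  path /var/log/aws/journald.log
  pos_file /var/log/td-agent/journald.pos
  tag poc.aws.system.journald
  # parse each line as JSON and pull the timestamp out on the node
  format json
  time_key time
  time_format %Y-%m-%d %T %z
</source>
With events already parsed and timestamped at the source, the Splunk side only has to index them.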
Thanks,
Anurag
Apart from the location of the props and transforms (which should be on the HF in your case), is the source of the data really poc.aws.system.journald?
Yes, the source is the same as mentioned.
Can you confirm the location of props.conf and transforms.conf: are they on the heavy forwarder or on the indexers?
On the destination heavy forwarder, and also on the indexers.
You have 3 HFs before your indexer? Which one do you have props and transforms on? If you want to apply props and transforms on the middle or last one, I think you need to override the route attribute in your splunktcp stanza in inputs.conf so that the data already parsed by the previous HF is sent back through the typingQueue and pipeline instead of skipping straight to the indexingQueue. I haven't done this personally, so YMMV. I'm also assuming, of course, that your source matches the source set by the plugin. For some reference links: http://docs.splunk.com/Documentation/Splunk/6.4.0/admin/Inputsconf and https://wiki.splunk.com/Community:HowIndexingWorks
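To make that concrete, here is a sketch of what the override might look like on the HF that should do the re-parsing. The route value is adapted from the default documented in the inputs.conf spec (with the indexQueue entry pointed at typingQueue), and I haven't tested it, so verify it against the docs for your version before relying on it:
inputs.conf
[splunktcp://1600]
connection_host = ip
# default routing sends already-cooked (_linebreaker) data straight to the
# indexQueue; pointing it at the typingQueue makes TRANSFORMS run again
route = has_key:_replicationBucketUUID:replicationQueue;has_key:_dstrx:typingQueue;has_key:_linebreaker:typingQueue;absent_key:_linebreaker:parsingQueue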
Which HF are you suggesting to override the route attribute in? The middle one?
Whichever one you want to do the re-parsing on. It's tricky because you may impact other data being forwarded over the same port and re-apply props and transforms to data you didn't expect to, but if the middle one is dedicated to gathering this data, then definitely. (Of course, the easier way would be if the origin could just set the index and sourcetype appropriately at input time, but I'm not familiar with the plugin you're using.)
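If the plugin you're using (or a replacement such as the HTTP Event Collector output mentioned above) supports setting these per match block, splitting at the origin would look roughly like the sketch below. The parameter names follow the splunk_hec output plugin as I recall them, and the second tag is purely illustrative, so treat this as a starting point rather than a drop-in config:
<match poc.aws.system.journald>
  @type splunk_hec
  hec_host your-hf.example.com
  hec_port 8088
  hec_token YOUR-HEC-TOKEN
  index aws_fluentd_index
  sourcetype journald
</match>
# hypothetical second stream with its own sourcetype
<match poc.aws.apigee.**>
  @type splunk_hec
  hec_host your-hf.example.com
  hec_port 8088
  hec_token YOUR-HEC-TOKEN
  index aws_fluentd_index
  sourcetype apigee
</match>
That way each stream arrives with the right sourcetype already set and no props/transforms overrides are needed downstream.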