thomastaylor,
Let's break this down a bit...
HTTP Event Collector:
A Heavy Forwarder is a great option here. You can manage the token and receive HEC inputs on the HWF without needing the main Splunk install to do anything. As the data is JSON, you'll also get your field extracts "for free" from autokv.
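To make that concrete, here's a minimal sketch of what the HEC input could look like in inputs.conf on the HWF. The stanza name, token value, index, and port are all placeholders — swap in your own:

```ini
# inputs.conf on the Heavy Forwarder
# Enable the HTTP Event Collector endpoint (default port 8088)
[http]
disabled = 0
port = 8088

# One stanza per token; the GUID below is a placeholder
[http://my_app_token]
token = 11111111-2222-3333-4444-555555555555
sourcetype = _json
index = main
```

Clients then POST JSON to https://your-hwf:8088/services/collector with an "Authorization: Splunk &lt;token&gt;" header, and the HWF forwards the events on to the indexers.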
Transforming data:
Yes you can use a Heavy Forwarder for this. I must caution that there are a number of pitfalls that come with using a HWF to "pre-parse" data before it hits the indexers.
Cooked data is larger on the network than uncooked data: https://www.splunk.com/blog/2016/12/12/universal-or-heavy-that-is-the-question.html - Some have theorized that unless you're doing a massive amount of index-time operations, the CPU load on the indexers is actually higher too (still an open argument in the community, so take this with a grain of salt).
Heavy Forwarders tend to cause data imbalance on Indexers (they get "sticky" to whichever indexer they're sending to, because there's no break in the incoming traffic to trigger a switch - a common problem for syslog boxes that use an HWF).
The Indexers are not given a second chance to parse the data - this means if your main Splunk install needs to do sourcetyping, index renaming, or host renaming, it will be unable to (well, there are some special tricks to cheat here, but it's a bad idea).
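To illustrate that "parsed once" point: index-time props/transforms run wherever the data is first cooked. If the HWF applies a stanza like the sketch below (sourcetype and stanza names are placeholders), the indexers will never re-run their own parsing rules against that data:

```ini
# props.conf on the HWF - index-time transforms fire here, and ONLY here
[my_sourcetype]
TRANSFORMS-set_host = set_host_from_event

# transforms.conf on the HWF
# Rewrites the host metadata field from a value inside the raw event
[set_host_from_event]
REGEX = host=(\S+)
DEST_KEY = MetaData:Host
FORMAT = host::$1
```

If the same stanzas exist on the indexers but not the HWF, they'll silently do nothing for HWF-cooked data - which is exactly the "no second chance" trap.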
Creating field extracts:
You are unable to create "search time" field extracts with a Heavy Forwarder. The vast majority of TAs you'll find on Splunkbase are search time. Additionally, creating "index time" field extracts comes with a whole list of caveats (NOTE THE CAUTION WARNING: http://docs.splunk.com/Documentation/Splunk/7.1.1/Data/Configureindex-timefieldextraction). While possible, you're opening yourself up to a massive list of potential issues. To name a few:
Greater storage requirements (index time fields are stored in the TSIDX files, uncompressed)
Lack of flexibility (Once a field is written, it's "burnt" into the index)
Potentially extreme CPU overhead at the HWF level
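For anyone who decides to accept those caveats anyway, an index-time extraction on the HWF looks roughly like this (field name, regex, and stanza names are placeholders, per the docs page linked above):

```ini
# transforms.conf on the HWF
# WRITE_META = true writes the field into the TSIDX files at index time
[extract_txn_id]
REGEX = txn_id=(\w+)
FORMAT = txn_id::$1
WRITE_META = true

# props.conf on the HWF
[my_sourcetype]
TRANSFORMS-txn = extract_txn_id

# fields.conf on the search head - tells search the field is indexed
[txn_id]
INDEXED = true
```

Note the fields.conf piece: without it, searches against the indexed field can return wrong results - one more moving part that search-time extractions simply don't have.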
Also, no, the HWF will not let you use the field extraction (regex) tool - that's for search-time field extracts. You'd need a dev search head / indexer to build them on, then lift the extracts AND convert them to index time. DO NOT RECOMMEND.
TLDR:
For HEC, I think it's a great use case for you. For everything else, I'd advise against it. I'd recommend attempting to fix the relationship with whoever owns your Splunk install. You're setting your team and the Splunk owners up for potential issues down the road (and a bunch of up-front work for yourself, as nothing on Splunkbase will be plug and play).