We have recently moved some of our applications to the AWS cloud, and now I am being tasked with: "install and configure Splunk fwder on the necessary instances (i.e. anything writing a log)". I have found posts about creating a Splunk cluster on AWS, but I believe that is more than I need. We already have Splunk licensed on a local machine, and I am trying to get the log data generated by the apps running on AWS aggregated with what is already being monitored. Thanks for any help.
Thanks, Bill... as it turns out, I was wrong about a local instance of Splunk. Instead, our indexers are on "splunkcloud.com" and the solution was rather simple. I was able to install the universal forwarder in my EC2 instance and add an outputs.conf file (with the preconfigured "server=" setting pointing to our indexers) to the "/local" folder under $SPLUNK_HOME. Worked like a charm (with a few other minor tweaks, of course).
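For anyone following the same path, a minimal outputs.conf along these lines is roughly what that looks like. The hostname and group name here are placeholders, not my actual values, and note that Splunk Cloud normally supplies a preconfigured credentials/forwarder app that includes the SSL settings, so hand-editing may not be necessary in your case:

```ini
# Typically placed in $SPLUNK_HOME/etc/system/local/outputs.conf
# (or delivered as part of a Splunk Cloud credentials app).
# Hostname below is a placeholder -- use the indexer endpoint for
# your own Splunk Cloud stack. 9997 is the default forwarding port.
[tcpout]
defaultGroup = splunkcloud_indexers

[tcpout:splunkcloud_indexers]
server = inputs1.example.splunkcloud.com:9997
```

After dropping the file in place, restarting the forwarder (`$SPLUNK_HOME/bin/splunk restart`) picks up the change.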
Thanks to all for the suggestions; I know that they will help in the future as we grow our cloud presence.
Using the universal forwarder, which requires a login to download, is not very scalable in a configuration management solution unless you host the installer file locally or in an internal repository.
I'd suggest at least considering setting up an indexer in EC2. We don't know how much data you're looking to forward, but sending data out of EC2 can get expensive quickly as your indexing requirements grow.
My suggestion would be configuring an indexer in EC2, ideally in the same AZ where your data is being generated, to avoid inter-AZ data transfer costs. Then you can simply point your existing search head at that indexer as part of your distributed search configuration. (This will require some tweaking of your security group to allow connectivity on the proper ports from the appropriate sources, etc.)
This is also, in my opinion, the most straightforward solution. You won't have to create any VPNs or manage SSL certs. All you'll have is another indexer in the environment where the data is being generated.
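On the security group side, a sketch assuming Splunk's default ports (adjust if you've changed them): the indexer needs TCP 9997 inbound from the forwarders' subnet for data, and TCP 8089 inbound from the search head for distributed search and management traffic. On the indexer itself, receiving is enabled with an inputs.conf stanza like:

```ini
# $SPLUNK_HOME/etc/system/local/inputs.conf on the EC2 indexer.
# 9997 is Splunk's conventional receiving port; any free port works
# as long as outputs.conf on the forwarders points at the same one.
[splunktcp://9997]
disabled = 0
```

On the forwarder side, outbound rules just need to permit TCP 9997 to the indexer (and TCP 8089 to a deployment server, if you use one).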
What are the inbound and outbound rules that need to be set for the EC2 (with the forwarder) and the splunk server/indexer (to receive data from forwarder)?
Installing a forwarder on an EC2 instance is really no different than installing a forwarder anywhere else. EC2 instances are mostly "just computers" and they run the same OSes as other things.
Your concerns are probably more around issues like:
So all of these are fairly broad architectural issues. Some solutions you may consider and judge the relative merits of for yourself:
Solution 1 - use AWS's built-in VPN support to establish an IPSec tunnel from your VPC to your data center.
In this scenario, you build a tunnel from your VPC in AWS to your data center. Amazon supplies a tunnel endpoint concentrator at one end, you supply the other (like a Cisco ASA or a Linux box running IPSec software). You establish the tunnel such that IP routing exists between your AWS IP space and your internal network. From there, you do what you always have in terms of indexers and deployment server.
Pros:
Cons:
Solution 2 - you stand up some Splunk services in your DMZ and configure them to act as reverse proxies into your existing Splunk infrastructure
In this scenario, you take your existing DMZ and put up (say) a deployment server and a couple of heavy forwarders. These are more-or-less exposed to the Internet (maybe you can firewall filter down to just known AWS IP spaces), and act as reverse proxies for getting data into Splunk. The AWS forwarders send data to the DMZ heavies, who parse it and send it onward to your indexers. The DMZ deployment server provides configuration information to your AWS forwarders and you can manage them centrally.
Pros:
Cons:
There are other ways besides these two for solving this problem, but these are two of the broadest brush strokes that you could consider. You'll notice that I put "may come with added security risk" as a con for both of these solutions. No matter what you do in this case, you are adding some risk. Be aware of that, understand what it is, and plan to mitigate it appropriately.
Thanks, dwaddle, for the quick response. Both solutions are definitely worth considering; the "pros" for both are intriguing, but I can definitely fall into some of the "cons" as well.