Getting Data In

AWS logging to our Splunk

splunklearner
Communicator

AWS logs to Splunk

We need to onboard AWS CloudWatch logs (from Kinesis) to our Splunk. All our Splunk instances run in the AWS cloud. Our architecture is a multisite cluster with 3 sites: 2 indexers and 1 search head (SH) in each site, plus 1 deployment server, 2 cluster managers (CMs), 1 deployer, and 1 heavy forwarder (HF). Everything is configured on the AWS side, and they are asking us to create a HEC endpoint in our Splunk to receive the logs. My question is: where and how do I configure the HEC token in my clustered environment, and what details do I need to provide there? Please help.

livehybrid
SplunkTrust

Hi @splunklearner 

You need to send your Amazon Kinesis Firehose data to an Elastic Load Balancer (ELB) with sticky sessions enabled and cookie expiration disabled.

Kinesis Firehose uses indexer acknowledgement, so it's important that the LB is configured correctly: sticky sessions are required so that, when Firehose checks the acknowledgement, it reaches the same HF/indexer that received the data.
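
To make the acknowledgement flow concrete, here is a minimal Python sketch of what a HEC client with indexer acknowledgement does: it posts an event on a channel, and later asks whether the returned ack ID has been indexed. The host name, token, and helper names are placeholders made up for illustration. Firehose handles all of this itself, so the sketch is only meant to show why the follow-up query has to land on the node that accepted the event.

```python
import json
import urllib.request

# Hypothetical values for illustration only; replace with your own.
HEC_BASE_URL = "https://hec.example.com:8088"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_event_request(base_url, token, channel, event):
    """Describe a POST to the HEC event endpoint on a given channel."""
    return {
        "url": base_url.rstrip("/") + "/services/collector/event",
        "headers": {
            "Authorization": "Splunk " + token,
            # The channel ties later ack queries to this request.
            "X-Splunk-Request-Channel": channel,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"event": event}).encode("utf-8"),
    }


def build_ack_request(base_url, token, channel, ack_ids):
    """Describe a POST asking whether the given ack IDs are indexed."""
    return {
        "url": base_url.rstrip("/") + "/services/collector/ack",
        "headers": {
            "Authorization": "Splunk " + token,
            # Must be the same channel the events were sent on.
            "X-Splunk-Request-Channel": channel,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"acks": list(ack_ids)}).encode("utf-8"),
    }


def send(request):
    """Send a request built above and return the decoded JSON reply."""
    req = urllib.request.Request(
        request["url"],
        data=request["body"],
        headers=request["headers"],
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

With acknowledgement enabled on the token, the reply to the event request should carry an `ackId`, which is then passed to `build_ack_request`. The node that accepted the event is the one tracking that ack ID, so if the LB routes the second call to a different backend, the query cannot be answered correctly. That is the reason for sticky sessions.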

The endpoint behind the ELB can be either an HF or your indexer cluster, depending on your configuration.

You should also install the Splunk Add-on for Amazon Web Services (AWS), which has the appropriate field extractions etc. *if you are sending AWS data*. If you are sending your own application data, the add-on may not be required; this depends on the processing done within Kinesis.

isoutamo
SplunkTrust

You should do it exactly this way. Remember to enable sticky sessions on the LB so that indexer acknowledgement queries are forwarded to the correct backend.

Even though it's possible to add HEC and tokens to both indexers and HFs, I always prefer to use separate HFs behind the LB. The reason is that adding and modifying tokens and other configurations quite often requires a restart of those nodes, which is a much easier and faster operation on an HF than on indexers. The risk of duplicating or losing events is also smaller.
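
As a rough illustration of what "adding a token" means on the HFs, here is a small Python sketch that renders an `inputs.conf` fragment for one HEC token. The setting names follow `inputs.conf` as I understand it, so please check them against the spec file for your Splunk version. The input name, index, and sourcetype in the usage comment are hypothetical placeholders, and the helper functions are invented for this example.

```python
import uuid


def new_token():
    """Generate a GUID-style value to use as a HEC token."""
    return str(uuid.uuid4())


def render_hec_inputs(name, token, index, sourcetype, port=8088):
    """Return inputs.conf text that enables HEC and defines one token."""
    lines = [
        "[http]",              # global HEC settings
        "disabled = 0",
        f"port = {port}",
        "enableSSL = 1",
        "",
        f"[http://{name}]",    # one stanza per token
        "disabled = 0",
        f"token = {token}",
        f"index = {index}",
        f"sourcetype = {sourcetype}",
        "useACK = true",       # Firehose relies on indexer acknowledgement
    ]
    return "\n".join(lines) + "\n"


# Example (all values are placeholders, not recommendations):
# print(render_hec_inputs("aws_firehose", new_token(), "aws", "aws:firehose:json"))
```

Whatever is generated, the same token stanza has to exist on every node behind the LB, so it is usually packaged as an app and pushed from the deployment server to the HFs rather than created by hand on each one.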

gcusello
SplunkTrust

Hi @splunklearner ,

if you need to configure an input from AWS to Splunk, you have to install the Splunk Add-on for Amazon Web Services (AWS) (https://splunkbase.splunk.com/app/1876) on the Heavy Forwarder.

Ciao.

Giuseppe

splunklearner
Communicator

@gcusello Can't we receive the logs directly through a HEC token without installing the add-on?

kiran_panchavat
SplunkTrust

@splunklearner 

Configure Amazon Kinesis Firehose to send data to the Splunk platform - Splunk Documentation

https://docs.splunk.com/Documentation/AddOns/released/Firehose/ConfigureHECdistributed 

gcusello
SplunkTrust

Hi @splunklearner ,

it's always better to use the Add-On.

And remember to install the add-on on both the HF and the SHs.

Ciao.

Giuseppe

splunklearner
Communicator

@gcusello But why can't we use a HEC token here? Please help me with the disadvantages so that I can discuss them with my team.

kiran_panchavat
SplunkTrust

@splunklearner 

If you want a pull model, there is the Splunk Add-on for AWS (https://splunkbase.splunk.com/app/1876). For a push model, I believe HEC is the recommended approach.

Best Practices for Splunk HTTP Event Collector:


Always configure HEC to use HTTPS to ensure data confidentiality during transmission. Enable SSL/TLS encryption and leverage certificate-based authentication to authenticate the sender and receiver.

Consider the expected data volume and plan your HEC deployment accordingly. Distribute the load by deploying multiple HEC instances and using load balancers to ensure high availability and optimal performance.

Implement proper input validation and filtering mechanisms to prevent unauthorized or malicious data from entering your Splunk environment. Use whitelists, blacklists, and regex patterns to define data validation rules.

Regularly monitor the HEC pipeline to ensure data ingestion is successful. Implement proper error handling mechanisms and configure alerts to notify administrators in case of failures or issues.
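
As one possible way to act on the HTTPS and monitoring points above, here is a minimal Python sketch that probes a HEC endpoint over TLS with certificate verification and reports whether it answered with HTTP 200. It assumes the HEC health endpoint `/services/collector/health` is available in your Splunk version. The function names and the optional CA bundle path are hypothetical, so treat this as a starting point and not a finished monitor.

```python
import ssl
import urllib.request


def health_url(base_url):
    """Build the HEC health-check URL from the endpoint base URL."""
    return base_url.rstrip("/") + "/services/collector/health"


def hec_is_healthy(base_url, cafile=None, timeout=5):
    """Return True if the health endpoint answers 200 over verified TLS."""
    # The default context verifies the server certificate and host name;
    # cafile can point at a private CA bundle if one is in use.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with urllib.request.urlopen(
            health_url(base_url), timeout=timeout, context=context
        ) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, TLS failures, and HTTP errors.
        return False
```

A scheduled job could call `hec_is_healthy` against the LB address and against each HF individually, and raise an alert when either check fails.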

Some common challenges associated with Splunk HEC:

While HEC is designed to handle high volumes of data, organizations with extremely large-scale deployments may face challenges related to scalability and performance. It's important to plan the HEC deployment carefully, consider load balancing mechanisms, and tune configurations for the expected volume.

As HEC relies on network connectivity for data ingestion, any issues with network availability or reliability can impact the ingestion process. Organizations should have robust network infrastructure and redundancy measures in place to minimize downtime and ensure uninterrupted data flow.

While HEC provides authentication mechanisms and supports SSL/TLS encryption, configuring and managing authentication and security settings can be complex. Organizations need to properly configure user access controls, certificates, and encryption protocols to ensure secure data transmission and prevent unauthorized access.

HEC allows data ingestion from various sources, making it crucial to implement proper input validation and filtering mechanisms. Ensuring the integrity and quality of the ingested data requires defining validation rules, whitelists, blacklists, and regular expressions to filter out unwanted or malicious data.

Monitoring the HEC pipeline and troubleshooting any issues that may arise can be challenging. Organizations should establish proper monitoring processes to track the health and performance of HEC instances, implement logging and alerting mechanisms, and have troubleshooting strategies in place to quickly identify and resolve any problems.

Integrating HEC with different data sources, applications, and systems can pose compatibility challenges. It's important to ensure that the data sources are compatible with HEC and have the necessary configurations in place for seamless integration.

Configuring and maintaining HEC instances and associated settings require technical expertise and ongoing effort. Organizations need to keep HEC configurations up to date, apply patches and updates, and regularly review and optimize settings to ensure optimal performance and security.

gcusello
SplunkTrust

Hi @splunklearner ,

HEC is the channel that receives the data, but the inputs and the parsing and normalization rules are in the add-on.

In fact, the link you shared describes the add-on configuration process: it isn't sufficient to configure the token to send data; you also need to configure the add-on to define which inputs to enable.

Ciao.

Giuseppe

kiran_panchavat
SplunkTrust

@splunklearner 

Where to place HEC

Scale HTTP Event Collector with distributed deployments - Splunk Documentation

Refer to the links below:

https://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector 

https://docs.splunk.com/Documentation/AddOns/released/Firehose/ConfigureHECdistributed 
