Here are a few questions from the session (get the full Q&A deck and live recording in the #office-hours Slack channel):
Q1: Splunk Education - will a training course for Edge/Ingest Processors be added to the catalog soon? Having a training environment to try out the use cases really helps.
Q2: How does Edge Processor handle syslog data that has LF characters in the middle of a TCP packet (hint: my data is being split before it ever hits the pipeline or sourcetype)?
- Edge Processor uses the frame length in the syslog protocol to split events, so an LF is just another character and doesn't affect event splitting.
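As a rough illustration (assuming the sender uses RFC 6587 octet-counting framing, which is what "frame length" refers to here; the values below are made up):

```
60 <134>1 2024-05-01T12:00:00Z fw01 app - - - line one
line two
```

The leading "60" is the octet count for the whole message, including the embedded LF, so "line one" and "line two" stay together as one event rather than being split at the LF.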
Q3: Using Edge Processor, we are looking for the ability to send data to a syslog endpoint (SIEM) or a third-party syslog destination.
- Unfortunately, Edge Processor only supports data routing to a few locations at the moment:
- Splunk Enterprise stacks
- Splunk Cloud stacks
- AWS S3
- Future roadmap items include destinations like Splunk Observability Cloud as well as additional third-party object stores.
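For context, routing in an Edge Processor pipeline is expressed in SPL2; the destination itself (for example, an Amazon S3 or Splunk index destination) is configured separately in the UI. A minimal sketch, with an illustrative filter pattern:

```
/* Minimal sketch: drop noisy debug events and send everything else to the configured destination */
$pipeline = | from $source
| where not match(_raw, /DEBUG/)
| into $destination;
```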
Q4: How does Data Management know what data we have, or how the UF sends data to DMX?
- Both EP and IP have a "sourcetypes seen" view to help you understand what data is coming in. EP preview isn't live data, but as Poornima showed, in IP you can capture and preview using live data.
Other Questions/Topics (check the #office-hours Slack channel for responses):
- Edge Processor & Ingest Processor functionality
- Ingest Processor to Splunk Observability Cloud demo
- I am very new to Edge and Ingest Processor. Could you please walk through an example from a beginner's perspective?
- Explain the flow of data across multiple pipelines. Can I chain pipelines? How do I define the flow?
- How can I start routing and filtering with the Ingest Processor after it has been enabled in the Cloud Stack?
- How can I split different syslog data sources that can only send via port 514? Do you have example SPL2 for this sort of operation? (See the sketch after this list.)
- Can we use all those products at the same time?
- Are there issues with using a VIP on a load balancer in front of multiple EP nodes (not with HEC)?
- Any timeline of Edge Processor availability for Splunk Enterprise customers?
- Is it possible to use the Edge Processor to retrieve data from an API endpoint? If not, is there potential to add this functionality in the future? For example, certain threat intelligence vendors offer data retrieval via API but lack the ability to push directly to a HEC endpoint.
- Do we have a way to process and route public cloud (AWS, Azure, GCP) data directly on a cloud EP and send it to a destination? For example, we have an add-on collecting data on a Splunk IDM or on an SHC in Victoria, and cloud data being sent via HEC to Splunk Cloud directly.
- Is useACK now supported for EP?
- If Edge Processors were implemented on top of the HWFs to simply route data, would it still be the raw data in JSON format for HEC ingestion into the S3 destination?
- Is it necessary to install an edge node on all Universal Forwarders (UFs), or can we set up a single server to act as the edge node and have all UF data sent to that server?
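For the port-514 question above, a common pattern is one pipeline per device family, each matching on the raw payload of the shared syslog source. A minimal sketch, assuming Cisco ASA traffic is in the mix (the regex and sourcetype value are illustrative, not a tested configuration):

```
/* Hypothetical sketch: pick Cisco ASA messages out of a shared port-514 syslog feed */
$pipeline = | from $source
| where match(_raw, /%ASA-\d-\d+/)
| eval sourcetype = "cisco:asa"
| into $destination;
```

A second pipeline on the same source with a different `where` pattern and destination would handle the next device family, and so on.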