Solved: What are my options for automatically ingesting data into Splunk?

thisissplunk · ‎03-28-2018

I'm trying to determine the architecture options for automatically ingesting data into Splunk, i.e., I place data in a folder -> run a script -> supply an index name -> supply a sourcetype -> the script gets the data into the new index somehow.

Now, you may be asking why I don't just create a monitoring directory and make a script to dump data there. That's because each set of data will have a different data type/sourcetype and go into a new index. Basically, think of taking in random, one-off sets of data that need different indexes all the time for analysis, NOT a log source from one machine constantly being pumped in.

So, with that said I see my options as:

1) Script the creation of monitoring directories for each new index, as well as the API calls to set inputs.conf stanzas, while scp'ing the data into the monitoring directory.
2) Use the API to directly send data from anywhere
3) TCP input???
4) ???

For #2, I don't know if this works for our instance because I've seen it only send to one single index in the past instead of load balancing. Are there any other issues ingesting large amounts of data with the API?

Any other possibilities here?

niketn · ‎03-28-2018

@thisissplunk there are multiple options.

1) Route and Filter Data using transforms.conf:
Contrary to your assumption, with Splunk you can monitor a single folder and define transforms on the incoming sourcetype to re-route the data to a specific index with a new sourcetype. Refer to the following documentation:
http://docs.splunk.com/Documentation/Splunk/latest/Data/Advancedsourcetypeoverrides
http://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Routeandfilterdatad

PS: You would definitely need to test re-routing in a non-production environment first. This should be quite easy for Splunk admins to implement.
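
As a rough sketch of what such routing could look like when generated per data set, the Python below renders a props.conf stanza and two transforms.conf stanzas. The function name, the folder path, and the index and sourcetype values are all hypothetical. It also assumes the target index already exists and that the stanzas end up on the instance that parses the data.

```python
def render_routing_conf(monitored_source, index, sourcetype, name):
    """Return (props_text, transforms_text) that route one source to an index."""
    # props.conf: attach two index-time transforms to the monitored source.
    props = "\n".join([
        f"[source::{monitored_source}]",
        f"TRANSFORMS-{name} = {name}_index, {name}_sourcetype",
    ])
    # transforms.conf: REGEX = . matches every event from that source.
    transforms = "\n".join([
        f"[{name}_index]",
        "REGEX = .",
        "DEST_KEY = _MetaData:Index",
        f"FORMAT = {index}",
        "",
        f"[{name}_sourcetype]",
        "REGEX = .",
        "DEST_KEY = MetaData:Sourcetype",
        f"FORMAT = sourcetype::{sourcetype}",
    ])
    return props, transforms


if __name__ == "__main__":
    # Hypothetical values for one one-off data set.
    props, transforms = render_routing_conf(
        "/data/incoming/batch42/*", "oneoff_batch42", "batch42_csv", "batch42"
    )
    print("# props.conf\n" + props)
    print("\n# transforms.conf\n" + transforms)
```

The script only prints the stanzas; how they are deployed and reloaded depends on your environment.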

2) Scripted Input to Splunk
Refer to the following resources to set up a scripted input to Splunk.
https://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ScriptSetup
https://sublimerobots.com/2017/01/simple-splunk-scripted-input-example/
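
For illustration, a scripted input is essentially a program whose standard output Splunk indexes. A minimal sketch in Python could look like the following. The record contents and function names are made up, and the index and sourcetype would be set in the inputs.conf stanza that runs the script, not in the script itself.

```python
import json
import sys
import time


def format_event(record, now=None):
    """Serialize one record as a single JSON line with a timestamp field."""
    ts = time.time() if now is None else now
    return json.dumps({"time": round(ts, 3), **record}, sort_keys=True)


def main(records, out=sys.stdout):
    # Splunk reads whatever the script writes to stdout, one event per line here.
    for record in records:
        out.write(format_event(record) + "\n")
    out.flush()


if __name__ == "__main__":
    # Hypothetical data; a real script would read the files dropped in a folder.
    main([{"status": "ok", "rows": 42}])
```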

3) HTTP Event Collector
The HTTP Event Collector can be configured so that your applications communicate directly with Splunk, without having to write a log file or set up a scripted input first.
http://docs.splunk.com/Documentation/Splunk/latest/Data/AboutHEC
http://dev.splunk.com/view/event-collector/SP-CAAAE6M
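
Since the index and sourcetype can be supplied per event with HEC, it maps fairly directly onto the "supply an index name -> supply a sourcetype" workflow. Below is a minimal sketch using only the Python standard library. The host name, token, index, and sourcetype are placeholders, and it assumes HEC is enabled and the token is allowed to write to that index.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, index, sourcetype):
    """Build (but do not send) a POST request for the HEC event endpoint."""
    payload = {
        "event": event,            # the data itself (string or JSON object)
        "index": index,            # target index, chosen per data set
        "sourcetype": sourcetype,  # sourcetype, chosen per data set
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and token; replace with your own values.
    req = build_hec_request(
        "https://splunk.example.com:8088",
        "00000000-0000-0000-0000-000000000000",
        {"message": "one-off record"},
        "oneoff_batch42",
        "batch42_csv",
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (requires a reachable HEC endpoint):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Pointing the base URL at a load balancer in front of several indexers or forwarders is one way to spread events out, but whether that fits depends on your deployment.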

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

thisissplunk · ‎03-28-2018

Btw, I don't see the REST API listed here. Why is that? Is the HTTP Event Collector a better option? Can you even use the REST API to send data to multiple indexers?

niketn · ‎03-28-2018

@thisissplunk, sorry I missed REST API but that can be implemented as well: https://www.splunk.com/blog/2016/05/11/splunking-continuous-rest-data.html

I would recommend the HTTP Event Collector; however, you would have to check it against your implementation requirements as well.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

thisissplunk · ‎03-28-2018

Thank you, I'll review soon.
