Getting Data In

Is there any Splunk insertion limitation?

sccheah82
Explorer

Our customer is running a script that performs around 80k individual data insertions into Splunk.

We are advising our customer to switch to bulk insertion: collect the data from the 80k iterations of the foreach loop, then call a single Splunk insertion command.

We want to understand whether Splunk imposes any limits on insertion operations:

    a. What is the data size limit for a single Splunk insertion upload?

    b. What is the maximum number of insertions per user per day before the connection is terminated?

    c. What causes the connection to be terminated when inserting data (sometimes it succeeds, sometimes it fails)?

 


gcusello
SplunkTrust

Hi @sccheah82,

There isn't any limit on data ingestion, however many events you have.

There's only a default limit on each single event of 10,000 bytes, but it's configurable in props.conf by modifying the TRUNCATE option (https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Propsconf).

Splunk is very happy if you index many logs, because you pay for the daily indexed volume!

There isn't any limit per user, because there is no per-user ingestion, only ingestion.

A connection can be interrupted when there's a bandwidth problem or when the Indexers aren't able to index all the ingested logs.

This problem can be solved in several ways: improving storage performance, adding more resources (CPUs), or adding more Indexers.

You can analyze this issue in the Monitoring Console by checking whether queues are building up in the indexing phase.

In other words, the limit isn't Splunk itself but only the resources (license, hardware, or network) that you have available.

Ciao.

Giuseppe


PickleRick
SplunkTrust

@gcusello's answer is generally right, but there is more to it than that 🙂

As always, there are two factors you have to weigh: performance and reliability (and, in Splunk's case, data distribution).

If you're talking about "insertion", I'd assume you want to push events via HEC (with other methods you can hit different issues). With HEC you can use a separate HTTP POST for each event, or you can combine multiple events into a single HTTP POST request. And here's where the fun begins 🙂

With "one event per request" you have more flexibility and can easily select and replay any single event in case of an error. Furthermore, if you're connecting through a load balancer, each request can be routed to a different backend.
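To illustrate the replay point, here is a minimal sketch of the "one event per request" loop. The `send_one` callable is a hypothetical placeholder for whatever actually POSTs a single event to HEC; it is injected so the bookkeeping can be shown without assuming any particular client library.

```python
from typing import Callable, Iterable, List


def send_individually(
    events: Iterable[dict],
    send_one: Callable[[dict], bool],
) -> List[dict]:
    """Send events one at a time and return the ones that failed.

    `send_one` is a hypothetical sender: it should return True on success
    and either return False or raise OSError (network errors) on failure.
    """
    failed: List[dict] = []
    for event in events:
        try:
            ok = send_one(event)
        except OSError:
            # Network-level failure: remember this exact event for replay.
            ok = False
        if not ok:
            failed.append(event)
    return failed
```

Because each event travels alone, the returned list is exactly the set of events to retry, which is the flexibility described above.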

If, on the other hand, you're sending in batches, the performance will typically be higher. But when something goes wrong you might have trouble identifying the problematic events (especially if you don't use acknowledgements), you might need to retransmit the whole batch after a network problem, and of course the whole request gets routed to a single server.

So there are pros and cons to each approach. Typically, high-volume sources (like SC4S) send data in reasonably sized batches (like 100 or 1000 events per request).
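A rough sketch of the batching approach, using only the Python standard library. It assumes the HEC JSON event endpoint (`/services/collector/event`), where several event objects can be concatenated in one request body, and the `Authorization: Splunk <token>` header. The URL, token, and batch size are placeholders you would supply yourself.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def chunked(events: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` events."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_hec_payload(batch: List[dict]) -> bytes:
    """Concatenate one JSON object per event (not a JSON array)."""
    return "".join(json.dumps({"event": e}) for e in batch).encode("utf-8")


def send_batch(url: str, token: str, batch: List[dict], timeout: float = 10.0) -> int:
    """POST one batch to HEC and return the HTTP status code.

    `url` and `token` are placeholders for your own HEC endpoint and token.
    """
    request = urllib.request.Request(
        url,
        data=build_hec_payload(batch),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With 80k events and a batch size of 1000, this would come to 80 requests instead of 80,000. If a batch fails, the whole batch has to be retried, which is the trade-off mentioned above.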

Also remember that there is a size limit on each individual HEC-ingested event (5MB by default) and that there are some limits on the HTTP input parameters (but they can be tweaked).
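If you want to catch oversized events before sending, a client-side pre-check could look like the sketch below. It takes the 5MB default mentioned above as the threshold; measuring the size of the serialized `{"event": ...}` wrapper is my assumption, and Splunk may count the bytes slightly differently, so treat it as an approximation.

```python
import json
from typing import Iterable, List, Tuple

# Default taken from the post above; adjust if your HEC input is configured differently.
DEFAULT_MAX_EVENT_BYTES = 5 * 1024 * 1024


def serialized_size(event) -> int:
    """Approximate size in bytes of one event as it would appear in the request body."""
    return len(json.dumps({"event": event}).encode("utf-8"))


def split_by_size(
    events: Iterable, limit: int = DEFAULT_MAX_EVENT_BYTES
) -> Tuple[List, List]:
    """Return (events within the limit, events over the limit)."""
    within: List = []
    too_big: List = []
    for event in events:
        if serialized_size(event) > limit:
            too_big.append(event)
        else:
            within.append(event)
    return within, too_big
```

Events in the second list could then be logged, split, or trimmed on the client instead of being rejected or truncated on the Splunk side.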

https://docs.splunk.com/Documentation/Splunk/9.0.3/Admin/Limitsconf#.5Bhttp_input.5D


gcusello
SplunkTrust

Hi @sccheah82,

good for you, see you next time!

Ciao and happy splunking.

Giuseppe

P.S.: Karma Points are appreciated 😉
