
Is there any splunk insertion limitation?

sccheah82
Explorer

Our customer is running a script that performs around 80k individual data insertions into Splunk.

We are in the process of advising our customer to switch to bulk insertion: collect the data from the 80k-iteration foreach loop and then make one single Splunk insertion call.

We want to understand whether Splunk has any limitations on insertion operations:

    a. What is the data size limit for a single Splunk insertion (upload)?

    b. What is the maximum number of insertions per user per day before the connection gets terminated?

    c. What causes the connection to be terminated when inserting data (sometimes it succeeds, sometimes it fails)?

gcusello
SplunkTrust

Hi @sccheah82,

There isn't any limit on data ingestion, however many events you send.

The only default limit is on the length of each single event: 10,000 bytes, configurable in props.conf via the TRUNCATE option (https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Propsconf).
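If the customer's script wants to know in advance which events the default TRUNCATE setting would cut, it can pre-check them before sending. A minimal sketch, assuming the events are Python strings and the limit in force is the default TRUNCATE = 10000 (the props.conf spec measures it in bytes, so the UTF-8 length is checked rather than the character count); `find_truncated` is a hypothetical helper name:

```python
# Default TRUNCATE value from props.conf: maximum line length in bytes.
DEFAULT_TRUNCATE = 10_000


def find_truncated(events, limit=DEFAULT_TRUNCATE):
    """Return (index, size_in_bytes) for every event longer than `limit` bytes.

    Hypothetical helper: sizes are measured on the UTF-8 encoding, so
    multi-byte characters count for more than one byte.
    """
    oversized = []
    for i, event in enumerate(events):
        size = len(event.encode("utf-8"))
        if size > limit:
            oversized.append((i, size))
    return oversized


if __name__ == "__main__":
    sample = ["short event", "x" * 12_000, "é" * 6_000]
    print(find_truncated(sample))  # prints [(1, 12000), (2, 12000)]
```

Note that the third sample has only 6,000 characters but still exceeds the limit, because "é" takes two bytes in UTF-8.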

Splunk is very happy if you index many logs, because you pay for the daily indexed volume!

There isn't any per-user limit either, because ingestion isn't counted per user: there's only ingestion.

A connection could be interrupted when there's a bandwidth problem or when the Indexers aren't able to index all the ingested logs.

This problem can be solved in many ways: improving storage performance, adding more resources (CPUs), or adding more Indexers.

You can analyze this issue in the Monitoring Console by checking whether queues are building up in the indexing phase.

In other words, the limit isn't Splunk itself but only the resources (license, hardware, or network) that you have available.

Ciao.

Giuseppe

PickleRick
SplunkTrust

@gcusello's answer is generally right but there is more to this than that 🙂

As always, there are two factors you have to weigh: performance and reliability (and, in Splunk's case, data distribution).

If you're talking about "insertion", I'd assume you want to push events via HEC, the HTTP Event Collector (with other methods you can hit different issues). With HEC you can use a separate HTTP POST for each event, or you can combine multiple events into a single HTTP POST request. And here's where the fun begins 🙂
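To make the "one POST vs. many POSTs" choice concrete: the HEC event endpoint accepts several JSON event objects concatenated back to back in one request body. A minimal sketch using only the Python standard library; the host name, port 8088, token, and the `customer:script` sourcetype are placeholder assumptions, and `build_payload`/`send` are hypothetical helper names:

```python
import json
import urllib.request

# Placeholders (assumptions): replace with your real HEC endpoint and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_payload(events, sourcetype="customer:script"):
    """Concatenate one HEC JSON object per event into a single request body."""
    return "".join(
        json.dumps({"event": event, "sourcetype": sourcetype}) for event in events
    ).encode("utf-8")


def send(events):
    """POST all `events` to HEC in a single request and return the parsed reply."""
    request = urllib.request.Request(
        HEC_URL,
        data=build_payload(events),
        headers={
            "Authorization": f"Splunk {HEC_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError (an OSError subclass) on non-2xx replies.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

Calling `send(batch)` with a list of many events is the batched approach; calling `send([event])` inside a loop is the one-event-per-request approach.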

With "one event per request" you have more flexibility and can easily select and replay any single event in case of an error. Furthermore, if you're connecting through a load-balancer, each request can be routed to a different backend.
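With one event per request, a failure points at exactly one event, so the script only needs to keep the failed ones for replay. A sketch, assuming a hypothetical `post` callable that takes a list of events, performs one HTTP request, and raises `OSError` on failure (as urllib's errors do):

```python
def send_individually(events, post):
    """Send each event in its own request; return the events that failed.

    `post` is a hypothetical callable taking a list of events and raising
    OSError on failure (for example a wrapper around an HTTP POST to HEC).
    """
    failed = []
    for event in events:
        try:
            post([event])  # exactly one event per HTTP request
        except OSError:
            failed.append(event)  # only this event needs to be replayed
    return failed
```

The returned `failed` list can be passed back into the same function later to replay just those events.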

If, on the other hand, you're sending in batches, performance will typically be higher, but when something goes wrong you might have trouble identifying the problematic events (especially if you don't use acknowledgements), you might need to retransmit the whole batch after a network problem, and of course the whole request gets routed to a single server.
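The "retransmit the whole batch" case can be handled with a simple retry loop. A sketch, assuming the same kind of hypothetical `post` callable (raises `OSError` on failure) and an exponential backoff policy chosen here for illustration, not taken from Splunk:

```python
import time


def send_batch_with_retry(batch, post, attempts=3, base_delay=1.0):
    """Send a whole batch, retrying the entire batch on failure.

    Without acknowledgements the sender cannot tell which events of a failed
    request were indexed, so the whole batch is resent (which may create
    duplicates). `post` is a hypothetical callable raising OSError on failure.
    """
    for attempt in range(attempts):
        try:
            return post(batch)
        except OSError:
            if attempt == attempts - 1:
                raise  # give up; the caller decides what to do with the batch
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```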

So there are pros and cons to each approach. High-volume sources (like SC4S) will typically send data in reasonably-sized batches (like 100 or 1000 events per request).
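For the customer's 80k-iteration loop, a middle ground between 80,000 requests and one giant request is to cut the events into fixed-size batches. A sketch; the batch size of 1000 is just one of the example values mentioned above, not a tuned recommendation, and `batches` is a hypothetical helper name:

```python
from itertools import islice


def batches(events, size=1000):
    """Yield successive lists of at most `size` events (Python 3.8+)."""
    iterator = iter(events)
    while chunk := list(islice(iterator, size)):
        yield chunk
```

With `size=1000`, 80,000 events become 80 requests instead of 80,000, while a failed request still affects only 1000 events.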

Also remember that there is a limit for each separate HEC-ingested event (5 MB by default) and there are some limits on HTTP input parameters (but they can be tweaked).

https://docs.splunk.com/Documentation/Splunk/9.0.3/Admin/Limitsconf#.5Bhttp_input.5D
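Since both the per-event limit and the request-level [http_input] limits are configurable, a sender can enforce its own byte budgets instead of counting events. A sketch, assuming the 5 MB per-event default quoted above (taken here as 5 * 1024 * 1024 bytes) and a hypothetical client-side `max_request_bytes` budget that you would set below whatever your limits.conf actually allows:

```python
import json

# Assumption: 5 MB per-event default, interpreted as 5 MiB.
MAX_EVENT_BYTES = 5 * 1024 * 1024


def pack_by_size(events, max_request_bytes=1_000_000):
    """Group events into HEC request bodies of at most `max_request_bytes`.

    Events whose serialized size exceeds MAX_EVENT_BYTES are rejected instead
    of sent. A single event larger than the request budget (but under the
    per-event limit) still gets a request of its own.
    """
    bodies, current, current_size, rejected = [], [], 0, []
    for event in events:
        body = json.dumps({"event": event}).encode("utf-8")
        if len(body) > MAX_EVENT_BYTES:
            rejected.append(event)
            continue
        if current and current_size + len(body) > max_request_bytes:
            bodies.append(b"".join(current))  # close the current request
            current, current_size = [], 0
        current.append(body)
        current_size += len(body)
    if current:
        bodies.append(b"".join(current))
    return bodies, rejected
```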

gcusello
SplunkTrust

Hi @sccheah82,

Good for you, see you next time!

Ciao and happy splunking.

Giuseppe

P.S.: Karma Points are appreciated 😉
