I have been using the receivers/simple endpoint to add events into Splunk, and have run into a major performance degradation that occurs when the size of the raw event text goes above 1kB.
If I submit an event to receivers/simple where the raw event text is exactly 1024 bytes, I get a success response within a fraction of a second. However, if I submit an event where the raw event text is 1025 bytes, it takes slightly over 2 seconds.
The following script can be used to reproduce the problem:
#!/bin/bash
A975=`head -c 975 < /dev/zero | tr '\0' a`
JUNK=$A975
for COUNT in {1..9}
do
echo $COUNT
time curl -k -u admin:changeme 'https://localhost:8089/services/receivers/simple?source=receiver%20test&sourcetype=test&host=davids%2Dmacbook%2Dpro&index=main' -d "2013-03-05T10:00:0$COUNT This is summary event $COUNT $JUNK"
JUNK=a$JUNK
done
The relevant bits of the output are:
6
<?xml version="1.0" encoding="UTF-8"?>
<response>
<results>
<result>
<field k='_index'>
<value><text>main</text></value>
</field>
<field k='bytes'>
<value><text>1024</text></value>
</field>
<field k='host'>
<value><text>davids-macbook-pro</text></value>
</field>
<field k='source'>
<value><text>receiver test</text></value>
</field>
<field k='sourcetype'>
<value><text>test</text></value>
</field>
</result>
</results>
</response>
real 0m0.019s
user 0m0.005s
sys 0m0.005s
7
<?xml version="1.0" encoding="UTF-8"?>
<response>
<results>
<result>
<field k='_index'>
<value><text>main</text></value>
</field>
<field k='bytes'>
<value><text>1025</text></value>
</field>
<field k='host'>
<value><text>davids-macbook-pro</text></value>
</field>
<field k='source'>
<value><text>receiver test</text></value>
</field>
<field k='sourcetype'>
<value><text>test</text></value>
</field>
</result>
</results>
</response>
real 0m2.021s
user 0m0.005s
sys 0m0.005s
If you run the whole script, you'll see that the timings are consistently fast for all raw event lengths up to 1024 bytes, and consistently about 2 seconds slower for all raw event lengths of 1025 bytes and longer.
Although the output above came from a Mac laptop running Splunk 4.3.2, I get exactly the same behaviour on a Linux server running Splunk 5.0.1, so the problem affects multiple platforms and versions.
Does anyone know why this slowdown is happening? Am I running into a limit that can be changed in a Splunk config file? I've tried grepping for "1024" and "1kb" in all the .conf files and haven't found anything that looks likely, so any expert guidance would be much appreciated.
After a bit more debugging I've found the answer myself.
The 1024 byte limit is actually in cURL, not Splunk. (grep for TINY_INITIAL_POST_SIZE in the cURL source code if you want to see it.)
When cURL is told to send a request body larger than 1024 bytes, it adds the extra header "Expect: 100-continue" unless it has been specifically told otherwise. The workaround is therefore to suppress that header by telling cURL to send an empty "Expect:" header, which can be done on the command line with the -H 'Expect:' option. Appending this option to the curl command in the script in the original question makes all 9 calls fast, and that's fine as a workaround.
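As a concrete illustration, here is a minimal sketch of the fixed call. The host, credentials, and query parameters are the placeholder values from the script above (assume a local Splunk instance with the default admin password); substitute your own:

```shell
#!/bin/bash
# Build a body just over the 1024-byte threshold that normally
# triggers cURL's "Expect: 100-continue" behaviour.
BODY=$(head -c 1025 < /dev/zero | tr '\0' a)

# -H 'Expect:' sends an empty Expect header, which suppresses
# "Expect: 100-continue" and so avoids the ~2 second wait in splunkd.
# (This call only succeeds if a Splunk instance is listening locally.)
curl -k -u admin:changeme -H 'Expect:' --connect-timeout 5 \
  'https://localhost:8089/services/receivers/simple?source=test&sourcetype=test&index=main' \
  -d "$BODY"
```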
However, this still implies that there's a 2 second delay inside splunkd when a POST is received containing the header "Expect: 100-continue". It would be nice if splunkd didn't sleep so long in this situation. There's more discussion of the same problem in a completely different application at http://www.songkick.com/devblog/2012/11/27/a-second-here-a-second-there/.