I have been using the receivers/simple endpoint to add events into Splunk, and have run into a major performance degradation that occurs when the size of the raw event text goes above 1kB.
If I submit an event to receivers/simple where the raw event text is exactly 1024 bytes, I get a success response within a fraction of a second. However, if I submit an event where the raw event text is 1025 bytes, it takes slightly over 2 seconds.
The following script can be used to reproduce the problem:
#!/bin/bash
A975=`head -c 975 < /dev/zero | tr '\0' a`
JUNK=$A975
for COUNT in {1..9}
do
echo $COUNT
time curl -k -u admin:changeme 'https://localhost:8089/services/receivers/simple?source=receiver%20test&sourcetype=test&host=davids%2Dmacbook%2Dpro&index=main' -d "2013-03-05T10:00:0$COUNT This is summary event $COUNT $JUNK"
JUNK=a$JUNK
done
The relevant bits of the output are:
6
<?xml version="1.0" encoding="UTF-8"?>
<response>
<results>
<result>
<field k='_index'>
<value><text>main</text></value>
</field>
<field k='bytes'>
<value><text>1024</text></value>
</field>
<field k='host'>
<value><text>davids-macbook-pro</text></value>
</field>
<field k='source'>
<value><text>receiver test</text></value>
</field>
<field k='sourcetype'>
<value><text>test</text></value>
</field>
</result>
</results>
</response>
real 0m0.019s
user 0m0.005s
sys 0m0.005s
7
<?xml version="1.0" encoding="UTF-8"?>
<response>
<results>
<result>
<field k='_index'>
<value><text>main</text></value>
</field>
<field k='bytes'>
<value><text>1025</text></value>
</field>
<field k='host'>
<value><text>davids-macbook-pro</text></value>
</field>
<field k='source'>
<value><text>receiver test</text></value>
</field>
<field k='sourcetype'>
<value><text>test</text></value>
</field>
</result>
</results>
</response>
real 0m2.021s
user 0m0.005s
sys 0m0.005s
If you run the whole script, you'll see that the timings are consistently fast for all raw event lengths up to 1024 bytes, and consistently about 2 seconds slower for all raw event lengths of 1025 bytes and longer.
Although the output above came from a Mac laptop running Splunk 4.3.2, I get exactly the same behaviour on a Linux server running Splunk 5.0.1, so the problem affects multiple platforms and versions.
Does anyone know why this slowdown is happening? Am I running into a limit that can be changed in a Splunk config file? I've tried grepping for "1024" and "1kb" in all the .conf files and haven't found anything that looks likely, so any expert guidance would be much appreciated.
After a bit more debugging I've found the answer myself.
The 1024 byte limit is actually in cURL, not Splunk. (grep for TINY_INITIAL_POST_SIZE in the cURL source code if you want to see it.)
When cURL is told to send a request body larger than 1024 bytes, it adds the extra header "Expect: 100-continue" unless it has been specifically told otherwise. The workaround is therefore to suppress that header by telling cURL to send an empty "Expect:" header, which can be done on the command line with the -H 'Expect:' option. Appending this option to the curl command in the script in the original question makes all 9 calls fast, and that's fine as a workaround.
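As a concrete illustration, here is a minimal sketch of the fixed call. The host, credentials, and query parameters are the placeholder values from the script above (assume a local Splunk instance with the default admin password); substitute your own:

```shell
#!/bin/bash
# Build a body just over the 1024-byte threshold that normally
# triggers cURL's "Expect: 100-continue" behaviour.
BODY=$(head -c 1025 < /dev/zero | tr '\0' a)

# -H 'Expect:' sends an empty Expect header, which suppresses
# "Expect: 100-continue" and so avoids the ~2 second wait in splunkd.
# (This call only succeeds if a Splunk instance is listening locally.)
curl -k -u admin:changeme -H 'Expect:' --connect-timeout 5 \
  'https://localhost:8089/services/receivers/simple?source=test&sourcetype=test&index=main' \
  -d "$BODY"
```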
However, this still implies that there's a 2 second delay inside splunkd when a POST is received containing the header "Expect: 100-continue". It would be nice if splunkd didn't sleep so long in this situation. There's more discussion of the same problem in a completely different application at http://www.songkick.com/devblog/2012/11/27/a-second-here-a-second-there/.