Getting Data In

Why does the REST API endpoint receivers/simple break input unexpectedly?

yuanliu
SplunkTrust

I have a script that, in effect, sends yum output to receivers/simple.  props.conf says

[yumstuff]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Miscellaneous
pulldown_type = 1

I expect each post to be one event, but some posts get broken into multiple events for unknown reasons.  My guess is that those posts are longer, although I couldn't find any applicable limit in limits.conf, and the broken ones are not all that long to begin with.  I examined one that was broken into three "events".  Combined, they have 18543 chars across 271 lines.  The closest attribute in limits.conf I can find is maxchars, but that applies only to the [kv] stanza, and the limit is already high:

[kv]
indexed_kv_limit = 1000
maxchars = 40960
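
(Side note: the usual length-related knobs for event breaking live in props.conf rather than limits.conf; a sketch for this sourcetype, with assumed values:)

```
[yumstuff]
# TRUNCATE caps the length of a single line/event in bytes (default 10000)
TRUNCATE = 100000
# MAX_EVENTS caps how many lines get merged into one event,
# but only when SHOULD_LINEMERGE = true
MAX_EVENTS = 1000
```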

The way it breaks also confuses me.  Each post begins with a timestamp, followed by some bookkeeping kv pairs, then the yum output.  If this breakage were caused by limits, I would expect the event containing the first part to be the biggest, to the extent it exceeds that limit.  But in general, the "event" corresponding to the end of the post is the biggest; even stranger, the middle "event" is generally extremely small, containing only one line.  In the post I examined, for example, the first "event" contained 6710 chars; the second, 71 chars; the last, 11762 chars.  The breaking points are not special, either.  For example,

2022-02-09T19:51:28+00:00 ...

...

---> Package iwl6000g2b-firmware.noarch 0:18.168.6.1-79.el7 will be updated

---> Package iwl6000g2b-firmware.noarch 0:18.168.6.1-80.el7_9 will be an update

<break>

---> Package iwl6050-firmware.noarch 0:41.28.5.1-79.el7 will be updated

<break>
---> Package iwl6050-firmware.noarch 0:41.28.5.1-80.el7_9 will be an update
---> Package iwl7260-firmware.noarch 0:25.30.13.0-79.el7 will be updated
...

Where should I look?


PickleRick
SplunkTrust

That's interesting, because receivers/simple should not break the event at all. As I understand the docs, you're supposed to put your whole single event in the request body (that's why it's better to use the normal HEC endpoint, apart from the possible additional fields). And when I checked, I had no problem: Splunk wouldn't break it on any line breaks.
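
A minimal sketch of the HEC alternative mentioned above (the host, token, and payload here are placeholders, not a tested setup): the event endpoint takes one JSON object per event, so the whole body is indexed as a single event regardless of line breaks.

```shell
# Placeholder endpoint and token - adjust for your environment.
hec_url="https://splunk.example.com:8088/services/collector/event"
hec_token="REPLACE-WITH-YOUR-HEC-TOKEN"

# One event, one JSON object. For multi-line yum output the text would
# need JSON escaping (e.g. piped through jq -Rs) before embedding.
event_text='2022-02-09T19:51:28+00:00 stage=systemPatch yum_status=0'
payload="{\"sourcetype\": \"os_patch\", \"event\": \"$event_text\"}"
echo "$payload"

# The POST itself would look like (commented out here):
# curl -k -H "Authorization: Splunk $hec_token" -d "$payload" "$hec_url"
```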

Are you sure you're not sending your text in chunks?


yuanliu
SplunkTrust

Are you sure you're not sending your text in chunks?

I do not have a mechanism that breaks text into chunks.  The shell script essentially feeds yum output to the REST API.

report() {
  stage=$1
  yum_status=$2
  source=$3
  <stuff...>

  curl -H "Authorization: Bearer $auth_token" \
    -d "$(date --rfc-3339=seconds|tr ' ' T) stage=$stage user=$user status=$status yum_status=$yum_status $auto" \
    -d "$(<$source)" \
    -X POST "$api_url?sourcetype=os_patch&source=$source&host=$HOSTNAME" \
    2>/dev/null >&2
} # end report

systemPatch () {
  <stuff...>
  yum <...> | tee /var/tmp/${FUNCNAME[0]}.out
  report ${FUNCNAME[0]} $os_patch_status /var/tmp/${FUNCNAME[0]}.out
} # end systemPatch

The actual second -d argument can be slightly more nuanced than "$(<$source)".  For example, it is sometimes composed of multiple segments extracted from $source.  Despite these variations in -d composition, it is always one string with multiple lines.  Because the same compositions appear in both fragmented and intact events, I doubt that these small variations play any role.

Meanwhile, you point to a possible direction for research.  Thanks!


PickleRick
SplunkTrust

Hmm. I tested by posting over 100kB of events (generated in bash with seq loops ;-)) and nothing ever broke into multiple events, regardless of whether I let bash insert line breaks or I put explicit \r or \n. Even with consecutive line breaks.
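
The seq-based generator isn't shown; a rough reconstruction of building a comparably sized multi-line payload (an assumption about the test, not the actual script used) might look like:

```shell
# Build a >100kB multi-line payload, roughly like the seq-loop test
# described above (assumed reconstruction, not the actual script).
payload=$(seq -f 'line %g of synthetic yum output' 1 20000)

# Sanity check on size; a single curl -d "$payload" would then send
# the whole thing in one request body.
echo "${#payload}"
```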

That's why I'm asking because the behaviour is indeed strange.

BTW, you know that if you're calling curl with multiple -d options, your data gets sent as a single url-encoded form body?
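
To illustrate that point with hypothetical stand-ins for the two -d arguments in report(): curl joins the bodies with & and sends the result with Content-Type: application/x-www-form-urlencoded. Note that -d itself does not actually encode its argument; --data-urlencode does.

```shell
# Hypothetical stand-ins for the two -d arguments in report()
part1='2022-02-09T19:51:28+00:00 stage=systemPatch yum_status=0'
part2='---> Package iwl6050-firmware.noarch 0:41.28.5.1-79.el7 will be updated'

# curl -d "$part1" -d "$part2" sends a single body equivalent to:
body="${part1}&${part2}"
# ...with Content-Type: application/x-www-form-urlencoded.
# (-d does not encode its value; --data-urlencode would.)
echo "$body"
```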

yuanliu
SplunkTrust

Thanks for helping with testing!  That rules out length as the sole determinant.  I hadn't thought of a controlled experiment like that, but I should have.  Maybe my network buffer is causing hiccups at random points?


PickleRick
SplunkTrust

Well, that's puzzling indeed, since a single curl call should produce a single HTTP request (unless you're following redirections, but that's another story). If your request were too large, the server should return an error; curl shouldn't "split" the request in two. Maybe some debugging on the splunkd side... but that seems a bit like overkill.
