Getting Data In

Why does REST API receivers/simple break input unexpectedly?

yuanliu
SplunkTrust

I have a script that sends what is essentially yum output to receivers/simple.  My props.conf says:

[yumstuff]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Miscellaneous
pulldown_type = 1

I expect each post to be one event.  But some posts get broken into multiple events for unknown reasons.  My guess is that those posts are longer, although I couldn't find any applicable limit in limits.conf.  The broken ones are not all that long to begin with.  I examined one that was broken into three "events".  Combined, they have 18543 chars and 271 lines.  The closest attribute in limits.conf I can find is maxchars, but that applies to [kv] only, and the limit is already high:

[kv]
indexed_kv_limit = 1000
maxchars = 40960

The way it is broken also confuses me.  My post begins with a timestamp, followed by some bookkeeping kv pairs, then the yum output.  If this breakage were caused by a limit, I would expect the event containing the first part to be the biggest, filled up to that limit.  But in general, the "event" corresponding to the end of the post is the biggest; even stranger, the middle "event" is generally extremely small, containing only one line.  In the post I examined, for example, the first "event" contained 6710 chars, the second 71 chars, and the last 11762 chars.  The breaking points are not special, either.  For example,

2022-02-09T19:51:28+00:00 ...

...

---> Package iwl6000g2b-firmware.noarch 0:18.168.6.1-79.el7 will be updated

---> Package iwl6000g2b-firmware.noarch 0:18.168.6.1-80.el7_9 will be an update

<break>

---> Package iwl6050-firmware.noarch 0:41.28.5.1-79.el7 will be updated

<break>
---> Package iwl6050-firmware.noarch 0:41.28.5.1-80.el7_9 will be an update
---> Package iwl7260-firmware.noarch 0:25.30.13.0-79.el7 will be updated
...

 

Where should I look?


PickleRick
Ultra Champion

That's interesting, because receivers/simple should not break the event at all. As I understand the docs, you're supposed to put your whole single event in the request body (that's why it's better to use the normal HEC endpoint, apart from the possible additional fields). When I checked, I had no problem: Splunk wouldn't break the event on any line breaks.

Are you sure you're not sending your text in chunks?


yuanliu
SplunkTrust

Are you sure you're not sending your text in chunks?

I do not have a mechanism to break text into chunks.  The shell script essentially feeds yum output to the REST API.

report() {
  stage=$1
  yum_status=$2
  source=$3
  <stuff...>

  curl -H "Authorization: Bearer $auth_token" \
    -d "$(date --rfc-3339=seconds|tr ' ' T) stage=$stage user=$user status=$status yum_status=$yum_status $auto" \
    -d "$(<$source)" \
    -X POST "$api_url?sourcetype=os_patch&source=$source&host=$HOSTNAME" \
    2>/dev/null >&2
} # end report

systemPatch () {
  <stuff...>
  yum <...> | tee /var/tmp/${FUNCNAME[0]}.out
  report ${FUNCNAME[0]} $os_patch_status /var/tmp/${FUNCNAME[0]}.out
} # end systemPatch

The actual second -d feed can be slightly more nuanced than "$(<$source)".  For example, it is sometimes composed of multiple segments extracted from $source.  Despite variations in the -d composition, it is always one string with multiple lines.  Because the same composition is found in both fragmented and intact events, I doubt that those small variations play any role.

Meanwhile, you point to a possible direction for research.  Thanks!


PickleRick
Ultra Champion

Hmm. I tested by posting over 100kB of events (generated in bash with seq loops ;-)) and nothing ever broke into multiple events. Regardless of whether I let bash insert line breaks or I put explicit \r or \n - the events never broke into multiple ones. Even with consecutive line breaks.
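
A controlled test along these lines could be sketched as follows. This is not the exact script used for the test described above; the function name make_payload, the dummy package names, and the line count are all made up for illustration. It only builds a multi-line file of a bit over 100 kB that can then be posted in a single request.

```shell
#!/usr/bin/env bash
# Sketch only: generate a large multi-line test payload with a seq loop.
# make_payload and the "dummy-NNNNN" package lines are hypothetical.

make_payload() {
  # $1 = number of body lines, $2 = output file
  local n=$1 out=$2 i
  {
    # First line: timestamp plus a bookkeeping kv pair, like the real posts.
    printf '%s stage=test\n' "$(date --rfc-3339=seconds | tr ' ' T)"
    # Body: one yum-like line per iteration.
    for i in $(seq 1 "$n"); do
      printf -- '---> Package dummy-%05d.noarch 0:1.0-%d.el7 will be updated\n' "$i" "$i"
    done
  } > "$out"
}

test_file=$(mktemp)
make_payload 2000 "$test_file"
test_bytes=$(wc -c < "$test_file")
test_lines=$(wc -l < "$test_file")
echo "$test_lines lines, $test_bytes bytes in $test_file"
```

Posting that file in one request and then counting the resulting events should show whether size alone triggers the breaking.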

That's why I'm asking because the behaviour is indeed strange.

BTW, if you're calling curl with multiple -d options you'll get your data sent as url-encoded, you know that?
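
To spell that out: when -d is given more than once, curl joins the pieces with an & and sends them with Content-Type application/x-www-form-urlencoded, so the header line and the yum output end up separated by & rather than by a newline. Whether that has anything to do with the breaking is not established here. A minimal sketch of sending one body instead, assuming hypothetical API_URL and AUTH_TOKEN environment variables and made-up helper names build_payload and post_payload:

```shell
#!/usr/bin/env bash
# Sketch only: assemble the header line and the yum output into ONE file,
# then post it with a single --data-binary so curl sends it unmodified.
# API_URL and AUTH_TOKEN are hypothetical; nothing is sent unless API_URL is set.

build_payload() {
  # $1 = header line, $2 = file with yum output, $3 = output file
  local header=$1 source_file=$2 out=$3
  { printf '%s\n' "$header"; cat "$source_file"; } > "$out"
}

post_payload() {
  # $1 = payload file; --data-binary @file keeps newlines intact
  curl -sS -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: text/plain" \
    --data-binary "@$1" \
    "$API_URL?sourcetype=os_patch&host=$HOSTNAME"
}

demo_dir=$(mktemp -d)
# Two stand-in lines in place of real yum output.
printf '%s\n' '---> Package a will be updated' '---> Package b will be an update' \
  > "$demo_dir/yum.out"
build_payload "$(date --rfc-3339=seconds | tr ' ' T) stage=demo" \
  "$demo_dir/yum.out" "$demo_dir/payload.txt"
payload_lines=$(wc -l < "$demo_dir/payload.txt")
echo "payload has $payload_lines lines"

if [ -n "${API_URL:-}" ]; then
  post_payload "$demo_dir/payload.txt"
fi
```

Writing the body to a file first also makes it easy to inspect exactly what was sent when an event does come out fragmented.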

yuanliu
SplunkTrust

Thanks for the help with testing!  This rules out length as the sole determinant.  I hadn't thought of a controlled experiment like that, but I should have.  Maybe my network buffer is causing hiccups at random points?


PickleRick
Ultra Champion

Well, that's puzzling indeed, since a single curl call should give you a single HTTP request (unless you're following redirects, but that's another story). If your request were too large, the server should have returned an error; curl shouldn't "split" the request in two. Maybe some debugging on the splunkd side would help, but that seems a bit like overkill.
