Getting Data In

Why does REST API receivers/simple break input unexpectedly?

yuanliu
SplunkTrust

I have a script that essentially sends yum output to receivers/simple.  My props.conf says:

[yumstuff]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Miscellaneous
pulldown_type = 1

I expect each post to be one event, but some posts get broken into multiple events for unknown reasons.  My guess is that those posts are longer, although I couldn't find any applicable limit in limits.conf.  The broken ones are not all that long to begin with.  I examined one that was broken into three "events".  Combined, they have 18543 chars and 271 lines.  The closest attribute I can find in limits.conf is maxchars, but that applies to [kv] only, and the limit is already high:

[kv]
indexed_kv_limit = 1000
maxchars = 40960

The way it is broken also confuses me.  My post begins with a timestamp, followed by some bookkeeping kv pairs, then yum output.  If this breakage were caused by a limit, I would expect the event containing the first part to be the biggest, up to whatever that limit is.  But in general, the "event" corresponding to the end of the post is the biggest; even stranger, the middle "event" is generally extremely small, containing only one line.  In the post I examined, for example, the first "event" contained 6710 chars, the second 71 chars, and the last 11762 chars.  The breaking points are not special, either.  For example,

2022-02-09T19:51:28+00:00 ...

...

---> Package iwl6000g2b-firmware.noarch 0:18.168.6.1-79.el7 will be updated

---> Package iwl6000g2b-firmware.noarch 0:18.168.6.1-80.el7_9 will be an update

<break>

---> Package iwl6050-firmware.noarch 0:41.28.5.1-79.el7 will be updated

<break>
---> Package iwl6050-firmware.noarch 0:41.28.5.1-80.el7_9 will be an update
---> Package iwl7260-firmware.noarch 0:25.30.13.0-79.el7 will be updated
...

 

Where should I look?

1 Solution

yuanliu
SplunkTrust

The "Line breaking" section of the props.conf documentation describes MAX_EVENTS thus:

MAX_EVENTS = <integer>
* The maximum number of input lines to add to any event.
* Splunk software breaks after it reads the specified number of lines.
* Default: 256

I looked at my broken events, and the maximum number of lines seems to be 257.  Knowing that some of my outputs exceed 1000 lines, I added MAX_EVENTS = 2000 to the sourcetype.  Now I am seeing new events with large numbers of lines and no more broken events.  (It took some time for this change to take effect, though.)

Just to be clear: this is unrelated to the REST API receivers/simple endpoint; it is merely a matter of the number of lines in individual events.  The limit is set in props.conf per source type, which is why I could not find any applicable setting in limits.conf.
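
A quick way to check whether a given post would run into this limit is to count its lines before sending.  The sketch below is illustrative only: exceeds_max_events is a hypothetical helper name, and 256 is simply the documented default quoted above, not necessarily the value in effect for a particular sourcetype.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): succeeds when the
# file has more lines than the given limit. The limit defaults to 256, the
# documented MAX_EVENTS default quoted above.
exceeds_max_events() {
  local file=$1
  local limit=${2:-256}
  local lines
  # grep -c '' counts every line, including a final line without a newline.
  lines=$(grep -c '' "$file")
  [ "$lines" -gt "$limit" ]
}
```

For example, `exceeds_max_events /var/tmp/systemPatch.out && echo "may be split"` would flag a 271-line output like the one examined in the question, while passing 2000 as the second argument would not.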

PickleRick
SplunkTrust

That's interesting, because receivers/simple should not break the event at all. As I understand the docs, you're supposed to put your whole single event in the request body (that's why it's better to use the normal HEC endpoint, apart from the possible additional fields). When I checked, I had no problem: Splunk wouldn't break the event on any line breaks.

Are you sure you're not sending your text in chunks?


yuanliu
SplunkTrust

Are you sure you're not sending your text in chunks?

I do not have a mechanism to break text into chunks.  The shell script essentially feeds yum output to the REST API.

report() {
  stage=$1
  yum_status=$2
  source=$3
  <stuff...>

  curl -H "Authorization: Bearer $auth_token" \
    -d "$(date --rfc-3339=seconds|tr ' ' T) stage=$stage user=$user status=$status yum_status=$yum_status $auto" \
    -d "$(<$source)" \
    -X POST "$api_url?sourcetype=os_patch&source=$source&host=$HOSTNAME" \
    2>/dev/null >&2
} # end report

systemPatch () {
  <stuff...>
  yum <...> | tee /var/tmp/${FUNCNAME[0]}.out
  report ${FUNCNAME[0]} $os_patch_status /var/tmp/${FUNCNAME[0]}.out
} # end systemPatch

The actual second -d argument can be slightly more nuanced than "$(<$source)".  For example, it is sometimes composed of multiple segments extracted from $source.  Despite variations in -d composition, it is always one string with multiple lines.  Because the same composition is found in both fragmented and intact events, I doubt those small variations play any role.

Meanwhile, you point to a possible direction for research.  Thanks!


PickleRick
SplunkTrust

Hmm. I tested by posting over 100kB of events (generated in bash with seq loops ;-)) and nothing ever broke into multiple events. Regardless of whether I let bash insert line breaks or I put explicit \r or \n - the events never broke into multiple ones. Even with consecutive line breaks.

That's why I'm asking: the behaviour is indeed strange.

BTW, if you call curl with multiple -d options, your data gets sent as URL-encoded form data. Did you know that?
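
For reference, curl's documentation says that when -d is given more than once, the pieces are merged with a separating & and sent as form data. One way to sidestep that, sketched here under assumptions, is to assemble the header line and the yum output into a single body and send it with one --data-binary. The names build_body and post_event are hypothetical; api_url and auth_token stand for the variables used in the script earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch only: build_body and post_event are hypothetical names.

# Emit the bookkeeping header line, then the file contents unchanged,
# so the two parts are separated by a newline rather than by "&".
build_body() {
  local header=$1
  local file=$2
  printf '%s\n' "$header"
  cat "$file"
}

# Send the assembled body as one request; "@-" makes curl read the body
# from stdin, and --data-binary sends it exactly as read.
# Assumes api_url and auth_token are set by the caller.
post_event() {
  local header=$1
  local file=$2
  build_body "$header" "$file" |
    curl -s -o /dev/null \
      -H "Authorization: Bearer $auth_token" \
      --data-binary @- \
      -X POST "$api_url?sourcetype=os_patch&source=$file&host=$HOSTNAME"
}
```

This only changes how the body is put together; it is not a claim about what caused the broken events.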

yuanliu
SplunkTrust

Thanks for the help with testing!  This rules out length as the sole determinant.  I hadn't thought of a controlled experiment like that, but I should.  Maybe my network buffer is causing hiccups at random points?
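
A controlled experiment along these lines could vary the number of lines rather than the number of bytes. The generator below is a minimal sketch, not anyone's actual test script: make_payload is a hypothetical name, and the line content is made up to resemble yum output.

```shell
#!/usr/bin/env bash
# Hypothetical generator for test payloads: one timestamped header line
# followed by n numbered lines, for n + 1 lines in total.
make_payload() {
  local n=$1
  # UTC timestamp in the same shape as the posts in the question.
  printf '%s stage=test lines=%s\n' "$(date -u '+%Y-%m-%dT%H:%M:%S+00:00')" "$n"
  seq 1 "$n" | sed 's/^/---> Package line /'
}
```

Writing payloads for a few sizes on either side of 256 lines (say `make_payload 200 > /var/tmp/p200.out` and `make_payload 300 > /var/tmp/p300.out`) and posting each one separately would show whether line count, rather than total size, decides which posts get split.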


PickleRick
SplunkTrust

Well, that's puzzling indeed, since a single curl call should give you a single HTTP request (unless you're following redirects, but that's another story). If your request were too large, you should have gotten an error from the server; curl shouldn't "split" the request in two. So that's completely puzzling. Maybe some debugging on the splunkd side... but that seems a bit like overkill.
