
AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded

WonjinKim
Engager

Splunk is logging this warning:

WARN AggregatorMiningProcessor [10530 merging] - Breaking event because limit of 256 has been exceeded ... data_sourcetype="my_json"

The "my_json" for UF is:

[my_json]
DATETIME_CONFIG =
KV_MODE = json
LINE_BREAKER = (?:,)([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = _time
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0
category = Structured
description = my json type without truncate
disabled = false
pulldown_type = 1
MAX_EVENTS = 2500
BREAK_ONLY_BEFORE_DATE = true

The data file has about 5,000 lines; a sample is below:

{
"Versions" :
{
"sample_version" : "version.json",
"name" : "my_json",
"revision" : "rev2.0"},
"Domains" :
[{
"reset_domain_name" : "RESET_DOMAIN",
"domain_number" : 2,
"data_fields" :
["Namespaces/data1", "Namespaces/data2"]
}
],
"log" :
["1 ERROR No such directory and file",
"2 ERROR No such directory and file",
"3 ERROR No such directory and file",
"4 ERROR No such directory and file"
],
"address" :
[{
"index": 1,
"addr": "0xFFFFFF"}
],
"fail_reason" :
[{
"reason" : "SystemError",
"count" : 5},
{
"reason" : "RuntimeError",
"count" : 0},
{
"reason" : "ValueError",
"count" : 1}
],
...
blahblah
...
"comment" : "None"}

How can we fix this warning? We added the MAX_EVENTS setting in props.conf, but it is not working.


sainag_splunk
Splunk Employee

The issue you're experiencing is related to event breaking, not the MAX_EVENTS setting. The warning suggests that Splunk is trying to merge multiple events into a single event, which is exceeding the default limit of 256 lines.

Your props should be on the indexers (your parsing tier, or a heavy forwarder), as only a very few settings take effect on the universal forwarder, such as EVENT_BREAKER_ENABLE, EVENT_BREAKER, and indexed extractions.
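For reference, a minimal sketch of a forwarder-side stanza (assuming the same break pattern suggested below; EVENT_BREAKER only tells the UF where it may safely switch indexers during load balancing, it does not replace the indexer-side parsing settings):

[my_json]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)(?:,)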

The best way to address this here is to use:

LINE_BREAKER = ([\r\n]+)(?:,)
SHOULD_LINEMERGE = false

These settings in props.conf on the indexer ensure that each JSON object is treated as a separate event, preventing the merging that is triggering the warning.
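Putting it together, the sourcetype stanza on the indexer tier could look like this (a sketch that keeps the timestamp and truncation settings from your original config):

[my_json]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?:,)
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0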

Additionally, for JSON data like this, you might want to consider using this on the search head only:

KV_MODE = json

This setting lets Splunk interpret the JSON structure at search time, making it easier to extract and query specific fields from your JSON data.
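For example, with KV_MODE = json in place on the search head, fields from the sample data can be queried directly (a sketch; the index name is taken from the inputs.conf shared later in this thread):

index=myIndex sourcetype=my_json
| spath output=reason path=fail_reason{}.reason
| stats count by reason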


If this helps, Upvote!!!!
Together we make the Splunk Community stronger 

WonjinKim
Engager

Dear @sainag_splunk 

I tried using the below props.conf:

DATETIME_CONFIG =
KV_MODE = json
LINE_BREAKER = (?:,)([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = _time
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0
category = Structured
description = my json type without truncate
disabled = false
pulldown_type = 1
MAX_EVENTS=1000000
SHOULD_LINEMERGE = false

But the result was the same.

Hmm... I have a question about your answer.

I applied it on the deployment server, so it will be deployed in an app to all of the universal forwarders.
So if I set inputs.conf as below:

[batch://C:\splunk\my_data\*.json]
index=myIndex
sourcetype=my_json
crcSalt=<SOURCE>
move_policy = sinkhole

then the app that contains this inputs.conf also has the props.conf above.

However, that is not how your answer is meant to be applied, is it?

How do I apply your answer in my system? I hope you can help me in detail; I'm sorry, I'm a beginner in Splunk.

My system has 3 search heads (1 is the Splunk app, 2 is the cluster master, and 3 is the deployer). On top of this there are 5 indexers. The clients with the UF installed send data to the 5 indexers with load balancing, and we search on the 3 search heads, where the results are shown.

Please help me. Thank you.


sainag_splunk
Splunk Employee

Go to your cluster master/manager and deploy the app with props.conf from the master-apps. 
For example:

[my_json]
SHOULD_LINEMERGE = false
LINE_BREAKER = (?:,)([\r\n]+)
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0

You can edit props.conf in $SPLUNK_HOME/etc/master-apps/_cluster/local/props.conf on the master and push the cluster bundle with the command 'splunk apply cluster-bundle'. The peers will restart, and the props.conf delivered to $SPLUNK_HOME/etc/slave-apps/_cluster/local/props.conf will be layered in when splunkd starts.
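For example, the push sequence from the master would look like this (standard Splunk CLI commands; validating first is optional but safer):

splunk validate cluster-bundle
splunk apply cluster-bundle
splunk show cluster-bundle-status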

https://conf.splunk.com/files/2017/slides/pushing-configuration-bundles-in-an-indexer-cluster.pdf

Go to your search head, place this props.conf there, and restart the search head for the search-time field extractions:

[my_json]
KV_MODE = json

Remember to be careful if you are updating all of this in production; depending on the changes, it may require a restart of the indexers, so please be cautious.

If you need more hands-on support, Splunk OnDemand Services can guide you through this process, shoulder-surf your requirements, and help you.

If this helps, Upvote!!!!
Together we make the Splunk Community stronger 

WonjinKim
Engager

Hi, @sainag_splunk 

My problem still remains. I'm sorry to say your solution didn't solve it...

I tried some more cases and will ask about what I tried.

By the way, I have another question for this issue.

I tried changing the props.conf for JSON parsing:

"KV_MODE=json" -> "KV_MODE=none"

and added "INDEXED_EXTRACTIONS=json".

But I think there are errors in parsing the JSON.

Why did these errors occur?

My search query is:

index=_internal JsonLineBreaker NOT StreamedSearch

And the results show many lines like the below.

10-10-2024 13:05:55.318 +0900 ERROR JsonLineBreaker [2427 structuredparsing] - JSON StreamId:8181676460594335103 had parsing error:Unexpected character while looking for value: '}' - data_source="*****.json", data_host="****", data_sourcetype="my_json"
  • host = ****
  • source = /opt/splunkforwarder/var/log/splunk/splunkd.log
  • sourcetype = splunkd
10-10-2024 13:05:55.315 +0900 ERROR JsonLineBreaker [2427 structuredparsing] - JSON StreamId:8181676460594335103 had parsing error:Unexpected character while looking for value: '}' - data_source="*****.json", data_host="****", data_sourcetype="my_json"
  • host = ****
  • source = /opt/splunkforwarder/var/log/splunk/splunkd.log
  • sourcetype = splunkd

...

I checked the JSON file, but there are no invalid characters in it.
Also, I tried to parse the JSON in Python and in a web JSON editor; there were no problems.

Why do these logs keep appearing?


sainag_splunk
Splunk Employee

The reason for your error is poorly formatted data.

Regarding INDEXED_EXTRACTIONS=JSON, here is a good article on when and where it can be used.


Can you please run this search and show me the output for your sourcetype?

index=_internal source=*splunkd.log* (AggregatorMiningProcessor OR LineBreakingProcessor OR DateParserVerbose) WARN data_sourcetype="my_json"
| rex "(?<type>(Failed to parse timestamp|suspiciously far away|outside of the acceptable time window|too far away from the previous|Accepted time format has changed|Breaking event because limit of \d+|Truncating line because limit of \d+))"
| eval type=if(isnull(type),"unknown",type)
| rex "source::(?<eventsource>[^\|]*)\|host::(?<eventhost>[^\|]*)\|(?<eventsourcetype>[^\|]*)\|(?<eventport>[^\s]*)"
| eval eventsourcetype=if(isnull(eventsourcetype),data_sourcetype,eventsourcetype)
| stats count dc(eventhost) values(eventsource) dc(eventsource) values(type) values(index) by component eventsourcetype
| sort -count

If this helps, Upvote!!!!
Together we make the Splunk Community stronger 

WonjinKim
Engager

Hi, @sainag_splunk 

I ran your search command in my Splunk search app, but no results were shown. Your command returns no results for my sourcetype, "my_json".

I am confused about how to resolve this issue; it may cause critical errors for analysing our data.

Is there anything else to try to resolve the issue?

Here is what I have tried:

The data has line breaks after ':', so I think that is what caused the parsing error.

I tried changing the value to "LINE_BREAKER=[}|,]+[\r\n]+", meaning that if a line ends with ":\r\n", the UF should not break the line there. But even after changing the LINE_BREAKER value, the parsing errors are still raised.

24/10/23 12:02:22.193
10-23-2024 12:02:22.193 +0900 ERROR JsonLineBreaker [7804 structuredparsing] - JSON StreamId:15916142412051242565 had parsing error:Unexpected character: ':' - data_source="C:\splunk\<my_path>.bin", data_host="<my_host>", data_sourcetype="my_json"
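Note: LINE_BREAKER must contain a capturing group marking the text to discard between events, and "[}|,]+[\r\n]+" has none (inside a character class, the pipe is also a literal character, not alternation). A sketch of the same idea with the required group, not a confirmed fix for this thread, would be:

LINE_BREAKER = [},]+([\r\n]+)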

sainag_splunk
Splunk Employee

I think you still had the setting below somewhere in there. You need to get rid of it:

INDEXED_EXTRACTIONS = json

You might be able to find the offending event with something like:

index=myIndex sourcetype=my_json NOT datetime=*

If this helps, Upvote!!!!
Together we make the Splunk Community stronger 

WonjinKim
Engager

Hello, @sainag_splunk 

I tried more things, but the issue is still being raised.

So I checked our messages, and I will try to apply some things to props.conf.

I said my infra has 3 search heads and 5 indexers. If I set props.conf, is the field

KV_MODE = json

applied to props.conf on both the search heads and the UF?

I mean, will this be applied to the UF in "<SPLUNK_HOME>/etc/deployment-apps/<target>/local/props.conf"

and on all 3 search heads in "<SPLUNK_HOME>/etc/system/local/props.conf"?

How do I apply a given setting to the UF and to the search heads?

---

I have tried removing "INDEXED_EXTRACTIONS=json" and adding "KV_MODE=json", but the results were shown as below:


24/10/28 16:33:57.000
{
"Versions" :
{
"google_version" : "telemetry-json1.json",
"ssd_vendor_name" : "Vendors",
(... 257 more rows not shown ...)

24/10/28 16:33:57.000
"0xbf4 INFO System shutdown: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbf4 INFO System state active: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbf1 INFO System shutdown: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbf1 INFO System state active: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbee INFO System shutdown: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbee INFO System state active: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbec INFO System shutdown: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"dram_corrected_count" : 0,
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
],
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
{
}
host = <host> source = <source> sourcetype = my_json

I don't know why the indexers received the data broken up line by line after the first JSON parsing. LINE_BREAKER is the same as above.

Thank you.
