
AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded

WonjinKim
Engager

Splunk is logging this warning:

WARN AggregatorMiningProcessor [10530 merging] - Breaking event because limit of 256 has been exceeded ... data_sourcetype="my_json"

The "my_json" for UF is:

[my_json]
DATETIME_CONFIG =
KV_MODE = json
LINE_BREAKER = (?:,)([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = _time
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0
category = Structured
description = my json type without truncate
disabled = false
pulldown_type = 1
MAX_EVENTS = 2500
BREAK_ONLY_BEFORE_DATE = true

The data file has about 5,000 lines; a sample is below:

{
"Versions" :
{
"sample_version" : "version.json",
"name" : "my_json",
"revision" : "rev2.0"},
"Domains" :
[{
"reset_domain_name" : "RESET_DOMAIN",
"domain_number" : 2,
"data_fields" :
["Namespaces/data1", "Namespaces/data2"]
}
],
"log" :
["1 ERROR No such directory and file",
"2 ERROR No such directory and file",
"3 ERROR No such directory and file",
"4 ERROR No such directory and file"
],
"address" :
[{
"index": 1,
"addr": "0xFFFFFF"}
],
"fail_reason" :
[{
"reason" : "SystemError",
"count" : 5},
{
"reason" : "RuntimeError",
"count" : 0},
{
"reason" : "ValueError",
"count" : 1}
],
...
blahblah
...
"comment" : "None"}

How can we fix this warning? We added the MAX_EVENTS setting in props.conf, but it is not working.


sainag_splunk
Splunk Employee

The issue you're experiencing is related to event breaking, not the MAX_EVENTS setting. The warning suggests that Splunk is trying to merge multiple events into a single event, which is exceeding the default limit of 256 lines.

Your props should be on the indexers (your parsing tier, or a heavy forwarder), as only a very few settings take effect on the universal forwarder, such as EVENT_BREAKER_ENABLE, EVENT_BREAKER, and indexed extractions.
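For reference, a minimal sketch of a forwarder-side stanza (assuming the same break pattern suggested below; EVENT_BREAKER only tells the UF where it may safely switch indexers during load balancing, it does not replace the indexer-side parsing settings):

[my_json]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)(?:,)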

The best way to address this here is to use:

LINE_BREAKER = ([\r\n]+)(?:,)
SHOULD_LINEMERGE = false

These settings in props.conf on the indexer ensure that each JSON object is treated as a separate event, preventing the merging that is triggering the warning.
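Putting it together, the sourcetype stanza on the indexer tier could look like this (a sketch that keeps the timestamp and truncation settings from your original config):

[my_json]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?:,)
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0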

Additionally, for JSON data like this, you might want to consider using this on the search head only:

KV_MODE = json

This setting lets Splunk interpret the JSON structure at search time, making it easier to extract and query specific fields from your JSON data.
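For example, with KV_MODE = json in place on the search head, fields from the sample data can be queried directly (a sketch; the index name is taken from the inputs.conf shared later in this thread):

index=myIndex sourcetype=my_json
| spath output=reason path=fail_reason{}.reason
| stats count by reason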


If this helps, Upvote!!!!
Together we make the Splunk Community stronger 

WonjinKim
Engager

Dear @sainag_splunk 

I tried using the below props.conf:

DATETIME_CONFIG =
KV_MODE = json
LINE_BREAKER = (?:,)([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = _time
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0
category = Structured
description = my json type without truncate
disabled = false
pulldown_type = 1
MAX_EVENTS=1000000
SHOULD_LINEMERGE = false

But the result was the same.

Hmm... I have a question about your answer.

I applied it on the deployment server, so it will be deployed in an app to all of the universal forwarders.
So if I set inputs.conf as below:

[batch://C:\splunk\my_data\*.json]
index=myIndex
sourcetype=my_json
crcSalt=<SOURCE>
move_policy = sinkhole

then the app that contains this inputs.conf also has the props.conf above.

However, that is not how your answer is meant to be applied, is it?

How do I apply your answer in my system? I hope you can help me in detail; I'm sorry, I'm a beginner in Splunk.

My system has 3 search heads (1 is the Splunk app, 2 is the cluster master, and 3 is the deployer). On top of this there are 5 indexers. The clients with the UF installed send data to the 5 indexers with load balancing, and we search on the 3 search heads, where the results are shown.

Please help me. Thank you.


sainag_splunk
Splunk Employee

Go to your cluster master/manager and deploy the app with props.conf from the master-apps. 
For example:

[my_json]
SHOULD_LINEMERGE = false
LINE_BREAKER = (?:,)([\r\n]+)
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0

You can edit props.conf in $SPLUNK_HOME/etc/master-apps/_cluster/local/props.conf on the master and push the cluster bundle with the command 'splunk apply cluster-bundle'. The peers will restart, and the props.conf delivered to $SPLUNK_HOME/etc/slave-apps/_cluster/local/props.conf will be layered in when splunkd starts.
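For example, the push sequence from the master would look like this (standard Splunk CLI commands; validating first is optional but safer):

splunk validate cluster-bundle
splunk apply cluster-bundle
splunk show cluster-bundle-status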

https://conf.splunk.com/files/2017/slides/pushing-configuration-bundles-in-an-indexer-cluster.pdf

Go to your search head, place this props.conf there, and restart the search head for the search-time field extractions:

[my_json]
KV_MODE = json

Remember to be careful if you are updating all of this in production; depending on the changes, it may require a restart of the indexers, so please be cautious.

If you need more hands-on support, Splunk OnDemand Services can guide you through this process, shoulder-surf your requirements, and help you.

If this helps, Upvote!!!!
Together we make the Splunk Community stronger 

WonjinKim
Engager

Hi, @sainag_splunk 

My problem still remains. I'm sorry to say your solution didn't solve it...

I tried some more cases and will ask about what I tried.

By the way, I have another question for this issue.

I tried changing the props.conf for JSON parsing:

"KV_MODE=json" -> "KV_MODE=none"

and added "INDEXED_EXTRACTIONS=json".

But I think there are errors in parsing the JSON.

Why did these errors occur?

My search query is:

index=_internal JsonLineBreaker NOT StreamedSearch

And the results show many lines like the below.

10-10-2024 13:05:55.318 +0900 ERROR JsonLineBreaker [2427 structuredparsing] - JSON StreamId:8181676460594335103 had parsing error:Unexpected character while looking for value: '}' - data_source="*****.json", data_host="****", data_sourcetype="my_json"
  • host = ****
  • source = /opt/splunkforwarder/var/log/splunk/splunkd.log
  • sourcetype = splunkd
10-10-2024 13:05:55.315 +0900 ERROR JsonLineBreaker [2427 structuredparsing] - JSON StreamId:8181676460594335103 had parsing error:Unexpected character while looking for value: '}' - data_source="*****.json", data_host="****", data_sourcetype="my_json"
  • host = ****
  • source = /opt/splunkforwarder/var/log/splunk/splunkd.log
  • sourcetype = splunkd

...

I checked the JSON file, but there are no invalid characters in it.
Also, I tried to parse the JSON in Python and in a web JSON editor; there were no problems.

Why do these logs keep appearing?


sainag_splunk
Splunk Employee

The reason for your error is poorly formatted data.

Regarding INDEXED_EXTRACTIONS=JSON, here is a good article on when and where it can be used.


Can you please run this search and show me the output for your sourcetype?

index=_internal source=*splunkd.log* (AggregatorMiningProcessor OR LineBreakingProcessor OR DateParserVerbose) WARN data_sourcetype="my_json"
| rex "(?<type>(Failed to parse timestamp|suspiciously far away|outside of the acceptable time window|too far away from the previous|Accepted time format has changed|Breaking event because limit of \d+|Truncating line because limit of \d+))"
| eval type=if(isnull(type),"unknown",type)
| rex "source::(?<eventsource>[^\|]*)\|host::(?<eventhost>[^\|]*)\|(?<eventsourcetype>[^\|]*)\|(?<eventport>[^\s]*)"
| eval eventsourcetype=if(isnull(eventsourcetype),data_sourcetype,eventsourcetype)
| stats count dc(eventhost) values(eventsource) dc(eventsource) values(type) values(index) by component eventsourcetype
| sort -count

If this helps, Upvote!!!!
Together we make the Splunk Community stronger 

WonjinKim
Engager

Hi, @sainag_splunk 

I ran your search command in my Splunk search app, but no results were shown. Your command returns no results for my sourcetype, "my_json".

I am confused about how to resolve this issue; it may cause critical errors for analysing our data.

Is there anything else to try to resolve the issue?

Here is what I have tried:

The data has line breaks after ':', so I think that is what caused the parsing error.

I tried changing the value to "LINE_BREAKER=[}|,]+[\r\n]+", meaning that if a line ends with ":\r\n", the UF should not break the line there. But even after changing the LINE_BREAKER value, the parsing errors are still raised.

24/10/23 12:02:22.193
10-23-2024 12:02:22.193 +0900 ERROR JsonLineBreaker [7804 structuredparsing] - JSON StreamId:15916142412051242565 had parsing error:Unexpected character: ':' - data_source="C:\splunk\<my_path>.bin", data_host="<my_host>", data_sourcetype="my_json"
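Note: LINE_BREAKER must contain a capturing group marking the text to discard between events, and "[}|,]+[\r\n]+" has none (inside a character class, the pipe is also a literal character, not alternation). A sketch of the same idea with the required group, not a confirmed fix for this thread, would be:

LINE_BREAKER = [},]+([\r\n]+)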

sainag_splunk
Splunk Employee

I think you still had the setting below somewhere in there. You need to get rid of it:

INDEXED_EXTRACTIONS = json

You might be able to find the offending event with something like:

index=myIndex sourcetype=my_json NOT datetime=*

If this helps, Upvote!!!!
Together we make the Splunk Community stronger 

WonjinKim
Engager

Hello, @sainag_splunk 

I tried more things, but the issue is still being raised.

So I checked our messages, and I will try to apply some things to props.conf.

I said my infra has 3 search heads and 5 indexers. If I set props.conf, is the field

KV_MODE = json

applied to props.conf on both the search heads and the UF?

I mean, will this be applied to the UF in "<SPLUNK_HOME>/etc/deployment-apps/<target>/local/props.conf"

and on all 3 search heads in "<SPLUNK_HOME>/etc/system/local/props.conf"?

How do I apply a given setting to the UF and to the search heads?

---

I have tried removing "INDEXED_EXTRACTIONS=json" and adding "KV_MODE=json", but the results were shown as below:


24/10/28 16:33:57.000
{
"Versions" :
{
"google_version" : "telemetry-json1.json",
"ssd_vendor_name" : "Vendors",
(... 257 more rows not shown ...)

24/10/28 16:33:57.000
"0xbf4 INFO System shutdown: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbf4 INFO System state active: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbf1 INFO System shutdown: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbf1 INFO System state active: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbee INFO System shutdown: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbee INFO System state active: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"0xbec INFO System shutdown: 0h 0h 0h",
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
"dram_corrected_count" : 0,
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
],
host = <host> source = <source> sourcetype = my_json

24/10/28 16:33:57.000
{
}
host = <host> source = <source> sourcetype = my_json

I don't know why the indexers received the data broken up line by line after the first JSON parsing. LINE_BREAKER is the same as above.

Thank you.
