Splunk is showing this warning log:
WARN AggregatorMiningProcessor [10530 merging] - Breaking event because limit of 256 has been exceeded ... data_sourcetype="my_json"
The "my_json" for UF is:
[my_json]
DATETIME_CONFIG =
KV_MODE = json
LINE_BREAKER = (?:,)([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = _time
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0
category = Structured
description = my json type without truncate
disabled = false
pulldown_type = 1
MAX_EVENTS = 2500
BREAK_ONLY_BEFORE_DATE = true
The data has about 5000 lines; a sample is shown below:
{
"Versions" :
{
"sample_version" : "version.json",
"name" : "my_json",
"revision" : "rev2.0"},
"Domains" :
[{
"reset_domain_name" : "RESET_DOMAIN",
"domain_number" : 2,
"data_fields" :
["Namespaces/data1", "Namespaces/data2"]
}
],
"log" :
["1 ERROR No such directory and file",
"2 ERROR No such directory and file",
"3 ERROR No such directory and file",
"4 ERROR No such directory and file"
],
"address" :
[{
"index": 1,
"addr": "0xFFFFFF"}
],
"fail_reason" :
[{
"reason" : "SystemError",
"count" : 5},
{
"reason" : "RuntimeError",
"count" : 0},
{
"reason" : "ValueError",
"count" : 1}
],
...
blahblah
...
"comment" : "None"}
How can we fix this warning? We added the "MAX_EVENTS" setting in props.conf, but it is not working.
The issue you're experiencing is related to event breaking, not the MAX_EVENTS setting. The warning suggests that Splunk is trying to merge multiple events into a single event, which is exceeding the default limit of 256 lines.
Your props should be on the indexers (your parsing instance or HWF), as only a few settings work on the universal forwarder, such as EVENT_BREAKER_ENABLE, EVENT_BREAKER, and indexed extractions.
The best way to address this here is to use:
LINE_BREAKER =([\r\n]+)(?:,)
SHOULD_LINE_MERGE = False
These settings in your props.conf on the indexer will help ensure that each JSON object is treated as a separate event, preventing the merging that's causing the warning.
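As a rough illustration of how a LINE_BREAKER with a capturing group splits the stream (the sample data below is made up, and this sketch only approximates the documented behavior: Splunk ends the previous event at the start of the first capturing group and starts the next event right after it):

```python
import re

def split_events(raw: str, pattern: str) -> list[str]:
    """Approximation of Splunk LINE_BREAKER semantics: the stream is broken
    at each regex match, and only the text captured by the first capturing
    group is discarded between events."""
    events, start = [], 0
    for m in re.finditer(pattern, raw):
        # the previous event ends where the captured group begins
        events.append(raw[start:m.start(1)])
        # the next event starts right after the captured group
        start = m.end(1)
    events.append(raw[start:])
    return events

# Hypothetical miniature of the comma-then-newline layout in the sample data
raw = '{"a": 1}\n,{"b": 2}\n,{"c": 3}'
print(split_events(raw, r"([\r\n]+)(?:,)"))
# -> ['{"a": 1}', ',{"b": 2}', ',{"c": 3}']
```

Note that matched text outside the capturing group (the comma here) stays attached to the adjacent event, which is why the placement of the group relative to the comma matters.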
Additionally, for JSON data like this, you might want to consider using this on the search head only:
KV_MODE = json
This setting helps Splunk interpret the JSON structure at search time, making it easier to extract and query specific fields from your JSON data.
Please upvote if this is helpful.
Dear @sainag_splunk
I tried using the below props.conf:
DATETIME_CONFIG =
KV_MODE = json
LINE_BREAKER = (?:,)([\r\n]+))
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = _time
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0
category = Structured
description = my json type without truncate
disabled = false
pulldown_type = 1
MAX_EVENTS=1000000
SHOULD_LINE_MERGE = false
But the result was the same.
Hmm... I have a question about your answer.
I applied it on the deployment server, which deploys it in an app to all of the Universal Forwarders.
So I set inputs.conf as below:
[batch://C:\splunk\my_data\*.json]
index=myIndex
sourcetype=my_json
crcSalt=<SOURCE>
move_policy = sinkhole
The app containing this inputs.conf also has the props.conf above.
However, that is not how your answer is meant to be applied, is it?
How do I apply your answer in my system? Please help me with the details; I'm sorry, I'm a beginner in Splunk.
My system has 3 search heads: number 1 is the Splunk app, number 2 is the cluster master, and number 3 is the deployer.
Along with these, there are 5 indexers. So the clients with the UF installed send their data to the 5 indexers with load balancing, we search on the 3 search heads, and the results are shown there.
Please help me, Thank you.
Go to your cluster master/manager and deploy the app with props.conf from the master-apps.
For example:
[my_json]
SHOULD_LINE_MERGE = false
LINE_BREAKER = (?:,)([\r\n]+)
TIME_FORMAT = %2Y%m%d%H%M%S
TRUNCATE = 0
You can edit props.conf in $SPLUNK_HOME/etc/master-apps/_cluster/local/props.conf on the master and push the cluster bundle with the command 'splunk apply cluster-bundle'. The peers will restart, and that props.conf, in $SPLUNK_HOME/etc/slave-apps/_cluster/local/props.conf, will be layered in when splunkd starts.
https://conf.splunk.com/files/2017/slides/pushing-configuration-bundles-in-an-indexer-cluster.pdf
Go to your search head, place the props.conf there, and restart the search head for the field extractions:
[my_json]
KV_MODE = json
Remember to be careful if you are updating all of this in production; depending on the changes, a restart of the indexers may be required. Please be cautious with the changes.
If you need more hands-on support, we have Splunk OnDemand Services who can guide you through this process, shoulder-surf your requirements, and help you.
Hi, @sainag_splunk
My problem still remains. Sorry, but your solution didn't solve it...
I tried some more cases and will ask about them.
By the way, I have another question for this issue.
I tried changing props.conf for the JSON parsing:
"KV_MODE=json" -> "KV_MODE=none"
Added "INDEXED_EXTRACTIONS=json"
But I think there are errors in parsing the JSON.
Why did these errors occur?
My search query is
index=_internal JsonLineBreaker NOT StreamedSearch
And the results show many lines like these:
10-10-2024 13:05:55.318 +0900 ERROR JsonLineBreaker [2427 structuredparsing] - JSON StreamId:8181676460594335103 had parsing error:Unexpected character while looking for value: '}' - data_source="*****.json", data_host="****", data_sourcetype="my_json"
10-10-2024 13:05:55.315 +0900 ERROR JsonLineBreaker [2427 structuredparsing] - JSON StreamId:8181676460594335103 had parsing error:Unexpected character while looking for value: '}' - data_source="*****.json", data_host="****", data_sourcetype="my_json"
...
I checked the JSON file, but there are no invalid characters in the JSON.
Also, I tried to parse the JSON in Python and in JsonParseWebEditor; there were no problems.
Why do these errors keep appearing?
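For reference, a minimal version of that Python check (the sample string below is a hypothetical stand-in; in practice you would pass the real file to json.load):

```python
import json

# Hypothetical stand-in for the real file contents; json.loads() raises
# JSONDecodeError with the exact line/column if anything is malformed.
sample = '{"Versions": {"name": "my_json"}, "comment": "None"}'

try:
    doc = json.loads(sample)
    print("valid JSON, top-level keys:", sorted(doc))
except json.JSONDecodeError as err:
    print(f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
```

If the whole file parses cleanly here, the file itself is well-formed, which points at how the stream is being broken before parsing rather than at the data.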
The reason for your error is "Poorly formatted data".
Regarding INDEXED_EXTRACTIONS=JSON, here is a good article on when/where it can be used.
Can you please run this search and show me the output for your sourcetype?
index=_internal source=*splunkd.log* AggregatorMiningProcessor OR LineBreakingProcessor OR DateParserVerbose WARN data_sourcetype="my_json"
| rex "(?<type>(Failed to parse timestamp|suspiciously far away|outside of the acceptable time window|too far away from the previous|Accepted time format has changed|Breaking event because limit of \d+|Truncating line because limit of \d+))"
| eval type=if(isnull(type),"unknown",type)
| rex "source::(?<eventsource>[^\|]*)\|host::(?<eventhost>[^\|]*)\|(?<eventsourcetype>[^\|]*)\|(?<eventport>[^\s]*)"
| eval eventsourcetype=if(isnull(eventsourcetype),data_sourcetype,eventsourcetype)
| stats count dc(eventhost) values(eventsource) dc(eventsource) values(type) values(index) by component eventsourcetype
| sort -count
Hi, @sainag_splunk
I entered your search command in my Splunk search app, but no results were shown. Your command returns no results for my sourcetype, "my_json".
I am confused about how to resolve this issue; it may cause critical errors for analysing our data.
Is there anything else to try to resolve the issue?
Here is what I have tried.
The data has line breaks after ':', which I think is what caused the parsing error.
I tried changing the value to "LINE_BREAKER=[}|,]+[\r\n]+", meaning that if a line ends with ":\r\n", the UF will not break the line there. But even after changing the LINE_BREAKER value, the parsing errors are still raised.
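One thing worth checking (my observation, not from the thread): Splunk's LINE_BREAKER is documented to require at least one capturing group, which marks the text discarded between events, and the attempted pattern has none. Also, inside a character class `|` is a literal pipe, not alternation. A quick way to see the difference, using a hypothetical variant with a capturing group:

```python
import re

attempted = r"[}|,]+[\r\n]+"    # no capturing group; '|' here is a literal pipe
with_group = r"[},]+([\r\n]+)"  # captures the newlines as the discarded breaker

print(re.compile(attempted).groups)   # 0
print(re.compile(with_group).groups)  # 1
```

With zero capturing groups, Splunk has nothing to treat as the event delimiter, so the pattern is unlikely to behave as intended regardless of where it matches.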
10-23-2024 12:02:22.193 +0900 ERROR JsonLineBreaker [7804 structuredparsing] - JSON StreamId:15916142412051242565 had parsing error:Unexpected character: ':' - data_source="C:\splunk\<my_path>.bin", data_host="<my_host>", data_sourcetype="my_json"
I think you had the setting below somewhere in there. You need to get rid of it:
INDEXED_EXTRACTIONS = json
index=_internal sourcetype=my_json NOT datetime=*
Hello, @sainag_splunk
I tried more things, but the issue is still occurring.
So I reviewed our messages, and I will try applying some changes to props.conf.
I said my infra has 3 search heads and 5 indexers. If I set props.conf, is the field
KV_MODE = json
applied to props.conf on both the search heads and the UF?
I mean, will it be applied on the UF via "<SPLUNK_HOME>/etc/deployment-apps/<target>/local/props.conf"
and on all 3 search heads in "<SPLUNK_HOME>/etc/system/local/props.conf"?
How do I apply a setting to the UF and the search heads?
---
I have tried removing "INDEXED_EXTRACTIONS=json" and adding "KV_MODE=json", but the results were shown as below:
24/10/28 16:33:57.000 24/10/28 16:33:57.000 24/10/28 16:33:57.000 24/10/28 16:33:57.000 24/10/28 16:33:57.000 24/10/28 16:33:57.000 24/10/28 16:33:57.000 24/10/28 16:33:57.000 24/10/28 16:33:57.000 24/10/28 16:33:57.000
I don't know why the indexers received the data line by line after the first JSON parsing. The LINE_BREAKER is the same as above.
Thank you.