This is a long question.
We have a Heavy Forwarder and an Indexer cluster (managed through an indexer cluster master). I have a scripted input that pulls data in "array of JSON" format. To remove the complication of the JSON array, I am using SEDCMD, which works perfectly. But my LINE_BREAKER does not work.
The custom add-on that contains the input is hosted on the Heavy Forwarder, and the props.conf is present on both the HF and the Indexers. The props.conf works perfectly if I upload the data to a single-instance Splunk Enterprise, but it does not work in the HF --> Indexer scenario.
I have tried implementing combinations of the props.conf on both HF and Indexers, but LINE_BREAKER does not work.
Below is my props.conf. I have tried several combinations of LINE_BREAKER as well as MUST_BREAK_AFTER (with SHOULD_LINEMERGE = true):
[testdata_api]
SHOULD_LINEMERGE = 0
category = Splunk App Add-on Builder
pulldown_type = 1
#LINE_BREAKER = ((?<!"),|[\r\n]+)
#LINE_BREAKER = (\}\s+)
LINE_BREAKER = ((\}\,)|(\}\s+))
#SEDCMD-remove_prefix = s/{\n\"Report_Entry":\s\[//g
#SEDCMD-remove_trailing_commas = s/\},/}/g
#SEDCMD-remove_footer = s/\]\s\}//g
#SHOULD_LINEMERGE = true
#MUST_BREAK_AFTER = (\}\s)
Below is a sample of what my data looks like:
{
"Report_Entry":
[{
"field1": "value1",
"field2": "value2",
"field3": "value3",
"field4": "value4",
"field5": "value5",
"field6": "value6",
"field7": "value7",
"field8": "value8",
"field9": "value9",
"field10": "value10"
},
{
"field1": "value1",
"field2": "value2",
"field3": "value3",
"field4": "value4",
"field5": "value5",
"field6": "value6",
"field7": "value7",
"field8": "value8",
"field9": "value9",
"field10": "value10"
},
{
"field1": "value1",
"field2": "value2",
"field3": "value3",
"field4": "value4",
"field5": "value5",
"field6": "value6",
"field7": "value7",
"field8": "value8",
"field9": "value9",
"field10": "value10"
},
{
"field1": "value1",
"field2": "value2",
"field3": "value3",
"field4": "value4",
"field5": "value5",
"field6": "value6",
"field7": "value7",
"field8": "value8",
"field9": "value9",
"field10": "value10"
}]
}
Below is the output of btool:
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf [testdata_api]
/opt/splunk/etc/system/default/props.conf ADD_EXTRA_TIME_FIELDS = True
/opt/splunk/etc/system/default/props.conf ANNOTATE_PUNCT = True
/opt/splunk/etc/system/default/props.conf AUTO_KV_JSON = true
/opt/splunk/etc/system/default/props.conf BREAK_ONLY_BEFORE =
/opt/splunk/etc/system/default/props.conf BREAK_ONLY_BEFORE_DATE = True
/opt/splunk/etc/system/default/props.conf CHARSET = UTF-8
/opt/splunk/etc/system/default/props.conf DATETIME_CONFIG = /etc/datetime.xml
/opt/splunk/etc/system/default/props.conf DEPTH_LIMIT = 1000
/opt/splunk/etc/system/default/props.conf HEADER_MODE =
/opt/splunk/etc/system/default/props.conf LEARN_MODEL = true
/opt/splunk/etc/system/default/props.conf LEARN_SOURCETYPE = true
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf LINE_BREAKER = (\}\,)
/opt/splunk/etc/system/default/props.conf LINE_BREAKER_LOOKBEHIND = 100
/opt/splunk/etc/system/default/props.conf MATCH_LIMIT = 100000
/opt/splunk/etc/system/default/props.conf MAX_DAYS_AGO = 2000
/opt/splunk/etc/system/default/props.conf MAX_DAYS_HENCE = 2
/opt/splunk/etc/system/default/props.conf MAX_DIFF_SECS_AGO = 3600
/opt/splunk/etc/system/default/props.conf MAX_DIFF_SECS_HENCE = 604800
/opt/splunk/etc/system/default/props.conf MAX_EVENTS = 256
/opt/splunk/etc/system/default/props.conf MAX_TIMESTAMP_LOOKAHEAD = 128
/opt/splunk/etc/system/default/props.conf MUST_BREAK_AFTER =
/opt/splunk/etc/system/default/props.conf MUST_NOT_BREAK_AFTER =
/opt/splunk/etc/system/default/props.conf MUST_NOT_BREAK_BEFORE =
/opt/splunk/etc/system/default/props.conf SEGMENTATION = indexing
/opt/splunk/etc/system/default/props.conf SEGMENTATION-all = full
/opt/splunk/etc/system/default/props.conf SEGMENTATION-inner = inner
/opt/splunk/etc/system/default/props.conf SEGMENTATION-outer = outer
/opt/splunk/etc/system/default/props.conf SEGMENTATION-raw = none
/opt/splunk/etc/system/default/props.conf SEGMENTATION-standard = standard
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf SHOULD_LINEMERGE = false
/opt/splunk/etc/system/default/props.conf TRANSFORMS =
/opt/splunk/etc/system/default/props.conf TRUNCATE = 10000
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf category = Splunk App Add-on Builder
/opt/splunk/etc/system/default/props.conf detect_trailing_nulls = false
/opt/splunk/etc/system/default/props.conf maxDist = 100
/opt/splunk/etc/system/default/props.conf priority =
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf pulldown_type = 1
/opt/splunk/etc/system/default/props.conf sourcetype =
I need to know: is there any precedence between LINE_BREAKER and SEDCMD, or what else could cause LINE_BREAKER to fail in my case?
Hi,
Please try the configuration below on the Heavy Forwarder (not on the Indexers). Once the heavy forwarder has parsed the data, the indexers will not break it again, so these settings need to live on the HF. If you look at https://wiki.splunk.com/Community:HowIndexingWorks, LINE_BREAKER applies in the parsing queue, and the data then moves to the aggregation and typing queues (SEDCMD applies in the typing queue). So you first need LINE_BREAKER to break the events, and then SEDCMD to modify the data.
props.conf (the LINE_BREAKER and SEDCMD regexes below are based on the sample data you provided; if the actual raw data differs, e.g. in spaces or newlines, these regexes will not match and you will need to adjust them):
[yourSourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=\s\}(\,\s\n\s)
SEDCMD-a=s/\{\s\n\s\"Report_Entry\"\:\s\s\n\s\[//g
SEDCMD-b=s/\s\}\]\s\n//g
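With that config (and assuming the whitespace in the sample matches the actual feed), line breaking plus the two SEDCMDs should leave each indexed event as a single JSON object, roughly:
{
"field1": "value1",
"field2": "value2",
...
"field10": "value10"
}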
Thanks harsmarvania57,
I have tried all those regex combinations; the regexes match the log text perfectly, but sadly the lines still do not break. I also set LINE_BREAKER to break at every "," or "}" just to test the functionality, and that does not work either. This makes it clear that some other configuration I missed, or something else, is overriding it.
Do you have any other checks I can run? There are no warnings or errors in the internal logs.
I will have more than 70,000 records in one event; will this have any impact?
When you use LINE_BREAKER, the first capturing group is removed from your raw data, so with the config I provided above, the captured text (\,\s\n\s), i.e. comma, space, newline, space, is stripped from your event. Additionally, when you use LINE_BREAKER you need to set SHOULD_LINEMERGE = false.
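As a small illustration of the capture-group behaviour (hypothetical data, not your actual feed): with the default LINE_BREAKER = ([\r\n]+), the newlines themselves form the first capturing group, so the text
event one
event two
becomes two events, "event one" and "event two", with the newline characters discarded rather than kept at the end of the first event.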
There is also a possibility that you are hitting the TRUNCATE limit. But I would suggest testing in a test environment with a small number of records first, and once LINE_BREAKER and SEDCMD work, move to the actual raw data.
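If truncation does turn out to be the issue, you can raise the limit for your sourcetype in the same props.conf stanza (the value below is only an example; the default is 10000 bytes, and 0 disables truncation entirely):
[yourSourcetype]
TRUNCATE = 100000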
P.S.: The config I provided has been tested in my lab environment and works fine with the sample raw data.
Reducing the number of events is not possible. I have created a file input with a smaller number of records to test. This method works in a single-instance Splunk Enterprise but fails in the HF ---> Indexer scenario.
SHOULD_LINEMERGE is false, and the unused settings have been removed. I have also changed your (\,\s\n\s) to (\,\s), which suits my data.
I will be looking at the TRUNCATE limits.
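For reference, a stanza reflecting the changes described in this comment might look roughly like the following (the LINE_BREAKER prefix and the SEDCMDs are carried over from the answer above and will still need adjusting to the real whitespace in the raw feed):
[testdata_api]
SHOULD_LINEMERGE = false
LINE_BREAKER = \s\}(\,\s)
SEDCMD-a = s/\{\s\n\s\"Report_Entry\"\:\s\s\n\s\[//g
SEDCMD-b = s/\s\}\]\s\n//g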
Dear @harsmarvania57
Your suggestions have helped a lot.
I was able to resolve the issue. I completely rebuilt the add-on with fresh configurations. Something was overriding my previous configurations (or it might have been a bug), but now it is working perfectly.
Thanks for your timely help.
Glad that it worked. Feel free to accept/upvote the answer that helped you.