Question about LINE_BREAKER and SEDCMD

ashutosh2020
Explorer

This is a long question.

We have a Heavy Forwarder and an indexer cluster (managed through an indexer cluster master). I have a scripted input that pulls data in "array of JSON" format. To remove the complication of the JSON array, I am using SEDCMD, which works perfectly. But my LINE_BREAKER does not work.

The custom add-on that contains the input is hosted on the Heavy Forwarder, and the props.conf is present on both the HF and the indexers. The props.conf works perfectly if I upload the data to a single-instance Splunk Enterprise, but it does not work in the HF --> Indexer scenario.

I have tried combinations of the props.conf on both the HF and the indexers, but LINE_BREAKER still does not work.

Below is my props.conf. I have tried several combinations of LINE_BREAKER as well as MUST_BREAK_AFTER (with SHOULD_LINEMERGE = true).

[testdata_api]
SHOULD_LINEMERGE = 0
category = Splunk App Add-on Builder
pulldown_type = 1
#LINE_BREAKER = ((?<!"),|[\r\n]+)
#LINE_BREAKER = (\}\s+)
LINE_BREAKER = ((\}\,)|(\}\s+))
#SEDCMD-remove_prefix = s/{\n\"Report_Entry":\s\[//g
#SEDCMD-remove_trailing_commas = s/\},/}/g
#SEDCMD-remove_footer = s/\]\s\}//g
#SHOULD_LINEMERGE = true
#MUST_BREAK_AFTER = (\}\s)

Below is a sample of what my data looks like:

{
"Report_Entry": 
[{
   "field1": "value1",
   "field2": "value2",
   "field3": "value3",
   "field4": "value4",
   "field5": "value5",
   "field6": "value6",
   "field7": "value7",
   "field8": "value8",
   "field9": "value9",
   "field10": "value10"
},
{
   "field1": "value1",
   "field2": "value2",
   "field3": "value3",
   "field4": "value4",
   "field5": "value5",
   "field6": "value6",
   "field7": "value7",
   "field8": "value8",
   "field9": "value9",
   "field10": "value10"
},
{
  "field1": "value1",
   "field2": "value2",
   "field3": "value3",
   "field4": "value4",
   "field5": "value5",
   "field6": "value6",
   "field7": "value7",
   "field8": "value8",
   "field9": "value9",
   "field10": "value10"
},
{
   "field1": "value1",
   "field2": "value2",
   "field3": "value3",
   "field4": "value4",
   "field5": "value5",
   "field6": "value6",
   "field7": "value7",
   "field8": "value8",
   "field9": "value9",
   "field10": "value10"
}]
}

Below is the output of btool:

/opt/splunk/etc/apps/TA-testdata_api/local/props.conf [testdata_api]
/opt/splunk/etc/system/default/props.conf                 ADD_EXTRA_TIME_FIELDS = True
/opt/splunk/etc/system/default/props.conf                 ANNOTATE_PUNCT = True
/opt/splunk/etc/system/default/props.conf                 AUTO_KV_JSON = true
/opt/splunk/etc/system/default/props.conf                 BREAK_ONLY_BEFORE =
/opt/splunk/etc/system/default/props.conf                 BREAK_ONLY_BEFORE_DATE = True
/opt/splunk/etc/system/default/props.conf                 CHARSET = UTF-8
/opt/splunk/etc/system/default/props.conf                 DATETIME_CONFIG = /etc/datetime.xml
/opt/splunk/etc/system/default/props.conf                 DEPTH_LIMIT = 1000
/opt/splunk/etc/system/default/props.conf                 HEADER_MODE =
/opt/splunk/etc/system/default/props.conf                 LEARN_MODEL = true
/opt/splunk/etc/system/default/props.conf                 LEARN_SOURCETYPE = true
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf LINE_BREAKER = (\}\,)
/opt/splunk/etc/system/default/props.conf                 LINE_BREAKER_LOOKBEHIND = 100
/opt/splunk/etc/system/default/props.conf                 MATCH_LIMIT = 100000
/opt/splunk/etc/system/default/props.conf                 MAX_DAYS_AGO = 2000
/opt/splunk/etc/system/default/props.conf                 MAX_DAYS_HENCE = 2
/opt/splunk/etc/system/default/props.conf                 MAX_DIFF_SECS_AGO = 3600
/opt/splunk/etc/system/default/props.conf                 MAX_DIFF_SECS_HENCE = 604800
/opt/splunk/etc/system/default/props.conf                 MAX_EVENTS = 256
/opt/splunk/etc/system/default/props.conf                 MAX_TIMESTAMP_LOOKAHEAD = 128
/opt/splunk/etc/system/default/props.conf                 MUST_BREAK_AFTER =
/opt/splunk/etc/system/default/props.conf                 MUST_NOT_BREAK_AFTER =
/opt/splunk/etc/system/default/props.conf                 MUST_NOT_BREAK_BEFORE =
/opt/splunk/etc/system/default/props.conf                 SEGMENTATION = indexing
/opt/splunk/etc/system/default/props.conf                 SEGMENTATION-all = full
/opt/splunk/etc/system/default/props.conf                 SEGMENTATION-inner = inner
/opt/splunk/etc/system/default/props.conf                 SEGMENTATION-outer = outer
/opt/splunk/etc/system/default/props.conf                 SEGMENTATION-raw = none
/opt/splunk/etc/system/default/props.conf                 SEGMENTATION-standard = standard
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf SHOULD_LINEMERGE = false
/opt/splunk/etc/system/default/props.conf                 TRANSFORMS =
/opt/splunk/etc/system/default/props.conf                 TRUNCATE = 10000
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf category = Splunk App Add-on Builder
/opt/splunk/etc/system/default/props.conf                 detect_trailing_nulls = false
/opt/splunk/etc/system/default/props.conf                 maxDist = 100
/opt/splunk/etc/system/default/props.conf                 priority =
/opt/splunk/etc/apps/TA-testdata_api/local/props.conf pulldown_type = 1
/opt/splunk/etc/system/default/props.conf                 sourcetype =

I need to know: does LINE_BREAKER take precedence over SEDCMD? Or what else causes LINE_BREAKER to fail in my case?

1 Solution

harsmarvania57
Ultra Champion

Hi,

Please try the configuration below on the Heavy Forwarder (not on the indexer). If you look at https://wiki.splunk.com/Community:HowIndexingWorks, LINE_BREAKER is applied in the parsing queue, and the data then passes to the aggregation and typing queues (SEDCMD is applied in the typing queue). So you first need to adjust LINE_BREAKER to break the events, and then apply SEDCMD to modify the data.

props.conf (the LINE_BREAKER and SEDCMD regexes below are based on the sample data you provided; if the actual raw data differs from the sample in spaces or newlines, the regexes will not match and you will need to adjust them)

[yourSourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=\s\}(\,\s\n\s)
SEDCMD-a=s/\{\s\n\s\"Report_Entry\"\:\s\s\n\s\[//g
SEDCMD-b=s/\s\}\]\s\n//g
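
The ordering described above (break first, then apply SEDCMD to each event) can be sketched in Python. This is only an approximation using Python's `re` module, not Splunk's actual pipeline. The helper names `break_events` and `sedcmd`, the shortened two-record sample, and the three regexes are assumptions made for illustration; they are not the regexes from the config above.

```python
import json
import re

def break_events(raw, line_breaker):
    """Split raw text at each match, discarding only the first capture group."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])  # text before the group ends an event
        start = m.end(1)                      # text after the group starts the next
    events.append(raw[start:])
    return events

def sedcmd(event, pattern):
    """Rough stand-in for SEDCMD s/pattern//g applied to a single event."""
    return re.sub(pattern, "", event)

# Shortened sample in the same shape as the data in the question.
raw = (
    '{\n"Report_Entry": \n[{\n   "field1": "value1",\n   "field2": "value2"\n},\n'
    '{\n   "field1": "value3",\n   "field2": "value4"\n}]\n}'
)

# Hypothetical patterns chosen for this sample only.
LINE_BREAKER = r'\}(,\s+)\{'                    # drop only the comma and whitespace
REMOVE_PREFIX = r'^\{\s*"Report_Entry":\s*\['   # wrapper before the first record
REMOVE_FOOTER = r'\]\s*\}\s*$'                  # wrapper after the last record

# Step 1: line breaking. Step 2: per-event clean-up.
events = break_events(raw, LINE_BREAKER)
events = [sedcmd(sedcmd(e, REMOVE_PREFIX), REMOVE_FOOTER) for e in events]
parsed = [json.loads(e) for e in events]
print(parsed)
```

Because the clean-up runs per event after breaking, the prefix pattern only ever matches the first event and the footer pattern only the last one.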

ashutosh2020
Explorer

Thanks harsmarvania57,

I have tried all those combinations of regex, and every regex matches the log text perfectly. Sadly, it does not break the line. I set LINE_BREAKER to break at every "," or "}" just to test the functionality, and that does not work either. This suggests there must be some other configuration I missed, or something else overriding it.

Do you have any other checks? There are no warnings or errors in the internal logs.

I will have more than 70,000 records in one event. Will this have any impact?

harsmarvania57
Ultra Champion

When you use LINE_BREAKER, the first capturing group is removed from your raw data, so with the config I provided above, (\,\s\n\s) (comma-space-newline-space) will be removed from your event. Additionally, when you use LINE_BREAKER, you need to set SHOULD_LINEMERGE = false.
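
A small Python sketch of why the position of the capturing group matters. It approximates the "first capturing group is discarded" rule with Python's `re` module and is not Splunk's own engine. The helper `break_events`, the two-record sample, and both patterns are hypothetical. The first pattern puts the closing brace inside the group, the second keeps it outside.

```python
import re

def break_events(raw, line_breaker):
    """Split raw text at each match, discarding only the first capture group."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return events

raw = '{"a": 1},\n{"a": 2}'

# Brace inside the group: the closing brace is thrown away with the delimiter.
brace_inside = break_events(raw, r'(\},\s*)')
print(brace_inside)   # ['{"a": 1', '{"a": 2}']

# Brace outside the group: only the comma and newline are discarded.
brace_outside = break_events(raw, r'\}(,\s*)\{')
print(brace_outside)  # ['{"a": 1}', '{"a": 2}']
```

With the brace inside the group, the first event loses its closing brace and is no longer valid JSON, even though the break itself happens in the right place.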

It is possible that you are hitting the TRUNCATE limit. But I suggest testing in a test environment with a small number of records, and once LINE_BREAKER and SEDCMD work, moving on to the actual raw data.
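
To get a feel for the TRUNCATE concern, here is a rough Python check of event sizes against the TRUNCATE = 10000 value shown in the btool output. It assumes the limit is measured in bytes per line. The helper `oversized`, the one-line record, and the 70,000-record blob are made up for illustration and are not the real data.

```python
TRUNCATE = 10000  # value reported by btool for this sourcetype

def oversized(events, limit=TRUNCATE):
    """Return (index, byte_length) for every event longer than limit bytes."""
    sizes = [(i, len(e.encode("utf-8"))) for i, e in enumerate(events)]
    return [(i, n) for i, n in sizes if n > limit]

# Hypothetical stand-in for one record, repeated 70,000 times.
record = '{"field1": "value1", "field2": "value2"}'
blob = ",\n".join([record] * 70000)

# If line breaking fails, the whole payload is a single oversized line.
unbroken = oversized([blob])
print(len(unbroken))

# If line breaking works, each record is far below the limit.
broken = oversized(blob.split(",\n"))
print(len(broken))
```

In this sketch the limit only becomes a problem when the payload is not broken into events, which is why testing the line breaking on a small sample first is a reasonable step.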

P.S.: The config I provided was tested in my lab environment and worked fine with the sample raw data.

ashutosh2020
Explorer

Reducing the number of events is not possible. I have created a file input with fewer records to test. This method works on a single-instance Splunk Enterprise but fails in the HF ---> Indexer scenario.

SHOULD_LINEMERGE is false and removed. I have also changed your (\,\s\n\s) to (\,\s), which suits my data.

I will be looking at the TRUNCATE limits.

ashutosh2020
Explorer

Dear @harsmarvania57

Your suggestions have helped a lot.

I was able to resolve the issue. I completely rebuilt the add-on with fresh configurations. Something was overriding my previous configurations (or it might have been a bug), but now it is working perfectly.

Thanks for your timely help.

harsmarvania57
Ultra Champion

Glad that it worked. You can accept or upvote my answer if it helped you.
