Hello, I need help perfecting a sourcetype that doesn't index my JSON files correctly when I define multiple capture groups in the LINE_BREAKER parameter.
I'm using this other question to try to figure out how to make it work: https://community.splunk.com/t5/Getting-Data-In/How-to-handle-LINE-BREAKER-regex-for-multiple-captur...
In my case, my JSON looks like this:
[{"Field 1": "Value 1", "Field N": "Value N"}, {"Field 1": "Value 1", "Field N": "Value N"}, {"Field 1": "Value 1", "Field N": "Value N"}]
Initially I tried:
LINE_BREAKER = }(,\s){
This split the events, except that the first and last records were not indexed correctly because of the "[" and "]" characters at the start and end of the payload.
After many attempts I still haven't been able to make it work, but based on what I've read, this seems to be the most intuitive way to define the capture groups:
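A rough way to see what happened is to model LINE_BREAKER in Python. This is a sketch under two assumptions: Python's re module parses these simple patterns the same way Splunk's PCRE does, and split_events is a hypothetical helper, not a Splunk API. Splunk discards only the text captured by the group, here ", ", so the "[" and "]" stay attached to the first and last events.

```python
import re

def split_events(raw, line_breaker):
    """Rough model of LINE_BREAKER: the text matched by capture group 1 is
    thrown away and marks the end of one event and the start of the next."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])  # text up to the discarded group
        start = m.end(1)                       # next event starts after it
    events.append(raw[start:])                 # remainder is the last event
    return events

raw = ('[{"Field 1": "Value 1", "Field N": "Value N"}, '
       '{"Field 1": "Value 1", "Field N": "Value N"}, '
       '{"Field 1": "Value 1", "Field N": "Value N"}]')

# Same pattern as in the sourcetype; only ", " is inside the group.
events = split_events(raw, r"}(,\s){")
for event in events:
    print(event)
```

The middle event is a clean JSON object, while the first still starts with "[{" and the last still ends with "}]".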
LINE_BREAKER = ^([){|}(,\s){|}(])$
It doesn't work: instead, Splunk indexes the entire payload as a single event, formatted correctly but unusable.
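Why the whole payload ends up in one event can be checked the same way, again assuming Python's re treats this pattern like PCRE. Because "[" is not escaped, everything from "[" up to the final "]" is read as a single character class, so the pattern has only one capture group and can only match a text that is exactly one character long.

```python
import re

# The LINE_BREAKER from the question, unchanged.
pattern = re.compile(r"^([){|}(,\s){|}(])$")

# "[){|}(,\s){|}(]" is parsed as ONE character class, so the pattern means:
# "the whole text is a single character from this set".
print(pattern.groups)  # 1: only one capture group, not three

raw = '[{"Field 1": "Value 1"}, {"Field 1": "Value 1"}]'
print(pattern.search(raw))               # None: never matches the payload
print(pattern.search("{") is not None)   # True: matches a one-character text
```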
Could somebody please suggest how to correctly define the LINE_BREAKER parameter for the sourcetype? Here is the full version I'm using:
[area:prd:json]
SHOULD_LINEMERGE = false
TRUNCATE = 8388608
TIME_PREFIX = \"Updated\sdate\"\:\s\"
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = Europe/Paris
MAX_TIMESTAMP_LOOKAHEAD = -1
KV_MODE = json
LINE_BREAKER = ^([){|}(,\s){|}(])$
Other solutions to my problem are welcome as well!
Best regards,
Andrew
Dear splunk user,
using this sample data
[{"Field 859": "Value aaaaa", "Field 2": "Value bbbbb"}, {"Field 1": "Value ccccc", "Field 2": "Value ddddd"}, {"Field 1": "Value eeeee", "Field 2": "Value fffff"}]
[{"Field 759": "Value ggggg", "Field 2": "Value hhhhh"}, {"Field 1": "Value iiiii", "Field 2": "Value jjjjj"}, {"Field 1": "Value kkkkk", "Field 2": "Value lllll"}]
with this props.conf
[trbndrw_temp]
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = (?:\}(\s*,\s*)\{)|(\][\r\n]+\[)
TRANSFORMS-getrid = getridht
and this transforms.conf
[getridht]
INGEST_EVAL = _raw=replace(_raw, "(\[|\])","")
you may be able to achieve what you want.
Happy splunking
Luca (aka "one DASH is always better")
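Luca's pattern has two alternatives with one capture group each. As I understand the props.conf specification, for such branched expressions Splunk treats the leftmost capture group that actually took part in the match as the line break. The Python sketch below models that rule plus the INGEST_EVAL bracket removal; split_events and the sample string are illustrative assumptions, not Splunk code, and Python's re stands in for PCRE.

```python
import json
import re

# Luca's LINE_BREAKER: two alternatives, each with its own capture group.
LINE_BREAKER = r"(?:\}(\s*,\s*)\{)|(\][\r\n]+\[)"

def split_events(raw, line_breaker):
    """Rough model of a branched LINE_BREAKER: the leftmost capture group
    that took part in the match is discarded as the break."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        # Groups that did not participate report a start of -1.
        g = next(i for i in range(1, m.re.groups + 1) if m.start(i) != -1)
        events.append(raw[start:m.start(g)])
        start = m.end(g)
    events.append(raw[start:])
    return events

raw = (
    '[{"Field 1": "Value aaaaa", "Field 2": "Value bbbbb"}, '
    '{"Field 1": "Value ccccc", "Field 2": "Value ddddd"}]\n'
    '[{"Field 1": "Value eeeee", "Field 2": "Value fffff"}]'
)

# Python equivalent of: INGEST_EVAL = _raw=replace(_raw, "(\[|\])","")
events = [re.sub(r"(\[|\])", "", e) for e in split_events(raw, LINE_BREAKER)]
for e in events:
    print(json.loads(e))  # every event is now a standalone JSON object
```

One caveat of this design: the replace() removes every "[" and "]" in the event, including any inside field values, so it is only safe when the values never contain brackets.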
Thanks Luca, this works! Appreciated!
Thank you @isoutamo for the response. Here is a more accurate version of the payload:
[
    {
        "Assigned to": "Jones, Francis",
        "Cost": 3,
        "Created date": "2024-02-28 12:52:18",
        "Extraction date": "2024-03-02 13:51:00",
        "ID": 12345,
        "Initial Cost": 3,
        "Location": "Sites",
        "Path": "Sites\\FY1\\S3",
        "Priority": 1,
        "State": "In Progress",
        "Status Change date": "2024-03-05 16:33:23",
        "Tags": "Europe; Finance",
        "Title": "Ensure correct routing of orders",
        "Updated date": "2024-03-05 16:33:23",
        "Warranty": false,
        "Wave Quarter": "Q2 22",
        "Work Item Type": "Request"
    },
    {
        "Assigned to": "Jones, Francis",
        "Cost": 3,
        "Created date": "2024-02-28 18:59:18",
        "Extraction date": "2024-03-05 16:31:00",
        "ID": 12345,
        "Initial Cost": 3,
        "Location": "Sites",
        "Path": "Sites\\FY1\\S3",
        "Priority": 1,
        "State": "In Progress",
        "Status Change date": "2024-03-05 16:33:23",
        "Tags": "Europe; Finance",
        "Title": "Ensure correct routing of orders",
        "Updated date": "2024-03-05 16:33:23",
        "Warranty": false,
        "Wave Quarter": "Q2 22",
        "Work Item Type": "Request"
    },
    {
        "Assigned to": "Jones, Francis",
        "Cost": 3,
        "Created date": "2023-01-28 18:59:18",
        "Extraction date": "2023-02-05 16:31:00",
        "ID": 12345,
        "Initial Cost": 3,
        "Location": "Sites",
        "Path": "Sites\\FY1\\S3",
        "Priority": 1,
        "State": "In Progress",
        "Status Change date": "2023-02-05 16:33:23",
        "Tags": "Europe; Finance",
        "Title": "Ensure correct routing of orders",
        "Updated date": "2024-03-05 16:33:23",
        "Warranty": false,
        "Wave Quarter": "Q2 22",
        "Work Item Type": "Request"
    }
] 
Thanks.
This seems to work:
LINE_BREAKER = (\[[\s\n\r]*\{|\},[\s\n\r]+\{|\}[\s\n\r]*)
Why doesn't your regex work?
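Splunk uses one capture group to mark the line break, and its contents are discarded. You meant to have three separate groups, made selectable with |, but you also need to escape characters such as [ { ] } so they are recognized as literal characters. As written, the unescaped [ opens a character class that runs up to the final ], so the whole pattern can only match a text consisting of a single character and never matches your payload. You can test this with https://regex101.com/r/IGQHd7/1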
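Applying a similar Python model to the pattern above, on a shortened copy of the payload, shows that with a single capture group the curly braces themselves are part of the discarded break. Assumptions: Python's re behaves like PCRE here, and split_events is a hypothetical helper that also drops whitespace-only fragments.

```python
import re

# The LINE_BREAKER above: one capture group with three alternatives inside.
LINE_BREAKER = r"(\[[\s\n\r]*\{|\},[\s\n\r]+\{|\}[\s\n\r]*)"

def split_events(raw, line_breaker):
    """Rough model of LINE_BREAKER: capture group 1 is discarded as the break."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    # Drop whitespace-only fragments, e.g. the empty text before the opening "[".
    return [e for e in events if e.strip()]

raw = """[
    {
        "ID": 12345,
        "Updated date": "2024-03-05 16:33:23"
    },
    {
        "ID": 12345,
        "Updated date": "2024-03-05 16:33:23"
    }
]
"""

events = split_events(raw, LINE_BREAKER)
for e in events:
    print(repr(e))
```

In this model the closing "]" is left over as a small trailing fragment; whether Splunk indexes it as its own event is worth checking in the upload preview.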
When I test these, I just use regex101.com and/or the Splunk GUI (Settings -> Add Data -> Upload) with an example file on my own laptop, workstation, or dev server. That way it's easy to change the values and see how they affect the result.
You should also change
MAX_TIMESTAMP_LOOKAHEAD = 20
As you define TIME_PREFIX, there is no reason to use -1 as the lookahead value. Splunk starts looking for the timestamp right after the defined prefix, and as you can see, the correct timestamp (19 characters) lies within 20 characters after it.
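One way to sanity-check the lookahead value is to cut the window that follows TIME_PREFIX and parse it with the sourcetype's TIME_FORMAT. This Python sketch assumes strptime's %Y-%m-%d %H:%M:%S directives behave like Splunk's for this format, and it ignores TZ.

```python
import re
from datetime import datetime

# Values from the sourcetype in the question.
TIME_PREFIX = r'\"Updated\sdate\"\:\s\"'
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 20

event = '"Title": "Ensure correct routing of orders",\n"Updated date": "2024-03-05 16:33:23",'

m = re.search(TIME_PREFIX, event)
# Only this many characters after the prefix match are examined.
window = event[m.end():m.end() + MAX_TIMESTAMP_LOOKAHEAD]

# Find the timestamp inside the window and parse it with TIME_FORMAT.
stamp = re.match(r"\d{4}-\d\d-\d\d \d\d:\d\d:\d\d", window).group(0)
parsed = datetime.strptime(stamp, TIME_FORMAT)
print(len(stamp), parsed)  # prints: 19 2024-03-05 16:33:23
```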
Why have you set KV_MODE=json? With this LINE_BREAKER the curly braces are part of the discarded capture group, so once the JSON is broken into separate events, they are no longer JSON. Each one is now just a regular text-based event.
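The KV_MODE point can be illustrated with a simplified example. The two regexes below are cut-down illustrations rather than the exact patterns from this thread: when the braces sit inside the capture group they are thrown away and the events stop being valid JSON, while keeping them outside the group, as the first alternative in Luca's pattern does, leaves each event a parseable object. split_events and is_json are hypothetical helpers, and Python's re stands in for PCRE.

```python
import json
import re

def split_events(raw, line_breaker):
    """Rough model of LINE_BREAKER: capture group 1 is discarded as the break."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return events

def is_json(text):
    """True if the event text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

raw = '{"ID": 1, "Cost": 3}, {"ID": 2, "Cost": 5}'

# Braces inside the capture group: they are discarded together with the break.
braces_dropped = split_events(raw, r"(\},\s*\{)")
# Braces outside the capture group: only ", " is discarded.
braces_kept = split_events(raw, r"\}(\s*,\s*)\{")

print([is_json(e) for e in braces_dropped])  # [False, False]
print([is_json(e) for e in braces_kept])     # [True, True]
```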
Thank you for the feedback! I will take your suggestions into consideration!
