Hello,
The line breaker in my props configuration for a JSON-formatted file is not working: it is not breaking the JSON events. My props and sample JSON events are given below. Any recommendation will be highly appreciated, thank you!
props
[myprops]
CHARSET=UTF-8
KV_MODE=json
LINE_BREAKER=([\r\n]+)\"auditId\"\:
SHOULD_LINEMERGE=true
TIME_PREFIX="audittime": "
TIME_FORMAT=%Y-%m-%dT%H:%M:%S
TRUNCATE=9999
Sample Events
{
"items": [
{
"auditId" : 15067,
"secId": "mtt01",
"audittime": "2016-07-31T12:24:37Z",
"links": [
{
"name":"conanicaldba",
"href": "https://it.for.dev.com/opa-api"
},
{
"name":"describedbydba",
"href": "https://it.for.dev.com/opa-api/meta-data"
}
]
},
{
"auditId" : 16007,
"secId": "mtt01",
"audittime": "2016-07-31T12:23:47Z",
"links": [
{
"name":"conanicaldba",
"href": "https://it.for.dev.com/opa-api"
},
{
"name":"describedbydba",
"href": "https://it.for.dev.com/opa-api/meta-data"
}
]
},
{
"auditId" : 15165,
"secId": "mtt01",
"audittime": "2016-07-31T12:22:51Z",
"links": [
{
"name":"conanicaldba",
"href": "https://it.for.dev.com/opa-api"
},
{
"name":"describedbydba",
"href": "https://it.for.dev.com/opa-api/meta-data"
}
]
}
]
I recommend using https://regex101.com/ to test your regex and confirm that it actually matches. When I insert your regex there, it does not match, because the sample events have a space character between "auditId" and the following colon (:) that the pattern does not allow for.
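If you would rather check this locally, here is a minimal sketch in Python. Splunk uses PCRE rather than Python's re module, but for a pattern this simple the two should behave the same. The `adjusted` pattern is only my suggestion for tolerating the space, not something from your props.

```python
import re

# A fragment of the sample events; note the space before the colon.
sample = '{\n"auditId" : 15067,\n"secId": "mtt01",\n'

# Pattern from the original props: no allowance for whitespace before ':'.
original = re.compile(r'([\r\n]+)\"auditId\"\:')
# Hypothetical adjustment: \s* tolerates optional whitespace before ':'.
adjusted = re.compile(r'([\r\n]+)\"auditId\"\s*\:')

original_match = original.search(sample)
adjusted_match = adjusted.search(sample)

print("original matches:", original_match is not None)
print("adjusted matches:", adjusted_match is not None)
```

The first pattern finds nothing in the sample, while the second one matches at the newline in front of "auditId".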
I would also recommend splitting the JSON events so that each one keeps its curly brackets, like so:
{
"event1keys" : "event1values",
....
}
{
"event2keys" : "event2values",
....
}
Thus your LINE_BREAKER value should also match the opening curly brace and its newline, and its first capture group should include the discardable characters between events, such as commas:
LINE_BREAKER=(,?[\r\n]+)\{\s*\"auditId\"
I also recommend setting SHOULD_LINEMERGE to false to prevent Splunk from re-assembling multi-line events after the split.
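To see what that LINE_BREAKER does to the data, here is a rough Python simulation. It relies on the documented behavior that the text matched by the first capture group is discarded and an event boundary is placed there. `break_events` is a hypothetical helper, not a Splunk API, the opening brace is escaped in the pattern, and the sample is shortened to two events.

```python
import re

def break_events(raw, line_breaker):
    """Split raw text at every match of line_breaker, dropping whatever
    the first capture group matched (a simulation, not Splunk itself)."""
    events = []
    start = 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])  # previous event ends here
        start = m.end(1)                      # next event starts here
    events.append(raw[start:])
    return events

# Shortened version of the sample file, without indentation.
raw = (
    '{\n"items": [\n'
    '{\n"auditId" : 15067,\n"secId": "mtt01"\n},\n'
    '{\n"auditId" : 16007,\n"secId": "mtt01"\n}\n]\n'
)

events = break_events(raw, r'(,?[\r\n]+)\{\s*\"auditId\"')
for event in events:
    print(repr(event))
```

In this simulation the first chunk is the "items" header and the last event still carries the closing bracket of the array, so both need separate handling if you want clean JSON per event.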
After testing this sample file on my local instance, I think something like this could work:
[ <SOURCETYPE NAME> ]
...
LINE_BREAKER=([\r\n]+)\s*\{\s*[\r\n]+\s*\"auditId\"
TIME_FORMAT=%Y-%m-%dT%H:%M:%S
TIME_PREFIX=(?:.*[\r\n]+)*\"audittime\":\s*\"
SEDCMD-remove_trailing_comma=s/\,$//g
SEDCMD-remove_trailing_bracket=s/\][\r\n]+$//g
TRANSFORMS-remove_header=remove_json_header
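As a sanity check on the two SEDCMD lines, the sketch below applies the same substitutions with Python's re.sub. This assumes the sed-style expressions behave like plain regex replacements anchored at the end of the event. `apply_sedcmds` and the two sample events are made up for illustration, based on how the LINE_BREAKER above would leave a trailing comma or bracket.

```python
import re

def apply_sedcmds(event):
    """Mimic the two SEDCMD substitutions from the props (simulation only)."""
    event = re.sub(r'\,$', '', event)         # SEDCMD-remove_trailing_comma
    event = re.sub(r'\][\r\n]+$', '', event)  # SEDCMD-remove_trailing_bracket
    return event

# An event from the middle of the file keeps the comma after its brace.
middle = '{\n"auditId" : 15067,\n"secId": "mtt01"\n},'
# The last event keeps the closing bracket of the "items" array.
last = '{\n"auditId" : 15165,\n"secId": "mtt01"\n}\n]\n'

cleaned_middle = apply_sedcmds(middle)
cleaned_last = apply_sedcmds(last)

print(repr(cleaned_middle))
print(repr(cleaned_last))
```

Only the comma at the very end of the event is removed; the commas between fields inside the event are left alone because `$` is not in multiline mode here.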
This is a parsed event from the sampled file.
I am getting a warning about the timestamp. This is not because Splunk is unable to find it, but because the datetime exceeds my configured limit for MAX_DAYS_AGO/MAX_DAYS_HENCE.
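To confirm that the prefix and format really do locate the timestamp, here is a small Python check. It is only an approximation of what Splunk does: it takes the 19 characters after the TIME_PREFIX match and parses them with the same format string, ignoring the trailing Z. The event text is a shortened copy of the first sample event.

```python
import re
from datetime import datetime

# Shortened copy of the first sample event.
event = (
    '{\n"auditId" : 15067,\n"secId": "mtt01",\n'
    '"audittime": "2016-07-31T12:24:37Z",\n'
)

# Same expression as TIME_PREFIX in the props above.
time_prefix = re.compile(r'(?:.*[\r\n]+)*\"audittime\":\s*\"')
m = time_prefix.search(event)

# "YYYY-mm-ddTHH:MM:SS" is 19 characters long; the trailing Z is left out.
raw_ts = event[m.end():m.end() + 19]
parsed = datetime.strptime(raw_ts, '%Y-%m-%dT%H:%M:%S')

print(raw_ts)
print(parsed.isoformat())
```

The timestamp is found and parsed, so a warning on data this old points at the age limits rather than at the extraction.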
Note the transform included in the props. It is needed to remove the first part of the JSON file that the events are nested in.
There will need to be an accompanying stanza in transforms.conf specifying the regex used to recognize the event to send to the null queue. It would probably look something like this:
[remove_json_header]
REGEX = ^\s*\{\s*[\r\n]+\"items\":\s*\[
DEST_KEY = queue
FORMAT = nullQueue
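Here is a quick Python check of that REGEX, assuming the header chunk comes out of line breaking without indentation, as in the posted sample. `routed_to_null_queue` is a hypothetical helper that only tests the match; the actual routing is done by Splunk.

```python
import re

# Same expression as REGEX in the [remove_json_header] stanza.
header_re = re.compile(r'^\s*\{\s*[\r\n]+\"items\":\s*\[')

def routed_to_null_queue(event):
    """Return True if the transform's REGEX would match this event."""
    return header_re.search(event) is not None

header_chunk = '{\n"items": ['
audit_event = '{\n"auditId" : 15067,\n"secId": "mtt01"\n}'

header_dropped = routed_to_null_queue(header_chunk)
event_dropped = routed_to_null_queue(audit_event)

print("header dropped:", header_dropped)
print("audit event dropped:", event_dropped)
```

If your real file indents "items", the regex as written would not match, so test it against the actual file before relying on it.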