Regex for Line Break props.conf

nathanluke86
Communicator

Hello Splunkers,

Are there any regex geniuses who can help line-break the logs below?

Ideally, remove the text in red and break the lines where highlighted in yellow.

TIA

1 Solution

oscar84x
Contributor

Is it possible for you to provide an actual sample of the data? Delete or replace any user data.
It's difficult to figure out without knowing where there are blank spaces or carriage returns.

The settings you're looking to use in props are LINE_BREAKER and SEDCMD. Something like:

LINE_BREAKER = ([{}\,\s]+)"allowed" <-- this would start each event with "allowed" and discard the characters matched inside ()
SEDCMD-null = s/\{|\}|"netflows":\s+\[//g <-- this will get rid of the header line as well as any lingering single curly braces

You can play around with the regex and those settings to find what works for your desired outcome. If you can share some of the actual data structure, we can refine it.
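
One detail worth checking in a SEDCMD expression like this is that a literal `[` has to be escaped, otherwise it opens a character class that never closes. A minimal sketch of that check in Python follows; Python's `re` is only a stand-in for Splunk's PCRE-based engine, and the `compiles` helper and the `header` sample are hypothetical.

```python
import re

# A literal "[" left unescaped opens a character class that is never closed.
UNESCAPED = r'{|}|"netflows":\s+['
# Same expression with the literal braces and bracket escaped.
ESCAPED = r'\{|\}|"netflows":\s+\['


def compiles(pattern):
    """Return True if the pattern is accepted by Python's regex compiler."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Hypothetical header in the shape of the posted data.
header = '{\n"netflows": [\n'

# Approximates the global sed-style substitution s/<regex>//g.
cleaned = re.sub(ESCAPED, "", header)

print(compiles(UNESCAPED), compiles(ESCAPED), repr(cleaned))
```

Only the escaped form compiles here, and applying it to the header leaves nothing but line breaks behind.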

richgalloway
SplunkTrust

Try these props.conf settings.

[mysourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\s+{
SEDCMD-netflows = s/{\s+"netflows": \[//

P.S. Posting text instead of an image makes it easier for us to test regular expressions with your data.
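
Because of the `\s+` after the captured newline, a LINE_BREAKER of this shape only matches when the opening brace is indented. The sample posted in this thread shows no indentation, which may simply have been lost when it was pasted, so it is worth checking against the raw file. A rough Python sketch of the difference, using made-up data and a hypothetical `count_breaks` helper (Python's `re` is only an approximation of Splunk's regex engine):

```python
import re

# Same expression as the LINE_BREAKER above; "{" is escaped for Python.
LINE_BREAKER = r'([\r\n]+)\s+\{'


def count_breaks(raw):
    """Count the places where the line-breaker regex matches."""
    return len(re.findall(LINE_BREAKER, raw))


# Hypothetical pretty-printed payload with two flow objects.
indented = (
    '{\n'
    '  "netflows": [\n'
    '    {\n'
    '      "id": {\n'
    '        "pid": 1\n'
    '      }\n'
    '    },\n'
    '    {\n'
    '      "id": {\n'
    '        "pid": 2\n'
    '      }\n'
    '    }\n'
    '  ]\n'
    '}'
)

# The same payload with all leading whitespace stripped from every line.
flattened = "\n".join(line.strip() for line in indented.splitlines())

print(count_breaks(indented), count_breaks(flattened))
```

In the indented version the regex matches once before each flow object and skips nested objects such as `"id": {`, because their brace does not start a line. In the flattened version it does not match at all.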

---
If this reply helps you, an upvote would be appreciated.
nathanluke86
Communicator

Thanks, I will try both suggestions. I have added the text above.

nathanluke86
Communicator

{
"netflows": [
{
"allowed_domain": [
"xxxxxxxxxxxx"
],
"create_time": "2020-01-28T14:35:01.919766",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:14.431033",
"endpoint_platform": "xxxxx",
"event_hostname": "xxxxxxx",
"id": {
"fragment_id": 7456039343514739067,
"host_id": "xxxxxxxxxxx",
"instance_id": "xxxxxxxxxxxxx",
"timestamp": "2020-01-28T15:29:50.785488"
},
"local_ip": "xxxxxxx",
"local_port": xxxxx,
"process_id": {
"host_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
"pid": 1748,
"time_window": 13224155566
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "xxxxxxxx",
"remote_port": xxxx,
"rx_bytes": 44,
"unique_timestamp": "2020-01-28T15:29:50.785488-f5ab3ba0c8c13db2"
},
{
"allowed_domain": [
"28fea4ba"
],
"create_time": "2020-01-28T14:34:57.648822",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:11.299711",
"endpoint_platform": "xxxxxxxxx",
"event_hostname": "xxxx",
"id": {
"fragment_id": xxxxxxxxxxxxxxx,
"host_id": "xxxxxxxxxxxxxxxxxxxxxx",
"instance_id": "xxxxxxxxxxx",
"timestamp": "2020-01-28T15:29:50.785295"
},
"local_ip": "xxxxxxxxxx",
"local_port": xxxxxxx,
"process_id": {
"host_id": "xxxxxxxxxxxxxxxxxxxxxx",
"pid": xxxxx0,
"time_window": 13224155688
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "xxxxxxxxxx",
"remote_port": xxxxxxxxxxxxxxx,
"rx_bytes": 696,
"unique_timestamp": "2020-01-28T15:29:50.x"
},
{
"allowed_domain": [
"x"
],
"create_time": "2020-01-28T14:34:59.348932",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:10.980602",
"endpoint_platform": "x,
"event_hostname": "x",
"id": {
"fragment_id": x,
"host_id": "2xxxa1",
"instance_id": "dxxxx",
"timestamp": "2020-01-28T15:29:50.783380"
},
"local_ip": "fx3",
"local_port": x,
"process_id": {
"host_id": "x",
"pid": 1x8,
"time_window": 13224155566
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "fxxxxxxx",
"remote_port": x,
"rx_bytes": 44,
"unique_timestamp": "2020-01-28T15:29:50.783380-997aae15b7991f4a"
},

nathanluke86
Communicator

TIA much appreciated

oscar84x
Contributor

Great. From what you shared I got 3 events starting with "allowed_domain" and ending with "unique_timestamp". Also got rid of the header. Try this:

[your_sourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([{}\,\s]+)"allowed
NO_BINARY_CHECK=true
SEDCMD-null=s/^\{\s+|"netflows": \[//g
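
To sanity-check settings like these outside Splunk, the behaviour can be approximated in Python. This is only a rough sketch under stated assumptions: the previous event ends where capture group 1 of LINE_BREAKER starts, group 1 is discarded, and the SEDCMD substitution is then applied to each event. `break_events`, `simulate`, and the sample values are hypothetical, the literal `[` in the SEDCMD expression is escaped so it is read as a bracket, and Python's `re` is not identical to Splunk's regex engine.

```python
import re

LINE_BREAKER = r'([{}\,\s]+)"allowed'
SEDCMD = r'^\{\s+|"netflows": \['  # regex part of s/<regex>//g


def break_events(raw, line_breaker):
    """Split raw text at each match, discarding what capture group 1 matched."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])  # text before group 1 ends the event
        start = m.end(1)                      # text after group 1 starts the next
    events.append(raw[start:])
    return events


def simulate(raw):
    """Break into events, then apply the sed-style substitution to each one."""
    events = [re.sub(SEDCMD, "", e) for e in break_events(raw, LINE_BREAKER)]
    # Simplification of this sketch: drop events left blank by the substitution.
    return [e for e in events if e.strip()]


# Hypothetical payload in the shape of the posted sample (no indentation).
flow = '{\n"allowed_domain": [\n"%s"\n],\n"rx_bytes": 44,\n"unique_timestamp": "%s"\n},\n'
raw = '{\n"netflows": [\n' + "".join(
    flow % (domain, ts) for domain, ts in [("a", "t1"), ("b", "t2"), ("c", "t3")]
)

events = simulate(raw)
print(len(events))
for event in events:
    print(repr(event))
```

With this made-up input the sketch yields three events, each beginning with "allowed_domain". The last one keeps the trailing `},`, which may need an extra SEDCMD if it matters.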

oscar84x
Contributor

Any luck?

nathanluke86
Communicator

Thanks @oscar84x

I am having issues with the Splunk Add-on Builder app making a REST API call. Once I solve that, I can test this properly and let you know.

Thanks for the help so far.

oscar84x
Contributor

No problem. If it works, please don't forget to accept it as an answer. Thank you.
