Regex for Line Break props.conf

nathanluke86
Communicator

Hello Splunkers,

Can any regex geniuses help me line-break the logs below?
[image: screenshot of the logs]

Ideally, remove the text marked in red and break lines where it is highlighted in yellow.

TIA

0 Karma
1 Solution

oscar84x
Contributor

Is it possible for you to provide an actual sample of the data? Delete or replace any user data.
It's difficult to figure out without knowing where there are blank spaces or carriage returns.

The settings you're looking to use in props are LINE_BREAKER and SEDCMD. Something like:

LINE_BREAKER = ([{}\,\s]+)"allowed <-- this would start each event at "allowed_domain" and discard the characters matched inside ()
SEDCMD-null = s/{|}|"netflows":\s+\[//g <-- this will get rid of the header line as well as any lingering single curly braces

You can play around with the regex and those settings to find what works for your desired outcome. If you can share the actual data structure, we can refine it.

richgalloway
SplunkTrust

Try these props.conf settings.

[mysourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\s+{
SEDCMD-netflows = s/{\s+"netflows": \[//

P.S. Posting text instead of an image makes it easier for us to test regular expressions with your data.
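For anyone who wants to try the stanza above before touching props.conf, here is a rough Python sketch of what LINE_BREAKER and SEDCMD would do to the data. It is only an approximation: Python's `re` is not the PCRE engine Splunk uses, the names `SAMPLE`, `split_events` and `preview` are made up for this example, and the sample is a hypothetical, indented stand-in for the real payload.

```python
import re

# Hypothetical sample: same shape as the netflows payload, indentation assumed.
SAMPLE = """{
  "netflows": [
    {
      "allowed_domain": ["a"],
      "rx_bytes": 44
    },
    {
      "allowed_domain": ["b"],
      "rx_bytes": 696
    }
  ]
}"""

LINE_BREAKER = r'([\r\n]+)\s+{'        # copied from the stanza above
SED_PATTERN = r'{\s+"netflows": \['    # SEDCMD-netflows (no g flag)


def split_events(raw, line_breaker):
    """Cut at each match: the text in capture group 1 is thrown away,
    and whatever follows it opens the next event."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return events


def preview(raw):
    """Split first, then apply the sed-style replacement once per event."""
    cleaned = (re.sub(SED_PATTERN, '', e, count=1)
               for e in split_events(raw, LINE_BREAKER))
    # Dropping blank leftovers is a convenience of this sketch only.
    return [e for e in cleaned if e.strip()]


events = preview(SAMPLE)
for i, event in enumerate(events, 1):
    print(f"--- event {i} ---\n{event}")

# Same data with the leading whitespace stripped from every line.
flat = "\n".join(line.strip() for line in SAMPLE.splitlines())
print(len(split_events(flat, LINE_BREAKER)))
```

One thing the sketch makes visible: the `\s+` in the pattern needs at least one whitespace character between the newline and the `{`, so the unindented copy does not split at all. If the real data has no indentation, `\s*` may be worth trying instead.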

---
If this reply helps you, Karma would be appreciated.
0 Karma

nathanluke86
Communicator

TIA, will try both suggestions. I have added the text above.

0 Karma

nathanluke86
Communicator

{
"netflows": [
{
"allowed_domain": [
"xxxxxxxxxxxx"
],
"create_time": "2020-01-28T14:35:01.919766",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:14.431033",
"endpoint_platform": "xxxxx",
"event_hostname": "xxxxxxx",
"id": {
"fragment_id": 7456039343514739067,
"host_id": "xxxxxxxxxxx",
"instance_id": "xxxxxxxxxxxxx",
"timestamp": "2020-01-28T15:29:50.785488"
},
"local_ip": "xxxxxxx",
"local_port": xxxxx,
"process_id": {
"host_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
"pid": 1748,
"time_window": 13224155566
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "xxxxxxxx",
"remote_port": xxxx,
"rx_bytes": 44,
"unique_timestamp": "2020-01-28T15:29:50.785488-f5ab3ba0c8c13db2"
},
{
"allowed_domain": [
"28fea4ba"
],
"create_time": "2020-01-28T14:34:57.648822",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:11.299711",
"endpoint_platform": "xxxxxxxxx",
"event_hostname": "xxxx",
"id": {
"fragment_id": xxxxxxxxxxxxxxx,
"host_id": "xxxxxxxxxxxxxxxxxxxxxx",
"instance_id": "xxxxxxxxxxx",
"timestamp": "2020-01-28T15:29:50.785295"
},
"local_ip": "xxxxxxxxxx",
"local_port": xxxxxxx,
"process_id": {
"host_id": "xxxxxxxxxxxxxxxxxxxxxx",
"pid": xxxxx0,
"time_window": 13224155688
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "xxxxxxxxxx",
"remote_port": xxxxxxxxxxxxxxx,
"rx_bytes": 696,
"unique_timestamp": "2020-01-28T15:29:50.x"
},
{
"allowed_domain": [
"x"
],
"create_time": "2020-01-28T14:34:59.348932",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:10.980602",
"endpoint_platform": "x",
"event_hostname": "x",
"id": {
"fragment_id": x,
"host_id": "2xxxa1",
"instance_id": "dxxxx",
"timestamp": "2020-01-28T15:29:50.783380"
},
"local_ip": "fx3",
"local_port": x,
"process_id": {
"host_id": "x",
"pid": 1x8,
"time_window": 13224155566
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "fxxxxxxx",
"remote_port": x,
"rx_bytes": 44,
"unique_timestamp": "2020-01-28T15:29:50.783380-997aae15b7991f4a"
},

0 Karma

nathanluke86
Communicator

TIA much appreciated

0 Karma

oscar84x
Contributor

Great. From what you shared I got 3 events starting with "allowed_domain" and ending with "unique_timestamp". Also got rid of the header. Try this:

[your_sourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([{}\,\s]+)"allowed
NO_BINARY_CHECK=true
SEDCMD-null=s/^{\s+|"netflows": \[//g
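
A quick way to sanity-check these two settings offline is to mimic them in Python. This is a minimal sketch, not how Splunk itself runs them: it assumes Python's `re` behaves like PCRE for these simple patterns, it writes the `[` in the SEDCMD pattern escaped so that it compiles, and `SAMPLE`, `split_events` and `preview` are hypothetical names. The sample is a shortened, made-up copy of the structure posted above (unindented and cut off after the last `},`).

```python
import re

# Hypothetical, shortened sample shaped like the data posted in this thread.
SAMPLE = """{
"netflows": [
{
"allowed_domain": [
"a"
],
"id": {
"host_id": "h1"
},
"unique_timestamp": "t1"
},
{
"allowed_domain": [
"b"
],
"id": {
"host_id": "h2"
},
"unique_timestamp": "t2"
},"""

LINE_BREAKER = r'([{}\,\s]+)"allowed'
SED_PATTERN = r'^{\s+|"netflows": \['   # "[" escaped; g flag means replace all


def split_events(raw, line_breaker):
    """Text matched by capture group 1 is discarded; the next event
    starts right after it."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return events


def preview(raw):
    """Split into events, then strip the header pieces from each one."""
    cleaned = (re.sub(SED_PATTERN, '', e)
               for e in split_events(raw, LINE_BREAKER))
    # Dropping blank leftovers is a convenience of this sketch only.
    return [e for e in cleaned if e.strip()]


events = preview(SAMPLE)
for i, event in enumerate(events, 1):
    print(f"--- event {i} ---\n{event}")
```

In this sketch every event begins at `"allowed_domain"` and the header disappears, but the outer braces of each record are gone too, so the events are fragments rather than standalone JSON objects. It is worth checking whether that matters for your field extractions.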

0 Karma

oscar84x
Contributor

Any luck?

0 Karma

nathanluke86
Communicator

Thanks @oscar84x

I am having issues with the Splunk Add-on Builder app making a REST API call. Once I solve that, I can test this properly and let you know.

Thanks for the help so far.

0 Karma

oscar84x
Contributor

No problem. If it works please don't forget to accept it as an answer, thank you.

0 Karma