Regex for Line Break props.conf

nathanluke86
Communicator

Hello Splunkers,

Are there any regex geniuses who can help me line-break the logs below?

Ideally, I'd like to remove the text marked in red and break lines where it is highlighted in yellow.

TIA

oscar84x
Contributor

Is it possible for you to provide an actual sample of the data? Delete or replace any user data.
It's difficult to figure out without knowing where there are blank spaces or carriage returns.

The settings you're looking to use in props are LINE_BREAKER and SEDCMD. Something like:

# Start each event at "allowed"; the characters matched inside () are discarded.
LINE_BREAKER = ([{}\,\s]+)"allowed
# Remove the header line as well as any lingering curly braces.
SEDCMD-null = s/{|}|"netflows":\s+\[//g

You can play around with the regex and those settings to find what works for your desired outcome. If you can share the actual data structure, we can refine it.
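
One detail worth checking in the SEDCMD above: a literal `[` has to be written `\[`, otherwise the regex engine reads it as the start of a character class. The sketch below uses Python's `re` module as a stand-in for Splunk's PCRE engine (an assumption; the two are not identical), and the `header` and `nested` strings are made-up fragments shaped like the netflow data, not real events.

```python
import re

# Pattern as it might be typed by hand, with a bare "[" at the end.
UNESCAPED = r'{|}|"netflows":\s+['
# Same pattern with the bracket escaped so it matches a literal "[".
ESCAPED = r'{|}|"netflows":\s+\['


def compiles(pattern):
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def sed_delete_all(pattern, text):
    """Emulate s/<pattern>//g: delete every match in the text."""
    return re.sub(pattern, "", text)


# Hypothetical fragments shaped like the data in this thread.
header = '{\n"netflows": ['
nested = '"id": {\n"pid": 1748\n},'

print(compiles(UNESCAPED))                    # does the bare "[" compile?
print(compiles(ESCAPED))                      # does the escaped form compile?
print(repr(sed_delete_all(ESCAPED, header)))  # what is left of the header
print(repr(sed_delete_all(ESCAPED, nested)))  # what is left of a nested object
```

Because of the `g` flag, the `{|}` alternatives remove every brace in an event, including the ones around nested objects such as `id`, so the remaining text is no longer valid JSON. Whether that matters depends on how the fields are extracted afterwards.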

richgalloway
SplunkTrust

Try these props.conf settings.

[mysourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\s+{
SEDCMD-netflows = s/{\s+"netflows": \[//

P.S. Posting text instead of an image makes it easier for us to test regular expressions with your data.
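
To get a feel for how these settings behave before touching an indexer, the breaking rule can be approximated offline. The sketch below is only an emulation in Python, assuming that LINE_BREAKER discards the text matched by its first capture group and starts the next event right after it; `break_events` is a hypothetical helper, and the indented payload is invented, since the real file's whitespace is not shown in this thread.

```python
import re

LINE_BREAKER = r'([\r\n]+)\s+{'
SEDCMD = r'{\s+"netflows": \['


def break_events(raw, line_breaker):
    """Split raw text: the text matched by capture group 1 is discarded,
    and whatever the pattern matched after group 1 starts the next event."""
    events, start = [], 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return events


# Hypothetical, indented two-flow payload (field values invented).
raw = '''{
  "netflows": [
    {
      "protocol": "PROTOCOL_UDP",
      "id": {
        "pid": 1748
      },
      "rx_bytes": 44
    },
    {
      "protocol": "PROTOCOL_UDP",
      "id": {
        "pid": 1750
      },
      "rx_bytes": 696
    }
  ]
}'''

# A sed expression without the g flag replaces only the first match per event.
events = [re.sub(SEDCMD, "", e, count=1) for e in break_events(raw, LINE_BREAKER)]

# The same payload with all indentation stripped, like the text pasted in this thread.
flat = "\n".join(line.strip() for line in raw.splitlines())
flat_events = break_events(flat, LINE_BREAKER)

for event in events:
    print(repr(event))
print(len(flat_events))
```

In this emulation the header becomes an empty string, each flow object becomes its own event, and the last event keeps the closing `]` and `}`. Note that the pattern needs at least one whitespace character between the newline and the `{`, so it finds no break points in the flattened copy; it relies on the real data being indented.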

---
If this reply helps you, Karma would be appreciated.

nathanluke86
Communicator

Thanks, I will try both suggestions. I have added the text above.

nathanluke86
Communicator

{
"netflows": [
{
"allowed_domain": [
"xxxxxxxxxxxx"
],
"create_time": "2020-01-28T14:35:01.919766",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:14.431033",
"endpoint_platform": "xxxxx",
"event_hostname": "xxxxxxx",
"id": {
"fragment_id": 7456039343514739067,
"host_id": "xxxxxxxxxxx",
"instance_id": "xxxxxxxxxxxxx",
"timestamp": "2020-01-28T15:29:50.785488"
},
"local_ip": "xxxxxxx",
"local_port": xxxxx,
"process_id": {
"host_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
"pid": 1748,
"time_window": 13224155566
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "xxxxxxxx",
"remote_port": xxxx,
"rx_bytes": 44,
"unique_timestamp": "2020-01-28T15:29:50.785488-f5ab3ba0c8c13db2"
},
{
"allowed_domain": [
"28fea4ba"
],
"create_time": "2020-01-28T14:34:57.648822",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:11.299711",
"endpoint_platform": "xxxxxxxxx",
"event_hostname": "xxxx",
"id": {
"fragment_id": xxxxxxxxxxxxxxx,
"host_id": "xxxxxxxxxxxxxxxxxxxxxx",
"instance_id": "xxxxxxxxxxx",
"timestamp": "2020-01-28T15:29:50.785295"
},
"local_ip": "xxxxxxxxxx",
"local_port": xxxxxxx,
"process_id": {
"host_id": "xxxxxxxxxxxxxxxxxxxxxx",
"pid": xxxxx0,
"time_window": 13224155688
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "xxxxxxxxxx",
"remote_port": xxxxxxxxxxxxxxx,
"rx_bytes": 696,
"unique_timestamp": "2020-01-28T15:29:50.x"
},
{
"allowed_domain": [
"x"
],
"create_time": "2020-01-28T14:34:59.348932",
"direction": "DIRECTION_REMOTE_INITIATED",
"end_time": "2020-01-28T14:42:10.980602",
"endpoint_platform": "x,
"event_hostname": "x",
"id": {
"fragment_id": x,
"host_id": "2xxxa1",
"instance_id": "dxxxx",
"timestamp": "2020-01-28T15:29:50.783380"
},
"local_ip": "fx3",
"local_port": x,
"process_id": {
"host_id": "x",
"pid": 1x8,
"time_window": 13224155566
},
"protocol": "PROTOCOL_UDP",
"remote_ip": "fxxxxxxx",
"remote_port": x,
"rx_bytes": 44,
"unique_timestamp": "2020-01-28T15:29:50.783380-997aae15b7991f4a"
},

nathanluke86
Communicator

TIA much appreciated

oscar84x
Contributor

Great. From what you shared I got 3 events starting with "allowed_domain" and ending with "unique_timestamp". Also got rid of the header. Try this:

[your_sourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([{}\,\s]+)"allowed
NO_BINARY_CHECK=true
SEDCMD-null=s/^{\s+|"netflows": \[//g
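
The behaviour of this stanza can be approximated offline with a short Python sketch. It assumes that LINE_BREAKER drops the text matched by its first capture group, that the sed expression runs on each event after breaking, and that an event left empty by the sed step does not show up as a result; that last point is an assumption of the sketch, not something it verifies about Splunk. `break_events` and `index_time_events` are hypothetical helpers, and the flows are trimmed to a few fields from the sample posted earlier.

```python
import re

LINE_BREAKER = r'([{}\,\s]+)"allowed'
SEDCMD = r'^{\s+|"netflows": \['

# One flow, flattened like the pasted sample; most fields omitted for brevity.
FLOW = '''{
"allowed_domain": [
"x"
],
"id": {
"pid": 1748
},
"rx_bytes": 44,
"unique_timestamp": "%s"
},
'''
STAMPS = [
    "2020-01-28T15:29:50.785488-f5ab3ba0c8c13db2",
    "2020-01-28T15:29:50.x",
    "2020-01-28T15:29:50.783380-997aae15b7991f4a",
]
raw = '{\n"netflows": [\n' + "".join(FLOW % stamp for stamp in STAMPS)


def break_events(raw, line_breaker):
    """Split raw text: the text matched by capture group 1 is discarded,
    and whatever the pattern matched after group 1 starts the next event."""
    events, start = [], 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return events


def index_time_events(raw):
    """Break the stream, delete every sed match, drop events left empty.

    Dropping empty events is an assumption made for this sketch."""
    cleaned = (re.sub(SEDCMD, "", e) for e in break_events(raw, LINE_BREAKER))
    return [e for e in cleaned if e.strip()]


events = index_time_events(raw)
print(len(events))
for event in events:
    lines = event.splitlines()
    print(lines[0], "...", lines[-1])
```

Under those assumptions each event starts at `"allowed_domain"`, and the separators between flows (`},` and `{`) are consumed by the capture group. The final event keeps its trailing `},` because no further `"allowed` follows it.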

oscar84x
Contributor

Any luck?

nathanluke86
Communicator

Thanks @oscar84x

I am having issues with the Splunk Add-on Builder app making a REST API call. Once I solve that, I can test this properly and let you know.

Thanks for the help so far.

oscar84x
Contributor

No problem. If it works please don't forget to accept it as an answer, thank you.
