Using SEDCMD to remove repeated lines

twistedsixty4
Path Finder

Hey everyone,
Our server generates a file stamp/header at midnight or on resets. It starts with a line of dashes (----), followed by a line of system and log information (this part is useful), and ends with another line of dashes (----). I've been trying to use SEDCMD to delete the dashed lines (including the carriage return), but it doesn't seem to be working.

SEDCMD-StripBreaks = s/\n?----*//g

I've tried changing the class name, removing the "\n", and double-escaping the "\n", but none of it works. The one thing that does work is running it inline:

| rex mode=sed "s/\n?----*//g"

I could use some help getting my head around this.
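Because an index-time change only shows up after a restart, and only on newly indexed data, it can help to check the expression itself offline before touching props.conf. The sketch below approximates a sed-style `s/<regex>/<replacement>/<flags>` substitution with Python's `re` module. The helper `apply_sed` and the sample event are hypothetical, and Python's regex flavor is close to, but not identical to, the PCRE-style flavor Splunk uses.

```python
import re


def apply_sed(expr: str, text: str) -> str:
    """Approximate a SEDCMD / `rex mode=sed` substitution with Python's re.

    Hypothetical helper. Assumes expr looks like s/<regex>/<replacement>/<flags>
    with no unescaped '/' inside <regex> or <replacement>. Only the 'g' flag
    (replace every match) and an empty flag (replace the first match) are handled.
    """
    if not expr.startswith("s/"):
        raise ValueError("only s/<regex>/<replacement>/<flags> is supported")
    # Split on '/' characters that are not preceded by a backslash.
    parts = re.split(r"(?<!\\)/", expr[2:])
    if len(parts) != 3:
        raise ValueError("expected exactly three '/'-separated fields")
    pattern, replacement, flags = parts
    if flags not in ("", "g"):
        raise ValueError(f"unsupported flags: {flags!r}")
    count = 0 if flags == "g" else 1  # in re.sub, count=0 means "replace all"
    return re.sub(pattern, replacement, text, count=count)


# Hypothetical sample: one event with the dashed header appended to it.
sample = (
    "2024-01-01 00:00:00 service started\n"
    "----------------------------------------\n"
    "some other log file ...\n"
    "----------------------------------------\n"
)

print(repr(apply_sed(r"s/\n?----*//g", sample)))
```

Run against this sample, the original `s/\n?----*//g` removes both dashed lines and keeps the middle line, which suggests the pattern itself may not have been the problem.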


Update:

I've also tried this approach on a different issue I'm having, which is repeated spaces. If I use rex in sed mode it works just fine, but the second I add it to props.conf it falls apart. Here's the second one I've tried:

SEDCMD-SubSpaces = s/\s\s+/ /g

Besides this, I've tried changing how the SEDCMD is applied by scoping it to a source file instead of the sourcetype, and it still doesn't work. Here is the working rex for my second SEDCMD:

rex mode=sed "s/\s\s+/ /g"

Again, any help is appreciated!
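One thing worth checking with `s/\s\s+/ /g`: in both PCRE and Python, `\s` also matches line breaks, so applied at index time it would likely fold multi-line events onto a single line as well. The Python sketch below (a hypothetical event, with Python's `re` standing in for Splunk's regex engine) contrasts it with a `[ \t]{2,}` variant that only collapses spaces and tabs. Which behavior you want depends on your data.

```python
import re

# Hypothetical multi-line event: runs of spaces plus an internal line break.
event = "2024-01-01 00:00:00  user=alice   action=login\n    detail=ok"

# The asker's pattern: \s also matches \r and \n, so a line break followed by
# indentation is collapsed into a single space too.
collapsed_all = re.sub(r"\s\s+", " ", event)

# A variant that only collapses horizontal whitespace (spaces and tabs),
# leaving line breaks inside the event untouched.
collapsed_horizontal = re.sub(r"[ \t]{2,}", " ", event)

print(repr(collapsed_all))
print(repr(collapsed_horizontal))
```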

1 Solution

Lowell
Super Champion

I'm assuming the content you want to strip looks something like the following, and that you want to remove the solid lines and keep the line in the middle:

------------------------------------------------
some other log file ...
------------------------------------------------

Then something like this should work:

[your-source-type]
SEDCMD-StripBreaks = s/----+[\r\n]*//g

This handles the different EOL combinations (CR, CRLF, LF), and instead of consuming the leading EOL, it removes the trailing one. (It strips any run of four or more dashes anywhere in an event, so unless you have other dashes you want to keep, this should work fine.)
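To see the EOL handling concretely, here is a small offline check of the same expression using Python's `re` as a stand-in for Splunk's PCRE engine. The names `STRIP_BREAKS` and `samples` are hypothetical, and the samples reuse the example header above with three different line endings.

```python
import re

# The expression from the answer: a run of four or more dashes plus any CR/LF
# characters that immediately follow it.
STRIP_BREAKS = re.compile(r"----+[\r\n]*")

# Hypothetical samples of the same header with different line endings.
samples = {
    "LF": "------------\nsome other log file ...\n------------\n",
    "CRLF": "------------\r\nsome other log file ...\r\n------------\r\n",
    "CR": "------------\rsome other log file ...\r------------\r",
}

for name, text in samples.items():
    print(name, repr(STRIP_BREAKS.sub("", text)))
```

In each case the middle line survives with its own line ending, because the expression only consumes the EOL characters that follow a dash run.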

Just to be clear, this is an index-time setting, which means (1) events that are already indexed will NOT be updated, and (2) you must restart Splunk for this setting to take effect.

maria1991
Explorer

What if I need to remove the line in between the ---- lines as well?

Please advise, @Lowell and @dshpritz.

dshpritz
SplunkTrust

Can you provide an example?

splunKR1
Loves-to-Learn

Hi @dshpritz 

-----------------------------------------------------------------

some text 

-----------------------------------------------------------------

These lines won't have any timestamps; I need to completely ignore them during indexing.

maria1991
Explorer

-----------------------------------------------------------------

some text 

-----------------------------------------------------------------

These lines won't have any timestamps; I need to completely ignore them during indexing.

dshpritz
SplunkTrust

I think something like this would work:

SEDCMD-turnthismotherout = s/(?:^-+[\n\r$])(?:.+?[\r\n])(?:^-+[\n\r$])//g
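One caveat on that pattern: unless multiline mode is enabled, `^` may only match at the very start of the event, and `$` inside a character class such as `[\n\r$]` is a literal dollar sign rather than an end-of-line anchor. Since the header block gets appended to the preceding event, here is a hypothetical variant checked offline with Python's `re` (standing in for PCRE), assuming LF or CRLF line endings. `HEADER_BLOCK` and the sample event are illustrative names, and this has not been tested against Splunk itself.

```python
import re

# Hypothetical variant: (?m) lets ^ match at the start of every line inside
# the event, and (?s) lets .*? cross line breaks, so the lazy match runs from
# one all-dash line to the next all-dash line, including its trailing EOL.
HEADER_BLOCK = re.compile(r"(?ms)^-{4,}[ \t]*\r?\n.*?^-{4,}[ \t]*(?:\r?\n|\Z)")

# Hypothetical event with the header block (blank lines included) appended.
event = (
    "2024-01-01 00:00:00 service started\n"
    "----------------------------------------\n"
    "\n"
    "some text\n"
    "\n"
    "----------------------------------------"
)

print(repr(HEADER_BLOCK.sub("", event)))
```

If it behaves as expected on real samples, the same pattern could be tried between the slashes of the `SEDCMD-turnthismotherout` line. Dashes that do not start a line (for example `x ---- y`) are left alone because of the `^` anchors.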

twistedsixty4
Path Finder

This worked! But your points at the end helped the most. I was stopping and restarting my server but wasn't cleaning it; now that I think about it, it makes a lot of sense to do that. Thanks for your help!

dshpritz
SplunkTrust

From a first look, the problem may be the props.conf syntax.

You have:

SED-StripBreaks = s/\n?----*//g

You may want:

SEDCMD-StripBreaks = s/\n?----*//g

Additionally, if the headers are showing up as individual events, you may want to look into null queue routing:

Route and filter data

twistedsixty4
Path Finder

No. Since they have no timestamp, Splunk treats them as part of the event before them, which is fine.

dshpritz
SplunkTrust

Are these header lines showing up as individual events in Splunk?

twistedsixty4
Path Finder

It's one line of dashes, maybe 30 or 40 dashes long, and it's stamped throughout the log because it points to another file. The whole line is nothing but dashes, then there's a path to a file, then another line of dashes.

dshpritz
SplunkTrust

Can you provide an example of what the header line looks like?

twistedsixty4
Path Finder

Sorry, that was a problem copying my syntax over; I did have SEDCMD there. I'm working off an air-gapped network, so I can't copy/paste. I've updated the question with new information.
