
How can I remove part of a single-line event and keep the rest using transforms.conf?

Splunk Employee

(Note: I originally said, by mistake, that I wanted to keep only 6k bytes. Sorry for the confusion.)

I have syslog-type data. The events are single-line and sometimes longer than 64 KB.
I do not need the leading timestamp and host strings, because that part was added by a syslog server.
I would like to keep the rest.
So I created the following transforms.conf, but it does not work.
I know SEDCMD can do the same job.
But why doesn't transforms.conf work?

  • props.conf

    [syslog-cef]
    SHOULD_LINEMERGE = false
    TRANSFORMS-keep6k = removeHeader_keeprest

  • transforms.conf

    [removeHeader_keeprest]
    REGEX = ^\w{3}\s+\d{1,2}\s+(?:\d{2}:){2}\d{2}\s+[\w.]+\s(.+)
    DEST_KEY = _raw
    FORMAT = $1

Only 4052 bytes of each event were indexed.
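
For comparison, the SEDCMD approach mentioned above could look roughly like the following. This is a sketch, not the poster's actual config: the class name removeHeader is hypothetical, and the sed expression simply reuses the header regex from the transform, replacing the matched timestamp and host prefix with nothing.

  • props.conf (sketch)

    [syslog-cef]
    SHOULD_LINEMERGE = false
    # Hypothetical class name; deletes the syslog-server timestamp and host prefix
    SEDCMD-removeHeader = s/^\w{3}\s+\d{1,2}\s+(?:\d{2}:){2}\d{2}\s+[\w.]+\s//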


SplunkTrust

Hi Masa,

Did you check the LOOKAHEAD option in transforms.conf?
From the docs (http://docs.splunk.com/Documentation/Splunk/6.2.5/Admin/Transformsconf):

LOOKAHEAD = <integer>
* NOTE: This option is valid for all index time transforms, such as index-time
  field creation, or DEST_KEY modifications.
* Optional. Specifies how many characters to search into an event.
* Defaults to 4096. You may want to increase this value if you have event line lengths that 
  exceed 4096 characters (before linebreaking).

cheers, MuS

PS: Thanks for this amazing wiki http://wiki.splunk.com/Community:Test:How_Splunk_behaves_when_receiving_or_forwarding_udp_data !

Splunk Employee

Thanks, MuS!

Yes, that's what we needed! I should have read the spec file...
Because (.+) captured as many characters as the regex could see after the removed header, the result ended up 4052 characters long.

Again, LOOKAHEAD defaults to 4096 characters, and my regex removed 44 characters from the beginning of the line. As a result, only 4052 (4096 - 44) characters of the event were indexed.

This attribute matters whenever _raw, or the field an index-time transform matches against, is longer than 4096 characters.
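
A minimal sketch of the fixed configuration, assuming events can exceed 64k characters. The value 100000 is an illustrative choice, not a recommendation. TRUNCATE (a props.conf setting whose default is 10000 bytes) is an addition not discussed in this thread: it is included on the assumption that it has not already been raised elsewhere, because events longer than TRUNCATE are cut off during line breaking regardless of LOOKAHEAD.

  • props.conf (sketch)

    [syslog-cef]
    SHOULD_LINEMERGE = false
    # Assumption: raise the per-event length limit so 64k+ events are not truncated
    TRUNCATE = 100000
    TRANSFORMS-keep6k = removeHeader_keeprest

  • transforms.conf (sketch)

    [removeHeader_keeprest]
    REGEX = ^\w{3}\s+\d{1,2}\s+(?:\d{2}:){2}\d{2}\s+[\w.]+\s(.+)
    # Search the whole event instead of only the first 4096 characters
    LOOKAHEAD = 100000
    DEST_KEY = _raw
    FORMAT = $1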

P.S. Thanks for recognizing the wiki document. The Splunk docs team is planning to add a refined, concise version to our official docs; the wiki doc is very verbose and would not fit into the official docs as is 🙂


Esteemed Legend

Your capture group should be (.{0,6144}).
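
Combined with the accepted answer, a capture-group limit like this probably only yields the full 6144 characters if LOOKAHEAD also covers the header plus those 6144 characters; with the default of 4096, the match would stop at the same point as before. A sketch, where the stanza name removeHeader_keep6k and the LOOKAHEAD value are hypothetical:

    [removeHeader_keep6k]
    REGEX = ^\w{3}\s+\d{1,2}\s+(?:\d{2}:){2}\d{2}\s+[\w.]+\s(.{0,6144})
    # Header (44 characters in the poster's events) + 6144 must fit in the search window
    LOOKAHEAD = 6400
    DEST_KEY = _raw
    FORMAT = $1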


Splunk Employee

woodcock, thank you for pointing it out. That was my mistake; I should not have said to keep only 6k.
Otherwise, yes, you're right about the regex if I want to keep up to 6k characters.
Sorry for the confusion.
By the way, even with the regex you suggested, it will not capture 6k characters from an event larger than 6k unless LOOKAHEAD is raised as well, because by default the regex only searches the first 4096 characters.
