Truncate events in props.conf to reduce the license cost

Contributor

Hello,

I have the following stanza in my props.conf for the relevant sourcetype:

[BWP_hanatraces]
TRUNCATE = 0
TRANSFORMS-BWP
parameterChangelog_clone
TRANSFORMS-eliminatedebug = setnull

Because these are database logs, they very often contain full SQL statement texts, which can be really long.
What I would like to achieve is an upper limit for a single event of, e.g., 50 lines or 5,000 characters, whichever is reached first. Also, this should not merely be a splitting criterion: the rest of the event should be discarded.

My questions:
- How would I do this, and can it be applied only to my sourcetype BWP_hanatraces?
- Would it lower the license costs? In short, does the event truncation happen before Splunk license usage is calculated?

Kind regards,
Kamil

Ultra Champion

TRUNCATE will not split; it does exactly what it says: truncate. However, it works in bytes (which typically correspond roughly to characters), not lines. So you could use TRUNCATE for the 5000-character limit.

For the line limit, you could devise a SEDCMD that strips off everything after the 50th newline. Together, put this in props.conf:

[BWP_hanatraces]
TRUNCATE = 5000
SEDCMD-truncate = s/((?:[^\r\n]*[\r\n]+){50})[\s\S]*/\1/
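To sanity-check the 50-line regex before deploying it, here is a minimal Python sketch that applies the same pattern with Python's `re` module to a made-up 120-line event. It assumes Python's regex semantics agree with Splunk's PCRE for this pattern. The tail is written as `[\s\S]*` because `.` does not match a newline by default in either Python's `re` or PCRE, so `.*` alone would only remove the rest of line 51. The helper `cap_lines` and the sample event are hypothetical.

```python
import re

# The 50-line pattern from the SEDCMD, with the tail written as [\s\S]* so it
# also consumes newlines (in Python's re, as in PCRE, "." does not match a
# newline by default).
CAP_50_LINES = re.compile(r"((?:[^\r\n]*[\r\n]+){50})[\s\S]*")


def cap_lines(raw: str) -> str:
    """Keep the first 50 newline-terminated lines of an event and drop the rest.

    count=1 mirrors sed's s/// without the g flag. Events with fewer than
    50 newline characters cannot match and pass through unchanged.
    """
    return CAP_50_LINES.sub(r"\1", raw, count=1)


# Hypothetical 120-line event, e.g. a long SQL statement in a trace.
event = "".join(f"line {i}\n" for i in range(1, 121))

capped = cap_lines(event)
print(capped.count("\n"))        # 50
print(capped.splitlines()[-1])   # line 50

# For comparison: with ".*" only the text of line 51 is removed.
naive = re.sub(r"((?:[^\r\n]*[\r\n]+){50}).*", r"\1", event, count=1)
print("line 120" in naive)       # True
```

Testing the expression on a sample like this (or with `| rex mode=sed` in a search) is cheaper than finding out after the data has been indexed.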

Alternatively, you could see whether you can come up with a SEDCMD that strips out the whole query in general, but perhaps that is not what you want?

Path Finder

Ref: https://docs.splunk.com/Documentation/Splunk/7.2.6/Admin/HowSplunklicensingworks

How data is metered
For event data, data volume is based on the amount of raw external data that the indexer ingests into its indexing pipeline, after any filtering. It is not based on the amount of compressed data that gets written to disk. For metrics data, each metric event counts as a fixed 150 bytes. Metrics data does not use a separate license. Rather, it draws from the same license quota as event data.

The key phrase above is "after any filtering". The TRUNCATE and TRANSFORMS operations occur in the parsing pipeline, which comes before the indexing pipeline (https://docs.splunk.com/Documentation/Splunk/7.2.6/Indexer/Howindexingworks).
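Since metering counts the data that reaches the indexing pipeline after filtering, the effect of a cap on license usage can be estimated by comparing byte counts before and after. The sketch below is a rough, hypothetical estimate: the event sizes are invented, `metered_bytes` is a made-up helper, and it assumes the metered volume is simply the sum of each event's byte length after a plain byte cap, which is a simplification rather than Splunk's actual meter.

```python
from typing import Optional, Sequence


def metered_bytes(events: Sequence[bytes], max_bytes: Optional[int] = None) -> int:
    """Rough license estimate: total raw bytes, optionally capping each event.

    Simplifying assumption: metered volume is just the sum of each event's
    byte length after parsing-time truncation, with nothing else counted.
    """
    if max_bytes is None:
        return sum(len(e) for e in events)
    return sum(min(len(e), max_bytes) for e in events)


# Hypothetical daily volume: 10,000 short events and 500 large SQL traces.
events = [b"x" * 300] * 10_000 + [b"y" * 40_000] * 500

before = metered_bytes(events)
after = metered_bytes(events, max_bytes=5000)
print(before, after)                       # 23000000 5500000
print(f"{1 - after / before:.0%} saved")   # 76% saved
```

The saving depends entirely on how much of the volume sits in oversized events; if most events are already short, a cap changes little.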

Contributor

Hello @chris_barrett

Thank you.
And how would I restrict each event to at most 50 lines and at most 5000 characters?
As far as I understand, TRUNCATE will just split long events into several smaller ones, which is not what I want. I would like to reduce the amount of data per event by simply discarding everything beyond the above limits.

Kind Regards,
Kamil

Ultra Champion

@damucka -

-- As far as I understand the TRUNCATE will just split the long events into several smaller

Not really.

Here is what props.conf.spec (https://docs.splunk.com/Documentation/Splunk/7.2.6/Admin/Propsconf) says:

TRUNCATE =
* Change the default maximum line length (in bytes).
* Although this is in bytes, line length is rounded down when this would
otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
a sign of garbage data).
* Defaults to 10000 bytes.
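The spec's note about rounding down at multi-byte characters can be illustrated with a small Python sketch. It models TRUNCATE as cutting the UTF-8 bytes at the limit and then dropping any incomplete trailing character; this mirrors the documented behaviour but is not Splunk's code, and `truncate_bytes` is a hypothetical helper. It also shows why a byte limit only roughly equals a character limit.

```python
def truncate_bytes(text: str, limit: int) -> str:
    """Cut text to at most `limit` UTF-8 bytes without splitting a character.

    Slicing the encoded bytes may end mid-character; decoding with
    errors="ignore" drops that partial tail, i.e. the length is rounded down.
    """
    if limit == 0:  # TRUNCATE = 0 means "never truncate"
        return text
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


print(truncate_bytes("SELECT 1", 6))      # SELECT
print(truncate_bytes("Grüße", 3))         # Gr
print(len(truncate_bytes("é" * 10, 5)))   # 2
```

For pure ASCII text one byte is one character, so TRUNCATE = 5000 keeps 5000 characters; text with multi-byte characters keeps fewer.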

With line breaking set up so that each line is a whole event, this limit ends up being the event's total length, and it's very common, both for cases like yours and for Java exceptions, to use TRUNCATE to trim the event.

Contributor

Thank you.

I tested the following setup in my props.conf:

[(?::){0}*hanatraces]
TRUNCATE = 1000
MAX_EVENTS = 50
TRANSFORMS-BWP
parameterChangelog_clone
TRANSFORMS-eliminatedebug = debugsetnull
TRANSFORMS-LogReplayCoordinator = LogReplaysetnull
TRANSFORMS-anon = anonymize-ip, anonymize-user

Now I have the following issue:
I noticed that lines get truncated after 1000 characters, which is fine, but the event is not truncated after 50 lines; it is split instead. Since my goal is to reduce license usage, this does not help me. I would like big events to have at most 50 lines, each at most 1000 characters. The rest of the event should be discarded, not split.
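The difference between the behaviour observed here and the one wanted can be shown with a small Python sketch. `split_events` models what was observed (after 50 lines a new event starts, so every line is still ingested), while `cap_event` models the goal (everything after line 50 is discarded). Both functions and the 120-line sample are hypothetical illustrations, not Splunk internals.

```python
from typing import List

MAX_LINES = 50


def split_events(lines: List[str], max_lines: int = MAX_LINES) -> List[List[str]]:
    """Observed behaviour: break into chunks of max_lines; nothing is dropped."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def cap_event(lines: List[str], max_lines: int = MAX_LINES) -> List[str]:
    """Desired behaviour: keep the first max_lines lines and discard the rest."""
    return lines[:max_lines]


trace = [f"line {i}" for i in range(1, 121)]  # hypothetical 120-line event

chunks = split_events(trace)
print([len(c) for c in chunks])      # [50, 50, 20]
print(sum(len(c) for c in chunks))   # 120
print(len(cap_event(trace)))         # 50
```

Only the second behaviour reduces the volume that reaches the indexer; splitting merely redistributes it.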

Could you please advise how I would achieve this?

Kind Regards,
Kamil

Path Finder

I don't have a system at home that I can test with, but I believe the following will do what you're after:
SHOULD_LINEMERGE = true
MAX_EVENTS = 50
TRUNCATE = 5000

You will, however, need to provide your own event breaking using BREAK_ONLY_BEFORE, MUST_BREAK_AFTER, or similar.
