Getting Data In

In order to reduce license usage, can I remove the time from _raw and keep _time intact?

Contributor

Hello,

To reduce Splunk license usage, I am considering removing the timestamp from _raw, but only after it has been parsed and written into _time.

  • Will this reduce license usage?
  • How can I make sure this happens in the correct order?

Thank you.

0 Karma
1 Solution

Legend

Hi ctaf,
I don't know how long your events are (maybe some of them are shorter than the timestamp itself), but in any case a timestamp is only a few bytes, so I don't think you can reduce your license consumption by much.
Also, Splunk reads the timestamp from the event at index time, so you cannot cut the timestamp before indexing, because then the event has no timestamp, and cutting it after indexing does not reduce license consumption.
If you want to do this anyway, you have to preprocess the events with a script before indexing and use the file time or the current time as the timestamp.
Bye.
Giuseppe

Super Champion

The only way this would work is if the event timestamp can be shortened with SEDCMD, thereby reducing the number of characters indexed, but it may be impossible to write such a SEDCMD. I have tested SEDCMD and confirmed that it can reduce license usage by replacing long strings with short ones.
Another option that might work, but that I have not tested, is to use SEDCMD to remove the event timestamp entirely and use CURRENT as the timestamp. In this case, Splunk would use the current time as _time, but as I said, I don't know how that would affect license usage.
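
The two options above could be sketched in props.conf roughly as follows. This is a minimal sketch, not a tested configuration: the sourcetype names `my_sourcetype` and `my_other_sourcetype` and the timestamp layout `YYYY-MM-DD HH:MM:SS` at the start of each event are assumptions made up for illustration, and the sed expressions would have to be adapted to the real event format.

```
# props.conf (sketch; stanza names and timestamp layout are hypothetical)

# Option 1: shorten the timestamp in _raw.
# Assumes events start with "YYYY-MM-DD HH:MM:SS " and rewrites that to
# "YYYYMMDDHHMMSS ", which saves 5 characters per event.
[my_sourcetype]
SEDCMD-shorten_ts = s/^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) /\1\2\3\4\5\6 /

# Option 2 (untested, as noted above): drop the timestamp from _raw
# and let Splunk stamp each event with the current time.
[my_other_sourcetype]
DATETIME_CONFIG = CURRENT
SEDCMD-drop_ts = s/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} //
```

With the second option, _time reflects when the event was indexed rather than when it occurred, so it only suits data where that difference does not matter.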

0 Karma

Contributor

I have tried with a TRANSFORM stanza:

REGEX = (?m)^(timestamp)(.*)$
FORMAT = $2
DEST_KEY = _raw

And it works: the timestamp was removed from _raw while Splunk kept it in _time. But I am not sure whether it affected license usage.
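
For readers who want to try this, a transform like the one above normally sits under a named stanza in transforms.conf and is attached to a sourcetype from props.conf. A minimal sketch, assuming a hypothetical stanza name `strip_timestamp` and a hypothetical sourcetype `my_sourcetype` (neither is given in the post); `(timestamp)` is kept as the poster's placeholder for the real timestamp pattern:

```
# transforms.conf (stanza name is hypothetical)
[strip_timestamp]
# (timestamp) is a placeholder; replace it with a pattern matching the real timestamp.
REGEX = (?m)^(timestamp)(.*)$
# Keep only the second capture group, i.e. everything after the timestamp.
FORMAT = $2
# Write the result back into the raw event text.
DEST_KEY = _raw

# props.conf (sourcetype name is hypothetical)
[my_sourcetype]
TRANSFORMS-strip_ts = strip_timestamp
```

Index-time settings like these take effect on the instance that parses the data (an indexer or a heavy forwarder), not on a universal forwarder.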

0 Karma

Splunk Employee

Yes, indexing volume is measured after transforms are applied. If _raw is truncated and ends up smaller than the original event, the indexed volume is smaller than the original event, and so is the license usage.

As a good practice, we do not recommend removing the timestamp from the original event: if timestamp parsing is incorrect for some reason, it becomes very difficult to identify the issue. As @cusello is saying, the data you remove to save volume might be important for some analysis. Saving money and disk usage by removing important information from events would not be a great idea.

Contributor

Looking at http://docs.splunk.com/Documentation/Splunk/6.5.0/Deploy/Datapipeline, "setting timestamps" and "Transforming event data" both take place during "Parsing", so it looks doable to me to reduce license usage by applying a TRANSFORM stanza.

0 Karma

Legend

Hi ctaf,
I cannot find the page you describe.
In any case, you can reduce license consumption by discarding events that are not interesting to you (sending them to the nullQueue) before they are indexed.
You could also drop part of each event by limiting the event length, or remove something using SEDCMD, but the reduction will be very small.
Bye.
Giuseppe
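
The two suggestions above (discarding uninteresting events and limiting event length) could look roughly like this. It is a sketch only: the sourcetype name `my_sourcetype`, the stanza name `drop_debug`, the `DEBUG` pattern, and the 2000-byte limit are all assumptions chosen for illustration.

```
# props.conf (sourcetype name is hypothetical)
[my_sourcetype]
# Run matching events through the transform defined below.
TRANSFORMS-null = drop_debug
# Cut off events longer than 2000 bytes (illustrative value).
TRUNCATE = 2000

# transforms.conf (stanza name and pattern are hypothetical)
[drop_debug]
# Events matching this pattern are discarded before indexing.
REGEX = \bDEBUG\b
DEST_KEY = queue
FORMAT = nullQueue
```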

0 Karma