<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to transform data prior to indexing in Deployment Architecture</title>
    <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106421#M3964</link>
    <description>&lt;P&gt;Yeah, you'll need something way more powerful than this SEDCMD-based approach.  Part of your problem is that, at this point in the indexing process, Splunk really doesn't know what is a "number" and what isn't -- it only sees strings of characters.&lt;/P&gt;</description>
    <pubDate>Mon, 02 Apr 2012 13:48:54 GMT</pubDate>
    <dc:creator>dwaddle</dc:creator>
    <dc:date>2012-04-02T13:48:54Z</dc:date>
    <item>
      <title>How to transform data prior to indexing</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106418#M3961</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am using Splunk to collect perfmon data from my servers as well.  However, the data I am indexing currently is very raw, and I believe it's consuming much more space in the index than it really should.  I am currently compensating for this by reducing the collection frequency, but this is less than ideal because I then lose resolution over time. I'd rather have a slightly less accurate value, more often.&lt;/P&gt;

&lt;P&gt;What I really want to do is transform this data as part of the collection process so that it consumes less space in the indexes, e.g.:&lt;/P&gt;

&lt;P&gt;Currently:&lt;BR /&gt;
% Processor Time = 25.1536939345647248&lt;BR /&gt;
Memory MBytes free = 10182&lt;BR /&gt;
% disk Space free = 49.753237234302468&lt;BR /&gt;
NIC RX bytes/in = 690.58168126768078&lt;BR /&gt;
NIC TX bytes/in = 949.90804335349833&lt;/P&gt;

&lt;P&gt;What I would like to do is transform all of these values so that I get, say, 6 significant figures only.&lt;/P&gt;

&lt;P&gt;transformed to consume less space:&lt;BR /&gt;
% Processor Time = 25.1536&lt;BR /&gt;
Memory MBytes free = 10182.0&lt;BR /&gt;
% disk Space free = 49.7532&lt;BR /&gt;
NIC RX bytes/in = 690.581&lt;BR /&gt;
NIC TX bytes/in = 949.908&lt;/P&gt;

&lt;P&gt;I use deployment server, so if this transformation could be done as part of the collection, even better.&lt;/P&gt;

&lt;P&gt;Has anyone done anything like this?&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 01:16:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106418#M3961</guid>
      <dc:creator>Conradj</dc:creator>
      <dc:date>2012-04-02T01:16:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to transform data prior to indexing</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106419#M3962</link>
      <description>&lt;P&gt;The traditional approach to transforming just-prior-to-indexing is to use &lt;CODE&gt;SEDCMD&lt;/CODE&gt;.  With the right regular expression, this should work.  However, it may not be entirely pretty and/or mathematically correct, because it truncates digits rather than rounding them.&lt;/P&gt;

&lt;P&gt;(props.conf)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[mysourcetype]
SEDCMD-foo = s/=(\s+)([0-9.]{7})(\d+)/=\1\2/
&lt;/CODE&gt;&lt;/PRE&gt;
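
&lt;P&gt;To see what the substitution above would do before deploying it, here is a rough Python emulation of the same pattern. This is only a sketch: Python's re module stands in for Splunk's sed-style replacement, the helper name truncate_value is made up for illustration, and the sample lines are based on the values in the question plus two edge cases.&lt;/P&gt;

```python
import re

# Same pattern as the SEDCMD above: keep the first 7 characters of the
# value after "=" and drop the digits that follow.
PATTERN = re.compile(r"=(\s+)([0-9.]{7})(\d+)")


def truncate_value(line):
    """Emulate the SEDCMD substitution on one event line (hypothetical helper)."""
    # count=1 mirrors sed without the g flag: only the first match is replaced.
    return PATTERN.sub(r"=\1\2", line, count=1)


samples = [
    "% Processor Time = 25.1536939345647248",
    "Memory MBytes free = 10182",
    "NIC RX bytes/in = 690.58168126768078",
    "Tiny value = 0.000001234567",
    "NIC TX bytes/s = 3949787710",
]

for sample in samples:
    print(truncate_value(sample))
```

&lt;P&gt;Values shorter than eight characters, such as the memory counter, pass through unchanged. The last two samples are the risky ones: a very small value collapses to zeros, and a long integer loses its trailing digits, which changes its magnitude.&lt;/P&gt;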

&lt;P&gt;Note that I've not tested this regular expression; it may not work properly at all ...&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 02:03:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106419#M3962</guid>
      <dc:creator>dwaddle</dc:creator>
      <dc:date>2012-04-02T02:03:07Z</dc:date>
    </item>
    <item>
      <title>Re: How to transform data prior to indexing</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106420#M3963</link>
      <description>&lt;P&gt;I wondered if a regex might work (and it does for the cases I have shown above, apart from memory).&lt;/P&gt;

&lt;P&gt;But I think I need it to be a bit smarter and reformat a number into scientific notation.&lt;/P&gt;

&lt;P&gt;Otherwise an input of, say, 0.000001234567 will become 0.00000 if I use regex, so it really needs to be shown as 1.23457x10^-6.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 02:45:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106420#M3963</guid>
      <dc:creator>Conradj</dc:creator>
      <dc:date>2012-04-02T02:45:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to transform data prior to indexing</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106421#M3964</link>
      <description>&lt;P&gt;Yeah, you'll need something way more powerful than this SEDCMD-based approach.  Part of your problem is that, at this point in the indexing process, Splunk really doesn't know what is a "number" and what isn't -- it only sees strings of characters.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 13:48:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106421#M3964</guid>
      <dc:creator>dwaddle</dc:creator>
      <dc:date>2012-04-02T13:48:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to transform data prior to indexing</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106422#M3965</link>
      <description>&lt;P&gt;What is the significance of 1.23457x10^-6? That is a rather small number. If 10^-6 with hundredths at that unit scale is the resolution you need, then regex out 8 digits, so 0.00000123.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2012 15:44:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106422#M3965</guid>
      <dc:creator>cvajs</dc:creator>
      <dc:date>2012-04-03T15:44:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to transform data prior to indexing</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106423#M3966</link>
      <description>&lt;P&gt;It's just an example, but I could use a really big number to demonstrate as well, e.g. my NIC reports that the current Tx rate is 3949787710 bytes/s (10 char).  Would you rather see that 10-digit number eating away at your index, or would you rather see 3.950E9 bytes/s (7 char) instead?&lt;/P&gt;

&lt;P&gt;The latter consumes 30% less space in your index with only a minimal loss of precision over the original value.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Apr 2012 03:52:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106423#M3966</guid>
      <dc:creator>Conradj</dc:creator>
      <dc:date>2012-04-04T03:52:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to transform data prior to indexing</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106424#M3967</link>
      <description>&lt;P&gt;OK,&lt;/P&gt;

&lt;P&gt;Thank you to all involved in this question. I think the answer is clear.&lt;/P&gt;

&lt;P&gt;dwaddle hit the nail on the head.  Splunk doesn't really see a number, it just sees a piece of information that takes up a number of characters.  Regex is perfect for finding patterns in strings, but it doesn't do math for you.&lt;/P&gt;

&lt;P&gt;This is fundamentally a math problem.  There doesn't seem to be a way to perform math on incoming data. One day there might be, but for this particular problem Splunk isn't actually the right tool for the job anyway.&lt;/P&gt;
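
&lt;P&gt;For anyone who wants to do the math outside Splunk, before the data is ever collected, here is a minimal Python sketch of rounding to a fixed number of significant figures. It assumes a script of this kind could sit in front of the input, which nothing in this thread confirms; the function names and the default figure counts are only illustrative.&lt;/P&gt;

```python
def to_sig_figs(value, figures=6):
    """Format a number with at most `figures` significant figures."""
    # The "g" presentation type rounds to significant figures and switches
    # to exponent form for very small or very large values.
    return format(float(value), f".{figures}g")


def to_compact_sci(value, figures=4):
    """Format as scientific notation with a short exponent, e.g. 3.950E9."""
    mantissa, exponent = format(float(value), f".{figures - 1}E").split("E")
    # int() drops the plus sign and leading zero of the exponent.
    return f"{mantissa}E{int(exponent)}"


for raw in ["25.1536939345647248", "10182", "0.000001234567", "3949787710"]:
    print(raw, to_sig_figs(raw), to_compact_sci(raw))
```

&lt;P&gt;Unlike the regex, this rounds instead of truncating and keeps the magnitude of very small and very large values.&lt;/P&gt;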

&lt;P&gt;I will stop collecting this perfmon data in Splunk and collect it in Nagios. This way I can use more of our Splunk license on ingesting application logs that will give us the best value.  We can still correlate application events against high CPU, high network IO, etc.; we just won't be able to do it within the same tool!&lt;/P&gt;

&lt;P&gt;Cheers!&lt;/P&gt;

&lt;P&gt;C.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Apr 2012 04:03:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106424#M3967</guid>
      <dc:creator>Conradj</dc:creator>
      <dc:date>2012-04-04T04:03:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to transform data prior to indexing</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106425#M3968</link>
      <description>&lt;P&gt;OK, well, you don't need math to achieve this space saving. Using scientific notation simply moves the scale of the number into the exponent. And you don't save bytes simply by converting to scientific notation; you only save bytes if you round off or truncate. You can SEDCMD the input data and normalize it to, say, E+3 or E+6 (or whatever scale suits), then round/truncate. Then perhaps do a custom field extraction and name it with units attached, e.g. "kB" or "MB". You can round off/truncate/normalize with SEDCMD.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Apr 2012 18:26:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-to-transform-data-prior-to-indexing/m-p/106425#M3968</guid>
      <dc:creator>cvajs</dc:creator>
      <dc:date>2012-04-04T18:26:03Z</dc:date>
    </item>
  </channel>
</rss>

