<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: warning and violations, how to reduce my indexed volume ? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61481#M12258</link>
    <description>&lt;P&gt;Details are on this wiki page : &lt;A href="http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume"&gt;http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;remark : &lt;BR /&gt;
License_usage.log is available on the Splunk license master instance only. Every minute, the license master logs the indexed volume reported by the slaves. Each slave maintains a table of how much it has indexed, in chunks of time. Typically that chunk of time is 1 minute, but the chunk may grow if the slave cannot contact the master, because Splunk only resets the chunk when the table is sent to the master. The table is made of source, sourcetype, host tuples. &lt;STRONG&gt;If that table grows to exceed 1000 entries, then Splunk squashes the host/source keys.&lt;/STRONG&gt; So, if you have more than 1000 different tuple entries, you will find no value for the h (host) and s (source) fields. Splunk never suppresses st (sourcetype) in the log. &lt;/P&gt;</description>
    <pubDate>Wed, 11 Apr 2012 17:16:30 GMT</pubDate>
    <dc:creator>yannK</dc:creator>
    <dc:date>2012-04-11T17:16:30Z</dc:date>
    <item>
      <title>warning and violations, how to reduce my indexed volume ?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61478#M12255</link>
      <description>&lt;P&gt;Hi &lt;BR /&gt;
I have a license pool of X GB per day, and I exceed it almost every single day.&lt;BR /&gt;
How can I selectively reduce my indexing volume?&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2012 20:02:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61478#M12255</guid>
      <dc:creator>mataharry</dc:creator>
      <dc:date>2012-01-31T20:02:41Z</dc:date>
    </item>
    <item>
      <title>Re: warning and violations, how to reduce my indexed volume ?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61479#M12256</link>
      <description>&lt;P&gt;Enable the Splunk Deployment Monitor app and see which host/source is sending the most data - decide on the value of that data and then disable it.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2012 20:07:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61479#M12256</guid>
      <dc:creator>rjyetter</dc:creator>
      <dc:date>2012-01-31T20:07:40Z</dc:date>
    </item>
    <item>
      <title>Re: warning and violations, how to reduce my indexed volume ?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61480#M12257</link>
      <description>&lt;P&gt;Hi Mata&lt;/P&gt;

&lt;P&gt;The options are simple :&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;reduce the indexed volume.&lt;/LI&gt;
&lt;LI&gt;or get a license volume upgrade (contact Splunk sales)&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;For the first option here are the steps : &lt;/P&gt;

&lt;P&gt;1 - &lt;STRONG&gt;Analyze your data&lt;/STRONG&gt; to identify where the volume is coming from.&lt;BR /&gt;
In 4.2+ you can run these searches &lt;STRONG&gt;on the license-master&lt;/STRONG&gt;&lt;BR /&gt;
see &lt;A href="http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume"&gt;http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume&lt;/A&gt;&lt;BR /&gt;
If you want more detail, you can split the results by source "s", host "h", sourcetype "st", or indexer "i".&lt;/P&gt;

&lt;P&gt;total per pool &lt;CODE&gt;index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by pool&lt;/CODE&gt; &lt;/P&gt;

&lt;P&gt;detail per sourcetype&lt;BR /&gt;
&lt;CODE&gt;index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by st useother=false&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;detail per source&lt;BR /&gt;
&lt;CODE&gt;index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by s useother=false&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;detail per host&lt;BR /&gt;
&lt;CODE&gt;index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by h useother=false&lt;/CODE&gt;&lt;/P&gt;
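
&lt;P&gt;detail per indexer (a sketch that follows the same pattern, assuming the "i" field listed above is populated in your license_usage.log; as far as I know it holds the indexer GUID rather than a hostname)&lt;BR /&gt;
&lt;CODE&gt;index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by i useother=false&lt;/CODE&gt;&lt;/P&gt;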

&lt;P&gt;2 - If some forwarders are not necessary, turn the Splunk forwarder off on those boxes.&lt;BR /&gt;
Why did you deploy a forwarder on every single box in the first place?&lt;/P&gt;

&lt;P&gt;3 - If some useless files are being indexed, &lt;STRONG&gt;be more selective.&lt;/STRONG&gt;&lt;BR /&gt;
Disable the inputs, or use whitelists/blacklists to limit the scope.&lt;BR /&gt;
Example in inputs.conf to drop the core files, or to index only *.log files:&lt;BR /&gt;
&lt;CODE&gt;[monitor:///var/log]&lt;BR /&gt;
blacklist = \.core$&lt;BR /&gt;
[monitor:///mypath/*.log]&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;
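
&lt;P&gt;A whitelist works the same way in the other direction. Here is a minimal sketch, assuming a hypothetical /var/log/myapp directory where only the files ending in .log should be indexed. Both whitelist and blacklist take a regular expression that is matched against the file path:&lt;BR /&gt;
&lt;CODE&gt;# hypothetical path, adjust to your environment&lt;BR /&gt;
[monitor:///var/log/myapp]&lt;BR /&gt;
whitelist = \.log$&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;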

&lt;P&gt;4 - If some servers are sending too much data (syslog, for example),&lt;BR /&gt;
&lt;STRONG&gt;disable the routing to Splunk&lt;/STRONG&gt;, or select the components to send.&lt;BR /&gt;
Example in syslog.conf (send only critical and error events, plus every event from my application):&lt;BR /&gt;
&lt;CODE&gt;&lt;BR /&gt;
*.CRITICAL      splunk.mydomain.com&lt;BR /&gt;
*.ERROR         splunk.mydomain.com&lt;BR /&gt;
myapplication.* splunk.mydomain.com&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;5 - If some log files contain too much data, &lt;STRONG&gt;change the verbosity level of your applications&lt;/STRONG&gt; (for example, avoid the DEBUG mode)&lt;/P&gt;

&lt;P&gt;6 - &lt;STRONG&gt;Search for duplicate events in the logs&lt;/STRONG&gt;. Check whether they exist in the original logs, or whether the same log file is being indexed several times (some log rotation schemes may cause that).&lt;BR /&gt;
Here is a search to find duplicates in Splunk:&lt;BR /&gt;
&lt;CODE&gt;* | eval raw=_raw | convert ctime(_indextime) as indextime &lt;BR /&gt;
| stats count  first(indextime) as first last(indextime) as last by raw   | where count &amp;gt; 1 | table count first last raw&lt;/CODE&gt;&lt;BR /&gt;
Then drill down to the source to figure out where the duplicates come from.&lt;/P&gt;

&lt;P&gt;7 - If you cannot disable an input but don't need all the events, you can set up &lt;STRONG&gt;nullQueue filtering of the events&lt;/STRONG&gt;.&lt;BR /&gt;
This has to be set up on the indexers (or heavy forwarders).&lt;BR /&gt;
(With Windows event logs, we usually filter on the event code.)&lt;/P&gt;
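
&lt;P&gt;A minimal sketch of such a filter, assuming a WinEventLog:Security sourcetype and event codes 592 and 593 as the events to discard. The stanza name "wminull", the sourcetype, and the event codes are only illustrative, so adapt them to your own data:&lt;/P&gt;

&lt;P&gt;props.conf&lt;BR /&gt;
&lt;CODE&gt;# illustrative sourcetype stanza: apply the "wminull" transform at index time&lt;BR /&gt;
[WinEventLog:Security]&lt;BR /&gt;
TRANSFORMS-wmi = wminull&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;transforms.conf&lt;BR /&gt;
&lt;CODE&gt;# illustrative stanza: events matching REGEX are sent to the nullQueue and are not indexed&lt;BR /&gt;
[wminull]&lt;BR /&gt;
REGEX = (?m)^EventCode=(592|593)&lt;BR /&gt;
DEST_KEY = queue&lt;BR /&gt;
FORMAT = nullQueue&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Both files would go on the indexers (or heavy forwarders), and a restart is normally needed for the change to take effect.&lt;/P&gt;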

&lt;P&gt;see examples &lt;A href="http://docs.splunk.com/Documentation/Splunk/4.3/Deploy/Routeandfilterdatad#Discard_specific_events_and_keep_the_rest:"&gt;http://docs.splunk.com/Documentation/Splunk/4.3/Deploy/Routeandfilterdatad#Discard_specific_events_and_keep_the_rest:&lt;/A&gt;&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;Discard specific events and keep the rest&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;or Keep specific events and discard the rest &lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 31 Jan 2012 20:17:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61480#M12257</guid>
      <dc:creator>yannK</dc:creator>
      <dc:date>2012-01-31T20:17:13Z</dc:date>
    </item>
    <item>
      <title>Re: warning and violations, how to reduce my indexed volume ?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61481#M12258</link>
      <description>&lt;P&gt;Details are on this wiki page : &lt;A href="http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume"&gt;http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;remark : &lt;BR /&gt;
License_usage.log is available on the Splunk license master instance only. Every minute, the license master logs the indexed volume reported by the slaves. Each slave maintains a table of how much it has indexed, in chunks of time. Typically that chunk of time is 1 minute, but the chunk may grow if the slave cannot contact the master, because Splunk only resets the chunk when the table is sent to the master. The table is made of source, sourcetype, host tuples. &lt;STRONG&gt;If that table grows to exceed 1000 entries, then Splunk squashes the host/source keys.&lt;/STRONG&gt; So, if you have more than 1000 different tuple entries, you will find no value for the h (host) and s (source) fields. Splunk never suppresses st (sourcetype) in the log. &lt;/P&gt;</description>
      <pubDate>Wed, 11 Apr 2012 17:16:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/warning-and-violations-how-to-reduce-my-indexed-volume/m-p/61481#M12258</guid>
      <dc:creator>yannK</dc:creator>
      <dc:date>2012-04-11T17:16:30Z</dc:date>
    </item>
  </channel>
</rss>

