<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Splunk in really BIG environment in Alerting</title>
    <link>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72264#M12222</link>
    <description>&lt;P&gt;quick answer :&lt;/P&gt;

&lt;P&gt;1 - Is there a possibility to minimize splunk's Index size?&lt;/P&gt;

&lt;P&gt;Not really. The indexes are already about as compact as they can be, but you can gain a little if your data has recurring patterns; see &lt;A href="http://docs.splunk.com/Documentation/Splunk/4.2.3/Data/Improvedatacompressionwithsegmentation"&gt;http://docs.splunk.com/Documentation/Splunk/4.2.3/Data/Improvedatacompressionwithsegmentation&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;However, if your question is "what should be my disk strategy to store a large amount of data", you should look at:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;filter data at index time&lt;/STRONG&gt; (to drop useless events before they are indexed)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;data retention policy&lt;/STRONG&gt;, by storing your data in different indexes with different life cycles (you can specify a maximum size and a maximum retention period per index); see &lt;A href="http://www.splunk.com/wiki/Deploy:BucketRotationAndRetention"&gt;http://www.splunk.com/wiki/Deploy:BucketRotationAndRetention&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;secondary storage (the homePath and coldPath for the buckets of each index can be located on separate file systems)&lt;/LI&gt;
&lt;LI&gt;a cluster of indexers sharing the same license volume (more servers, more storage capacity, and better performance)&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;2 - Are there special options to index a big amount of log files very fast? Or is splunk just as fast as the hardware it is given?&lt;/P&gt;

&lt;P&gt;You can improve your indexing speed with &lt;STRONG&gt;better hardware&lt;/STRONG&gt; (e.g. no VM, faster CPU and memory, SSD drives for the hot buckets) or by &lt;STRONG&gt;clustering&lt;/STRONG&gt; (adding more indexers).&lt;BR /&gt;
And if one particular input is critical, you can forward it to dedicated indexers.&lt;BR /&gt;
In the end, you can still search across all your indexers (one search head plus X search peers).&lt;/P&gt;

&lt;P&gt;3 - Is there a possibility to keep an eye on the network traffic so that there are no disruptions in real time operation? There will be a lot of traffic to get all the data to the indexer. Can this be done by forwarders?&lt;/P&gt;

&lt;P&gt;FYI, the Universal Forwarder and the Lightweight Forwarder have a default limit of 256 KBps on network traffic to keep a low profile, but this limit can easily be removed.&lt;/P&gt;

&lt;P&gt;You can rely on the &lt;STRONG&gt;Deployment-monitor app&lt;/STRONG&gt; to detect whether a forwarder is not sending data because there is no data to send, or because something is wrong (down, blocked, queuing...)&lt;/P&gt;

&lt;P&gt;The other approach is to set up monitoring/alerting on:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;the incoming traffic on the indexers (metrics.log or license_usage.log)&lt;/LI&gt;
&lt;LI&gt;the latency of the events (index time vs event timestamp)&lt;/LI&gt;
&lt;LI&gt;or, even easier, the events themselves.
If you know that serverA sends a particular event B every minute, you can set up alerting on that event. The difficulty is defining what counts as an anomaly.&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Fri, 09 Sep 2011 16:49:53 GMT</pubDate>
    <dc:creator>yannK</dc:creator>
    <dc:date>2011-09-09T16:49:53Z</dc:date>
    <item>
      <title>Splunk in really BIG environment</title>
      <link>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72261#M12219</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I was told to evaluate Splunk to run in a really BIG company. We are talking about a large volume of log files each day. My boss asked me to clarify some of his concerns and I thought this board could be a starting point.&lt;/P&gt;

&lt;P&gt;I tested Splunk (Free) on my local computer and found out that 1 GB of our log files results in 250 MB of index. Is this normal for Splunk? If this were true we would reach the capacity of every reasonable HDD setup very soon. So I am asking myself:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Is there a possibility to minimize splunk's Index size?&lt;/LI&gt;
&lt;LI&gt;Are there special options to index a big amount of log files very fast? Or is splunk just as fast as the hardware it is given?&lt;/LI&gt;
&lt;LI&gt;Is there a possibility to keep an eye on the network traffic so that there are no disruptions in real time operation? There will be a lot of traffic to get all the data to the indexer. Can this be done by forwarders? I didn't have the time to read all the docs yet.&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;I think there are more questions to ask when running Splunk "BIG", but for now these are the most important ones for us.&lt;/P&gt;

&lt;P&gt;Thank you in advance. Kind regards,&lt;BR /&gt;
Katsche&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2011 07:33:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72261#M12219</guid>
      <dc:creator>Katsche</dc:creator>
      <dc:date>2011-09-09T07:33:08Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk in really BIG environment</title>
      <link>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72262#M12220</link>
      <description>&lt;P&gt;I really think the best thing to do is to contact &lt;A href="mailto:sales@splunk.com"&gt;sales@splunk.com&lt;/A&gt; and book a meeting where you can discuss these questions more thoroughly. That said, there are good sections in the docs that you should read (like &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Installation/CapacityplanningforalargerSplunkdeployment"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Installation/CapacityplanningforalargerSplunkdeployment&lt;/A&gt; and &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Installation/HowHowmuchspaceyouwillneed"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Installation/HowHowmuchspaceyouwillneed&lt;/A&gt; ).&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2011 08:13:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72262#M12220</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2011-09-09T08:13:50Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk in really BIG environment</title>
      <link>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72263#M12221</link>
      <description>&lt;P&gt;I will check your links. Thank you very much. To schedule an appointment is a good idea, too.&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2011 09:10:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72263#M12221</guid>
      <dc:creator>Katsche</dc:creator>
      <dc:date>2011-09-09T09:10:37Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk in really BIG environment</title>
      <link>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72264#M12222</link>
      <description>&lt;P&gt;quick answer :&lt;/P&gt;

&lt;P&gt;1 - Is there a possibility to minimize splunk's Index size?&lt;/P&gt;

&lt;P&gt;Not really. The indexes are already about as compact as they can be, but you can gain a little if your data has recurring patterns; see &lt;A href="http://docs.splunk.com/Documentation/Splunk/4.2.3/Data/Improvedatacompressionwithsegmentation"&gt;http://docs.splunk.com/Documentation/Splunk/4.2.3/Data/Improvedatacompressionwithsegmentation&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;However, if your question is "what should be my disk strategy to store a large amount of data", you should look at:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;filter data at index time&lt;/STRONG&gt; (to drop useless events before they are indexed)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;data retention policy&lt;/STRONG&gt;, by storing your data in different indexes with different life cycles (you can specify a maximum size and a maximum retention period per index); see &lt;A href="http://www.splunk.com/wiki/Deploy:BucketRotationAndRetention"&gt;http://www.splunk.com/wiki/Deploy:BucketRotationAndRetention&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;secondary storage (the homePath and coldPath for the buckets of each index can be located on separate file systems)&lt;/LI&gt;
&lt;LI&gt;a cluster of indexers sharing the same license volume (more servers, more storage capacity, and better performance)&lt;/LI&gt;
&lt;/UL&gt;
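
&lt;P&gt;As a rough illustration of the first three points, here is a minimal configuration sketch. The file, stanza and setting names are standard Splunk ones, but the source path, the regex, the index name app_logs, the size and retention values and the /mnt/cold_storage mount point are all made-up placeholders, so treat it as a starting point to adapt rather than a recommended setup:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf (on the indexer): attach a filtering transform to one source
# NOTE: the source path below is a hypothetical example
[source::/var/log/myapp/app.log]
TRANSFORMS-dropdebug = drop_debug_events

# transforms.conf: send matching events to the nullQueue so they are never indexed
# NOTE: the regex is a placeholder; match whatever you consider useless
[drop_debug_events]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue

# indexes.conf: one index with its own size cap, retention period and storage paths
# NOTE: index name, values and mount point are assumptions for illustration
[app_logs]
homePath   = $SPLUNK_DB/app_logs/db
coldPath   = /mnt/cold_storage/app_logs/colddb
thawedPath = $SPLUNK_DB/app_logs/thaweddb
# cap the total size of the index (in MB)
maxTotalDataSizeMB = 500000
# freeze (by default, delete) buckets older than 90 days: 90 * 86400 seconds
frozenTimePeriodInSecs = 7776000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Events dropped through the nullQueue are not indexed, so they take no disk space. Buckets are frozen when either the size cap or the retention period is reached, whichever comes first.&lt;/P&gt;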

&lt;P&gt;2 - Are there special options to index a big amount of log files very fast? Or is splunk just as fast as the hardware it is given?&lt;/P&gt;

&lt;P&gt;You can improve your indexing speed with &lt;STRONG&gt;better hardware&lt;/STRONG&gt; (e.g. no VM, faster CPU and memory, SSD drives for the hot buckets) or by &lt;STRONG&gt;clustering&lt;/STRONG&gt; (adding more indexers).&lt;BR /&gt;
And if one particular input is critical, you can forward it to dedicated indexers.&lt;BR /&gt;
In the end, you can still search across all your indexers (one search head plus X search peers).&lt;/P&gt;

&lt;P&gt;3 - Is there a possibility to keep an eye on the network traffic so that there are no disruptions in real time operation? There will be a lot of traffic to get all the data to the indexer. Can this be done by forwarders?&lt;/P&gt;

&lt;P&gt;FYI, the Universal Forwarder and the Lightweight Forwarder have a default limit of 256 KBps on network traffic to keep a low profile, but this limit can easily be removed.&lt;/P&gt;
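
&lt;P&gt;For reference, the forwarder throughput cap lives in limits.conf. A minimal sketch, assuming you really want no cap at all (you may prefer a higher fixed value if the network is shared):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# limits.conf on the forwarder
[thruput]
# 0 means unlimited; any positive value is a cap in KB per second
maxKBps = 0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The forwarder normally needs a restart to pick up the change.&lt;/P&gt;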

&lt;P&gt;You can rely on the &lt;STRONG&gt;Deployment-monitor app&lt;/STRONG&gt; to detect whether a forwarder is not sending data because there is no data to send, or because something is wrong (down, blocked, queuing...)&lt;/P&gt;

&lt;P&gt;The other approach is to set up monitoring/alerting on:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;the incoming traffic on the indexers (metrics.log or license_usage.log)&lt;/LI&gt;
&lt;LI&gt;the latency of the events (index time vs event timestamp)&lt;/LI&gt;
&lt;LI&gt;or, even easier, the events themselves.
If you know that serverA sends a particular event B every minute, you can set up alerting on that event. The difficulty is defining what counts as an anomaly.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 09 Sep 2011 16:49:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Alerting/Splunk-in-really-BIG-environment/m-p/72264#M12222</guid>
      <dc:creator>yannK</dc:creator>
      <dc:date>2011-09-09T16:49:53Z</dc:date>
    </item>
  </channel>
</rss>

