I'm trying to determine the best way to parse data before it gets to my Splunk indexer. It looks like a heavy forwarder will do that, but I want to know exactly what I need to enable. I've installed the full Splunk package, set it up as a forwarder, and am not storing a local copy of the data. I then figured I should disable all of the indexes, since I only want to forward and not index, but Splunk told me I could not disable the main index because it's the default DB.
Does that mean that Splunk will index my data locally regardless, as well as forward it on to the main indexer?
What you want to do is configure an "app" for your logs that gets pushed to the HF. In the app you will need a transforms.conf and a props.conf file; between them they hold the regex that Splunk uses to decide which log data gets forwarded. So, whatever your criteria are, you have to be able to state them as a regex.
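Before a regex goes into transforms.conf, it can help to try it against a few sample events. The following is a minimal Python sketch of that check, not Splunk configuration: the function name route_events, the discard pattern, and the DEBUG/INFO sample lines are all made up for illustration. Python's re module is also not identical to the regex engine Splunk uses, so treat the result as a rough sanity check only.

```python
import re


def route_events(events, discard_regex):
    """Split events into (kept, discarded) using an unanchored regex search.

    Hypothetical helper: it only models the idea of dropping whole events
    that match a pattern; it does not talk to Splunk.
    """
    pattern = re.compile(discard_regex)
    kept, discarded = [], []
    for event in events:
        if pattern.search(event):
            discarded.append(event)
        else:
            kept.append(event)
    return kept, discarded


sample_events = [
    # First line is taken from the sample log in this thread.
    "2012-02-10 13:04:51,208 [http-0.0.0.0-8080-252] ERROR Rejecting request due",
    # The next two lines are invented for the example.
    "2012-02-10 13:04:52,001 [http-0.0.0.0-8080-252] DEBUG cache refreshed",
    "2012-02-10 13:04:53,117 [http-0.0.0.0-8080-29] INFO request served",
]

# Assumed criterion: discard anything logged at DEBUG or INFO level.
kept, discarded = route_events(sample_events, r"\] (DEBUG|INFO) ")
print(len(kept), "kept,", len(discarded), "discarded")
```

If the kept and discarded lists look right for a representative sample, the same pattern is a reasonable starting point for the real filter.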
Filter Data Example Usage: http://docs.splunk.com/Documentation/Splunk/5.0.4/Data/Anonymizedatausingconfigurationfiles#Through_...
Splunk 6 Filtering Doc:
Don't listen to the people dismissing the HF. It has lots of uses beyond "specialty", and this is one of the many cases where it is useful: preventing log overrun. I would index locally on the HF and forward only what you want, so that you have some buffer of "All" to look at in extreme emergencies, but it's your call.
Works for me. Is an Enterprise License required on the receiver in order to parse data before it gets indexed? We're trying to stay under the 500MB daily limit as we step into this product. Thanks again!
I am in a similar situation. I have a heavy forwarder on which I've installed the DB Connect app. However, I do not want to parse any events on this forwarder; instead I want to forward them to my indexers. Do I simply need to push a forwarder conf file or an outputs.conf file to the heavy forwarder, along with inputs.conf and any other necessary files?
Or do I need to explicitly turn off local indexing somewhere?
I am new to Splunk. I have a syslog server set up as a Receiver and another server set up as a Heavy Forwarder. I thought this was necessary in order to parse data before it is sent to the Receiver. Am I correct on this? I do not have an Enterprise license; I'm in the trial period. In Apps I see an option to set up forwarding and an option to set up a Lightweight Forwarder. Should I be seeing something that says Set up a Heavy Forwarder? Am I not seeing it because I don't have an Enterprise License? Thanks!
There is almost no reason to use a heavy forwarder. All the filtering and parsing that it does can be done more efficiently and more scalably on the indexers themselves. A heavy forwarder is considered a specialty configuration, and is therefore not presented as a standard option.
Posting as an answer because pasting is bad in a comment. Essentially I want to take the input below:
2012-02-10 13:04:51,208 [http-0.0.0.0-8080-252] ERROR Rejecting request due
2012-02-10 13:04:51,348 [http-0.0.0.0-8080-29] ERROR [NOTIFY]: some error lots of stuff fdsfasdkfljdskfsadjd fsadjklfjsdaklfasdjf fsjadklfjsdalkfjdsaklfjsd
2012-02-10 13:05:06,895 [http-0.0.0.0-8080-298] ERROR [org.apache.commons.beanutils.PropertyUtils] Method invocation failed.
java.lang.IllegalArgumentException: java.lang.ClassCastException@73afa6d1
    at sun.reflect.GeneratedMethodAccessor275.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.commons.beanutils.PropertyUtilsBean.invokeMethod(PropertyUtilsBean.java:1773)
2012-02-10 13:04:45,873 [http-0.0.0.0-8080-151] ERROR [site.mismatch.DupDetection] Rejecting request due
2012-02-10 13:04:51,208 [http-0.0.0.0-8080-252] ERROR [site.mismatch.DupDetection] Rejecting request due
and I want to break it into the following events - for clarification I'll separate with ##########
2012-02-10 13:04:51,208 [http-0.0.0.0-8080-252] ERROR Rejecting request due
##########
2012-02-10 13:04:51,348 [http-0.0.0.0-8080-29] ERROR [NOTIFY]: some error lots of stuff fdsfasdkfljdskfsadjd fsadjklfjsdaklfasdjf fsjadklfjsdalkfjdsaklfjsd
##########
2012-02-10 13:05:06,895 [http-0.0.0.0-8080-298] ERROR [org.apache.commons.beanutils.PropertyUtils] Method invocation failed.
java.lang.IllegalArgumentException: java.lang.ClassCastException@73afa6d1
##########
    at sun.reflect.GeneratedMethodAccessor275.invoke(Unknown Source)
##########
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
##########
    at java.lang.reflect.Method.invoke(Method.java:597)
##########
    at org.apache.commons.beanutils.PropertyUtilsBean.invokeMethod(PropertyUtilsBean.java:1773)
##########
2012-02-10 13:04:45,873 [http-0.0.0.0-8080-151] ERROR [site.mismatch.DupDetection] Rejecting request due
##########
The reason is that I then want to match any line starting with ^\s and throw it away (getting rid of the Java stack trace), and it looks like you can only throw away whole events, not partial events. Technically the 4 "at" lines were part of the IllegalArgumentException that preceded them, but I don't want them because in real life there are 80 more "at" lines that follow, and they are not interesting to me.
The one thing I'm really missing is the ability to match on a pattern (^20) and say that all lines up to the next match belong to that event, except for lines that start with a space. A line starting with a space should break into its own event, and that continues until the next match of ^20, which itself starts a new event.
I can't seem to get Splunk to do something like this.
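To make the rule above concrete, here is a minimal Python sketch of the event-breaking logic as described, useful mainly for checking the two patterns against sample lines. It is not Splunk configuration and says nothing about how Splunk would implement this. The function names split_events and drop_stack_frames are hypothetical, the four-space indent on the "at" lines is an assumption about the original log, and a non-indented line that follows a stack frame is treated as a new event, which the post does not specify.

```python
import re

EVENT_START = re.compile(r"^20\d\d-\d\d-\d\d ")  # a timestamp begins a new event
STACK_FRAME = re.compile(r"^\s")                 # indented lines are stack frames


def split_events(lines):
    """Return a list of (text, is_stack_frame) tuples."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if STACK_FRAME.match(line):
            # Every indented line becomes its own event so it can be dropped.
            events.append((line, True))
        elif EVENT_START.match(line) or not events or events[-1][1]:
            # Timestamped line (or nothing sensible to attach to): new event.
            events.append((line, False))
        else:
            # Non-indented continuation line: merge into the previous event.
            text, _ = events.pop()
            events.append((text + "\n" + line, False))
    return events


def drop_stack_frames(events):
    """Keep only the events that are not stack-frame lines."""
    return [text for text, is_frame in events if not is_frame]


sample_lines = [
    "2012-02-10 13:05:06,895 [http-0.0.0.0-8080-298] ERROR [org.apache.commons.beanutils.PropertyUtils] Method invocation failed.",
    "java.lang.IllegalArgumentException: java.lang.ClassCastException@73afa6d1",
    "    at sun.reflect.GeneratedMethodAccessor275.invoke(Unknown Source)",
    "    at java.lang.reflect.Method.invoke(Method.java:597)",
    "2012-02-10 13:04:45,873 [http-0.0.0.0-8080-151] ERROR [site.mismatch.DupDetection] Rejecting request due",
]

all_events = split_events(sample_lines)
kept_events = drop_stack_frames(all_events)
for event in kept_events:
    print(event)
    print("##########")
```

With this sample, the exception line stays attached to its timestamped event and the two "at" lines are the only things discarded.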
The reason I'm interested in a heavy forwarder is to reduce the noise that gets sent to the indexer. It's not a big deal with one server, but suppose we deploy to dozens of servers, something goes wrong, and the logs start spewing GBs of entries when we're only interested in a couple of lines in each Java stack trace (which is what I'm monitoring). It would then be very easy to get into a license violation, and there would be a lot of extra network traffic that I could cut with a little bit of Splunk and CPU muscle.
I was able to parse things great with logstash, but it's a bit of a hacked solution and not as elegant as Splunk. So far I've been unable to get the Splunk forwarder to parse the data the way I want, so the hope is to get that working so I can use an entire Splunk solution.
Yep - I've been using nullQueue, which has solved half the problem. There's still the issue of all my hosts sending all those logs over in full to get chomped by the indexer, when it would be more efficient if the individual nodes could do the trimming before sending them over.
I agree with you infinitiguy. I've been hammering our Splunk rep over this, and feel they should enable the parsing of data locally with the Universal Forwarder - like Elasticsearch's "filebeat" shipper does (and filebeat consumes very little CPU/mem to do this for very large and active logs!) Let the client decide how much data they want to send over the network.
Actually, you can put the identical filtering rules on the indexer (sending items to the nullQueue) and they will be processed identically; events do not get charged to your license unless they are actually indexed. Network traffic makes a difference, but you pay for the savings with CPU on your source server. If you're only filtering rare or small amounts, you pay that cost all the time but only save network bandwidth rarely. Additionally, you start having to manage more variable config distributed over probably multiple source servers, rather than centralized at your indexers.
First, I would encourage you to use a Light or Universal rather than Heavy Forwarder. You can do your parsing and filtering on the indexer. This tends to be a lot easier to manage and scale.
Enabling forwarding will by default disable local indexing for forwarded data. (You have to specifically set indexAndForward = true to have forwarded data also indexed locally.) Note that a heavy forwarder will probably still index its internal logs, as I believe those are not forwarded by default, even if you have enabled forwarding. So you'll be okay with your config.