Hello,
I have 4 log files on one Host that I want to index/ingest.
Logs #1, #2, and #3 will be ingested 24 hours a day, but log file #4 is written to by a nightly batch process that generates 20 - 30GB of events per evening that we don't need and don't want to pay to index, because I wouldn't use them at this point in time.
I want to avoid stopping the Splunk Universal Forwarder Windows service from 6pm to 6am, because that would mean logs #1, #2, and #3 would not be indexed. Also, I believe the data would pool up in the fishbucket anyway and get indexed on restart, which would null my effort to exclude log #4's events from 6pm to 6am.
Any ideas how I can avoid indexing log #4 from 6pm to 6am (night time batch window)?
Thanks!
Hi agoktas,
there are multiple ways to achieve this:
inputs.conf
http://docs.splunk.com/Documentation/Splunk/6.3.1511/admin/Inputsconf#inputs.conf.example (see the option blacklist =)
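For reference, blacklist in inputs.conf filters by file path, not by time of day, so it could exclude log #4 entirely but not just during the batch window. A minimal sketch, with hypothetical paths:

[monitor://C:\logs\app]
blacklist = \\log4\.log$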
Update:
Following up all comments, this was the final working config:
Here is the final config that looks to be working great (we forgot '00' for the midnight hour):
Indexer configuration:
props.conf
[AppInternal]
TRANSFORMS-null= Appsetnull
transforms.conf
#Discard all events between 6pm - 6am
[Appsetnull]
REGEX = (?:\d+\/\d+\/\d+|\d+-\d+-\d+)\s(18|19|20|21|22|23|00|01|02|03|04|05):
DEST_KEY = queue
FORMAT = nullQueue
Hope this helps ...
cheers, MuS
What would the Regex look like to discard events on just Saturday from 12AM to 4AM?
Thanks.
Hi MuS,
From my understanding, you can only blacklist files or a regex value for a source's/file's content.
But I don't see anything where you can configure blacklist time frames.
Am I just not seeing the documentation pertaining to this?
Have you set something like this up before?
Thanks!
Sorry, my bad. Go for the nullQueue
filtering solution from the docs http://docs.splunk.com/Documentation/Splunk/6.3.1511/Forwarding/Routeandfilterdatad#Filter_event_dat... Hopefully you will have some unique identifier for the unneeded events. Do this on the indexer and restart Splunk.
Otherwise, the only time-based solution would be an external cron job that stops the universal forwarder, checks this log #4 for the end of the batch process, runs echo "" > log.4,
and restarts the universal forwarder again.....
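The cron-based fallback could be sketched roughly like this. All paths and service locations are assumptions; on a Windows forwarder the equivalent would be a Scheduled Task wrapping net stop SplunkForwarder / net start SplunkForwarder. The DRY_RUN guard is only so the sketch can be run harmlessly:

```shell
#!/bin/sh
# Sketch of the stop/truncate/restart approach described above.
# SPLUNK_HOME and LOG4 are hypothetical; adjust for your install.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunkforwarder}"
LOG4="${LOG4:-/var/log/app/log4.log}"
DRY_RUN="${DRY_RUN:-1}"   # set to 0 to actually run the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run "$SPLUNK_HOME/bin/splunk" stop    # stop the UF before the batch window
run truncate -s 0 "$LOG4"             # drop the batch events (echo "" > log.4)
run "$SPLUNK_HOME/bin/splunk" start   # resume monitoring logs #1-#3
```

Note the truncation step is what sidesteps the fishbucket concern: after the file is emptied, there is nothing left for the forwarder to index on restart.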
I believe I'm on the last piece of this puzzle...
I have 2 servers involved, and only one of them needs to have the events for a particular set of logs sent to the nullQueue during the 6pm - 6am time window.
So that means I need to know how to specify the particular hostname + the log name in the example provided in http://docs.splunk.com/Documentation/Splunk/6.3.1511/Forwarding/Routeandfilterdatad#Filter_event_dat....
Because the example in the link above only specifies the log name/path as the source. How do I add the host as well?
Any ideas how I do this? Can you provide an example?
How about adding the host name to the regex? Because the props.conf stanza can be either source, sourcetype or host .....
In essence, I would need both host & source instead of just one.
When both criteria are met, then this would apply for these events.
Here is what I was thinking for props.conf:
[host::HOSTA]
[source::(?i)systemout.log.log$|systemoutServerA.log$]
TRANSFORMS-null= Applicationsetnull
But I'm not sure if that will work. Perhaps both stanzas need to be combined on the same line? Comma delimited? Is that even possible?
Thoughts?
Thanks.
The easiest way to achieve that would be to assign a different sourcetype to the log that needs to be excluded, or, as stated before, use the host in the regex like in this example http://docs.splunk.com/Documentation/Splunk/5.0/Data/Advancedsourcetypeoverrides#Example:_Assign_a_s...
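The sourcetype approach could be sketched like this (file paths hypothetical): assign a dedicated sourcetype on Host A only, so the transform on the indexer never touches Host B's identical logs:

inputs.conf on Host A only:
[monitor://D:\logs\systemout.log]
sourcetype = AppInternal

props.conf on the indexer:
[AppInternal]
TRANSFORMS-null = Appsetnull

Host B keeps its default sourcetype for the same file names, so its events bypass the nullQueue transform entirely.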
You da man MuS!
Here is the final config that looks to be working great (we forgot '00' for the midnight hour):
Indexer configuration:
Props.conf:
[AppInternal]
TRANSFORMS-null= Appsetnull
Transforms.conf:
#Discard all events between 6pm - 6am
[Appsetnull]
REGEX = (?:\d+\/\d+\/\d+|\d+-\d+-\d+)\s(18|19|20|21|22|23|00|01|02|03|04|05):
DEST_KEY = queue
FORMAT = nullQueue
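As a sanity check, the final regex can be exercised against sample events. A quick sketch with Python's re module, using lines adapted from the examples earlier in this thread:

```python
import re

# The final transforms.conf regex: a date in either format,
# followed by an hour inside the 18:00-05:59 batch window.
pattern = re.compile(
    r"(?:\d+/\d+/\d+|\d+-\d+-\d+)\s"
    r"(18|19|20|21|22|23|00|01|02|03|04|05):"
)

night = "2015-12-09 23:43:10,801 DEBUG - _standard | Entering Summary2.inc"
day = "2015-12-09 11:43:10,801 DEBUG - _standard | Entering Summary2.inc"
night_slash = "12,User:R_getStuff,OK,2015/12/09 03:42:48:477,0"

print(bool(pattern.search(night)))        # evening event: sent to nullQueue
print(bool(pattern.search(day)))          # daytime event: kept
print(bool(pattern.search(night_slash)))  # slash-format date also matches
```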
Thanks again for all your help! 🙂
You're welcome 🙂 I've updated the answer so please accept it - thx ! And don't forget to up-vote @rich7177 he started this with his idea to filter based on time 😉 !!
In this case, it will need to be host & source combined because I have 2 servers.
Host A will need to restrict 6pm - 6am for logging with Log A (systemout.log) & Log B (systemoutServerA.log).
Host B doesn't need the time restriction for Log A (systemout.log) & Log B (systemoutServerB.log).
If both servers had the same restriction needs, I would be home free. But since one is able to ingest 24hours a day, then it throws a wrench in the works.
Thanks.
That sounds perfect.
In fact, I'm now remembering a Splunk sales engineer mentioning this a while back for a similar situation. 🙂
I'll give this a shot. This should work just fine.
By any chance, would you happen to know the regex value for greater than 6pm & less than 6am?
Thanks!
Can you provide some examples of the events containing the time?
Absolutely.
There are actually 2 files that I will be dealing with (log #4 & log #5). Here are examples of each:
Log #4 example:
12,User:R_getStuff:1234567:id,com.company.demographics.app.inside.pf.Addid,user,OK,2015/12/09 11:42:48:477,2015/12/09 11:42:48:477,0
Log #5 example:
2015-12-09 11:43:10,801 DEBUG - _standard | Entering Summary2.inc | User: blah| Koid:CHOOSE_ACCOUNT:1234567:blah blah
The positioning of the date/time stamp are in different spots, but that shouldn't be a problem.
The date/time is formatted differently between the two, but that shouldn't matter because I'm only looking at the hour & minute, which are formatted the same of course. 🙂 So only one regex value is needed for both stanzas, applying to only the hour & minute.
Thanks!
Based on the examples, and assuming you will have 24 hours in the logs(?), try this regex:
(?:\d+\/\d+\/\d+|\d+-\d+-\d+)\s(07|08|09|10|11|12|13|14|15|16|17|18):
The first group will match both possible date formats and the second group will match any hour from 07 til 18 ..... Does that make sense?
Since this value would send it to the nullQueue, I'm guessing we would do this instead (batch window):
(?:\d+\/\d+\/\d+|\d+-\d+-\d+)\s(19|20|21|22|23|01|02|03|04|05|06)
So the stanza would look like:
[setnull]
REGEX = (?:\d+\/\d+\/\d+|\d+-\d+-\d+)\s(19|20|21|22|23|01|02|03|04|05|06)
DEST_KEY = queue
FORMAT = nullQueue
Does that look right?
HeHe, I keep messing up things here (the time range this time) 🙂
That's the transforms.conf and it looks good. Don't forget the props.conf to match it to the source, and place both on either a heavy forwarder or an indexer and restart Splunk after the change.
By the way, does the end of that regex value need to have a colon? I noticed you had it in your first example.
Please verify this is correct?
(?:\d+\/\d+\/\d+|\d+-\d+-\d+)\s(19|20|21|22|23|01|02|03|04|05|06):
This is just to make sure it does match the timestamp: it will be the : between the hour and the minute. If you are 1000% sure there are no other events containing something like 1234-12-12 20 foo or 1234/12/12 20 foo, you won't need it .... otherwise add it.
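The point about the trailing colon can be illustrated quickly, reusing MuS's hypothetical counter-example (a Python sketch):

```python
import re

hours = r"(19|20|21|22|23|01|02|03|04|05|06)"
without_colon = re.compile(r"(?:\d+/\d+/\d+|\d+-\d+-\d+)\s" + hours)
with_colon = re.compile(r"(?:\d+/\d+/\d+|\d+-\d+-\d+)\s" + hours + ":")

event = "1234-12-12 20 foo"  # a "20" that is not an hour-of-day

print(bool(without_colon.search(event)))  # matches: would be wrongly discarded
print(bool(with_colon.search(event)))     # no match: the colon protects it
```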
Cool. I'll give this a shot.
We have 1 indexer/search head and I'll configure both the props.conf & transforms.conf there.
I'll restart the indexer and see how things go. I'll probably be doing this tomorrow and will update this thread on how it turns out.
Thanks so much for your help! 🙂