I would like to use Splunk to index logfiles of different kinds and to provide proper file change monitoring using the fschange stanzas.
As soon as I try to monitor small shell scripts with various names (several hundred of them), I run into the problem that Splunk flags all the scripts as too_small.
Most of the scripts start with #!/bin/bash or #!/bin/sh, and those should all be flagged as "shell_scripts".
I tried creating a rule in props.conf to filter out those files and then set a sourcetype, but this does not seem to work.
I am using a universal forwarder right now.
Any ideas how I could get this to work, or where the problem might be?
Can I selectively disable the too_small sourcetype?
Thanks a lot in advance
I solved the problem. It is easy as long as you don't overlook the fact that you have to use a heavy forwarder or a full Splunk server.
Need to do some more tests regarding performance and the like but it looks at least promising 🙂
LEARN_MODEL = false
MORE_THAN_0 = ^#!\/bin\/(bash|sh)
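A sketch of how these two settings could sit together in props.conf. The rule name and the sourcetype name are assumptions for illustration, as is placing LEARN_MODEL in the sourcetype stanza:

```ini
# props.conf on the heavy forwarder / server that classifies the files

# Hypothetical rule name: files where more than 0% of the lines match
# the shebang regex are assigned the sourcetype below.
[rule::shell_scripts]
sourcetype = shell_scripts
MORE_THAN_0 = ^#!\/bin\/(bash|sh)

# Assumed placement: keep the classifier from learning a model
# for this very diverse sourcetype.
[shell_scripts]
LEARN_MODEL = false
```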
One little question remains: does a LESS_THAN_x imply that the pattern has to occur at least once?
It seems because of a lack of recursive sourcetype matching this is not really solvable by now 😕
Solved it by setting the sourcetype on the forwarder and doing some processing on the indexer.
Recursive matching of at least 1 level seems like a needed feature.
I now tried to go on as shown here but this does not seem to fix my problem 😕
The file genRootCA.sh is a small shell script, yet it does not get flagged as shellscript but as -too-small (see the debug output). How can I override the -too-small sourcetype recognition per event?
Where is my mistake ?
REGEX = ^#!\/bin\/(bash|sh)
LOOKAHEAD = 16
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::shellscript
12-06-2011 18:33:26.693 INFO FSChangeMonitor - Generating notifications on /testing2
12-06-2011 18:33:26.693 DEBUG FSChangeMonitor - blacklist entered; path=/testing2/genRootCA.sh
12-06-2011 18:33:26.693 DEBUG FSChangeMonitor - no blacklist matches found
12-06-2011 18:33:26.697 DEBUG FSChangeManagerProcessor - NOTIFICATION (ADD)=/testing2/genRootCA.sh
12-06-2011 18:33:26.697 DEBUG PropertiesMapConfig - Performing pattern matching for: source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Pattern 'fs_notification' matches with priority 100
12-06-2011 18:33:26.698 DEBUG FileClassifierManager - Finding type for file: /testing2/genRootCA.sh
12-06-2011 18:33:26.698 INFO UTF8Processor - Converting using CHARSET="UTF-8" for conf "source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|"
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/testing2/genRootCA.sh
12-06-2011 18:33:26.698 INFO LineBreakingProcessor - Using truncation length 10000 for conf "source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|"
12-06-2011 18:33:26.698 INFO LineBreakingProcessor - Using lookbehind 100 for conf "source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|"
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/testing2/genRootCA.sh
12-06-2011 18:33:26.698 INFO AggregatorMiningProcessor - Setting up line merging apparatus for: source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/testing2/genRootCA.sh
12-06-2011 18:33:26.698 DEBUG FileClassifierManager - filename="/testing2/genRootCA.sh" invalidCharCount="0" TotalCharCount="2367" PercentInvalid="0.000000"
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Pattern 'genRootCA-too_small' matches with priority 100
12-06-2011 18:33:26.698 INFO HotDBManager - no hot found for event ts=1323192806, closest match=null [expanded span=0]
12-06-2011 18:33:26.698 DEBUG FSChangeManagerProcessor - CLASSIFIED /testing2/genRootCA.sh as genRootCA-too_small
12-06-2011 18:33:26.698 DEBUG HotDBManager - dir does not exist, creating: /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|
12-06-2011 18:33:26.698 INFO databasePartitionPolicy - creating new DB /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.698 INFO timeinvertedIndex - Opening /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.699 INFO timeinvertedIndex - No files to decompress on create
12-06-2011 18:33:26.699 DEBUG PropertiesMapConfig - Pattern 'genRootCA-too_small' matches with priority 100
12-06-2011 18:33:26.699 INFO timeinvertedIndex - create by dirname /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.699 INFO UTF8Processor - Converting using CHARSET="UTF-8" for conf "source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|"
12-06-2011 18:33:26.699 INFO FSChangeMonitor - Finished generating notifications on /testing2 addCount=1 updateCount=0 deleteCount=0
12-06-2011 18:33:26.699 DEBUG databasePartitionPolicy - opening datafile for newly created TEDB: /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.699 INFO LineBreakingProcessor - Using truncation length 10000 for conf "source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|"
12-06-2011 18:33:26.699 INFO LineBreakingProcessor - Using lookbehind 100 for conf "source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|"
12-06-2011 18:33:26.699 INFO databasePartitionPolicy - lazy loading database for: /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23, id=23, ts=1323192806 dirMgr::nextId=23]
12-06-2011 18:33:26.699 INFO HotDBManager - creating new hot (id=23, time=1323192806)]
12-06-2011 18:33:26.699 INFO AggregatorMiningProcessor - Setting up line merging apparatus for: source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|
12-06-2011 18:33:26.699 DEBUG UTF8Processor - Done key received for: source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|
Your stanza [shellscript] says "only apply the following transform to inputs that are ALREADY assigned the sourcetype of shellscript".
If you said [*] then it would apply the transformation to inputs from ALL sourcetypes, on a line-by-line basis.
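For illustration, a sketch of wiring the same transform to a source pattern instead of to the [shellscript] sourcetype. The transform name set_shellscript is made up, and /testing2 is taken from the debug output. Index-time transforms only run where the data is parsed (indexer or heavy forwarder), and whether this overrides the classifier's too_small result for fschange events is an assumption that would need testing:

```ini
# props.conf
# Apply the transform to every event whose source lies under /testing2,
# regardless of the sourcetype assigned so far.
[source::/testing2/...]
TRANSFORMS-set_shellscript = set_shellscript

# transforms.conf
# Hypothetical stanza name; the settings are the ones from the question.
[set_shellscript]
REGEX = ^#!\/bin\/(bash|sh)
LOOKAHEAD = 16
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::shellscript
```

Note that the transform is evaluated per event, so only the event that actually contains the shebang line would match.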
It seems that I cannot get Splunk to assign a sourcetype based on actual file characteristics, even though it appears to be able to do exactly that (see the too_small sourcetype assignment).
Why doesn't it ever pick up my [shellscript] sourcetype? Where is the error in my thinking?
Thanks for all the suggestions, but I need to do full-scale file integrity monitoring with Splunk, which means checking complete Linux systems.
Unfortunately, there are a lot of files in different directories, and not all of them have a useful extension, let alone all the applications that I need to monitor.
I used to do that with another tool, but I am now "limited" to Splunk, which is why I came up with the idea of checking the first line of each file with a regex and then assigning the sourcetype.
Might this work with some kind of transformation rule?
Thanks for all your answers.
I thought about being able to determine the source-type based on the first line of a file as I neither have a known file extension nor a fixed / known directory where the shell scripts exist.
Actually I could find all the directories but that would cost a significant amount of time and end in a very inefficient and hard-to-maintain config.
But surely you do know where the directories are to define your fschange/monitor stanza in the first place?
What lisa is suggesting is to just define the sourcetype of all files within those directories of a certain file extension. Even if there are a range of them surely they adhere to some form of best practices, e.g. extension and location?
So the scripts aren't named with an extension of .sh? You could also put a series of extensions in the spec - such as (.sh|.bat|.py|.bsh) or whatever... Or, perhaps the scripts reside under a particular directory name?
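A rough sketch of both suggestions in props.conf, assuming the source pattern syntax used in Splunk's default props.conf (... recurses through directories, | alternates inside parentheses). The directory name scripts and the sourcetype name are placeholders:

```ini
# props.conf

# Any file, at any depth, whose name ends in one of these extensions.
[source::....(sh|bat|py|bsh)]
sourcetype = shell_script

# Alternatively: everything below a directory called "scripts"
# (hypothetical directory name).
[source::.../scripts/...]
sourcetype = shell_script
```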
Ultimately, you have to have some way of identifying the file:
As an alternative, you could rename all sourcetypes that start with "too-small", but you run a significant risk of mislabeling an actual log file that is small. Here's how in props.conf:
Again, this will apply only to new data. Earlier comments about re-indexing/renaming sourcetypes still apply.
Yes, I knew that but it does not solve my problem.
How can I flag an event with a certain sourcetype if the source (file) name is unknown and only the first line is known?
All the events get flagged as too_small as the shell scripts tend to be rather small.
Also, don't forget that you cannot use [monitor] and [fschange] on the same set of files/directories. Check out the information in the documentation for inputs.conf.
First, I am a bit confused. With fschange, you usually monitor directories for changes. Splunk creates an event whenever a file in the directory is changed, added or deleted. The sourcetype of these events is set to fs_notification by default; it should not show up as "too_small...". The stanza for setting up fschange monitoring looks like this
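A minimal sketch of such a stanza, using the /testing2 directory from the debug output in the question. The values are illustrative only:

```ini
# inputs.conf
[fschange:/testing2]
# Check the directory for changes every 60 seconds.
pollPeriod = 60
# Also watch subdirectories.
recurse = true
# Send only the change notification, not the file contents.
fullEvent = false
```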
You can't do this via the user interface. You have to add the fschange stanza to inputs.conf manually, on your forwarders.
You can have Splunk index the contents of the shell scripts, though I don't think that is very useful. If you do that, you could get the "too_small..." sourcetype. And you could fix it by putting this into props.conf:
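Something along these lines (a sketch; the source pattern assumes the scripts all end in .sh):

```ini
# props.conf
[source::.../*.sh]
sourcetype = shell_script
```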
Which says "if the file name ends in .sh, set the sourcetype to 'shell_script'"
This would go into a props.conf file on your forwarders. If you have a lot of forwarders, you might consider using Splunk's Deployment Server to distribute the config files (inputs.conf, props.conf, etc.). (The Deployment Server is part of Splunk.)
However, you can only change the sourcetype of new events. You can't change events that have already been indexed. You have a few choices for existing events:
Clean the indexes (on the Splunk indexers). Reset the "fishbucket" on the Universal Forwarders. This will cause ALL of your data to be re-indexed. Fine if you are pre-production or testing, but probably not acceptable if you are working with a production Splunk environment.
Use Sourcetype Renaming. (Find it under Manager -> Fields -> Sourcetype renaming.) This will logically rename the sourcetype at search time, although it doesn't change the actual data in the index.
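The same renaming can also be expressed in props.conf with the rename setting. A sketch for the one sourcetype that shows up in the debug output above; the target name is an assumption:

```ini
# props.conf on the instance where searches are run
[genRootCA-too_small]
# Search-time only: the indexed data keeps the original sourcetype.
rename = shell_script
```

Since each script gets its own <name>-too_small sourcetype, this would need one stanza per script, which is why it is only practical for a handful of files.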
I hope this helps!