Shellscript monitoring / too_small sourcetype problem

flo_cognosec
Communicator

Hi

I would like to use Splunk to index logfiles of different kinds and to provide proper file change monitoring using the fschange stanzas.

As soon as I try to monitor small shell scripts with various names (several hundred of them), Splunk flags all of the scripts as too_small.

Most of the scripts start with #!/bin/bash or #!/bin/sh, and those should all be flagged as "shell_scripts".

I tried creating a rule in props.conf to filter out those files and then set a sourcetype, but this does not seem to work.

I am using a universal forwarder right now.

Any ideas how I could get this to work, or where the problem might be?

Can I selectively disable the too_small sourcetype?

Thanks a lot in advance

flo_cognosec
Communicator

Hi People

I solved the problem. It is easy, as long as you don't overlook the fact that you have to use a heavy forwarder / server.

I need to do some more tests regarding performance and the like, but it looks promising at least 🙂

props.conf

[shellscript]
BREAK_ONLY_BEFORE=^#!\/bin\/(bash|sh)
BREAK_ONLY_BEFORE_DATE=false
LEARN_MODEL = false
MAX_EVENTS=200000

[rule::find_shellscript]
MORE_THAN_0 = ^#!\/bin\/(bash|sh)
sourcetype=shellscript

One little question remains: does a LESS_THAN_x imply that the pattern has to occur at least once?
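One way to reason about these thresholds is to model them outside Splunk. The Python sketch below assumes that the number in MORE_THAN_<n> / LESS_THAN_<n> is a percentage of lines matching the regex, which is how the example comments in props.conf.spec read. It is only a model of that reading, not Splunk's classifier, and percent_matching and rule_matches are made-up names.

```python
import re

def percent_matching(lines, pattern):
    """Percentage of lines in which the regex matches somewhere."""
    if not lines:
        return 0.0
    rx = re.compile(pattern)
    hits = sum(1 for line in lines if rx.search(line))
    return 100.0 * hits / len(lines)

def rule_matches(lines, more_than=None, less_than=None):
    """Model of a [rule::...] stanza (assumed semantics).

    more_than / less_than map a threshold in percent to a regex.
    Every condition has to hold for the rule to match.
    """
    for threshold, pattern in (more_than or {}).items():
        if not percent_matching(lines, pattern) > threshold:
            return False
    for threshold, pattern in (less_than or {}).items():
        if not percent_matching(lines, pattern) < threshold:
            return False
    return True

script = ["#!/bin/bash", "echo hello", "exit 0"]
shebang = r"^#!/bin/(bash|sh)"

print(rule_matches(script, more_than={0: shebang}))            # True
print(rule_matches(["echo hello"], more_than={0: shebang}))    # False
print(rule_matches(["echo hello"], less_than={10: shebang}))   # True
```

Under this reading, a LESS_THAN condition is already satisfied when the pattern never occurs, since 0% is below any positive threshold. Whether Splunk behaves exactly like this is worth confirming against the spec for the version in use.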

Greets

Flo

flo_cognosec
Communicator

It seems that, because of the lack of recursive sourcetype matching, this is not really solvable for now 😕

Solved it by setting the sourcetype on the forwarder and doing some processing on the indexer.

Recursive matching of at least 1 level seems like a needed feature.

flo_cognosec
Communicator

Update 🙂

I have now tried to proceed as shown here, but this does not seem to fix my problem 😕

The file genRootCA.sh is a small shell script, yet it is flagged as genRootCA-too_small rather than shellscript (see the debug output below). How can I override the too_small sourcetype recognition per event?

Where is my mistake?

props.conf

[shellscript]
BREAK_ONLY_BEFORE_DATE=false
TRANSFORMS-shell_script=shell_script_transform

transforms.conf

[shell_script_transform]
REGEX = ^#!\/bin\/(bash|sh)
LOOKAHEAD = 16
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::shellscript

12-06-2011 18:33:26.693 INFO  FSChangeMonitor - Generating notifications on /testing2
12-06-2011 18:33:26.693 DEBUG FSChangeMonitor - blacklist entered; path=/testing2/genRootCA.sh
12-06-2011 18:33:26.693 DEBUG FSChangeMonitor - no blacklist matches found
12-06-2011 18:33:26.697 DEBUG FSChangeManagerProcessor - NOTIFICATION (ADD)=/testing2/genRootCA.sh
12-06-2011 18:33:26.697 DEBUG PropertiesMapConfig - Performing pattern matching for: source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Pattern 'fs_notification' matches with priority 100
12-06-2011 18:33:26.698 DEBUG FileClassifierManager - Finding type for file: /testing2/genRootCA.sh
12-06-2011 18:33:26.698 INFO  UTF8Processor - Converting using CHARSET="UTF-8" for conf "source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|"
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/testing2/genRootCA.sh
12-06-2011 18:33:26.698 INFO  LineBreakingProcessor - Using truncation length 10000 for conf "source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|"
12-06-2011 18:33:26.698 INFO  LineBreakingProcessor - Using lookbehind 100 for conf "source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|"
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/testing2/genRootCA.sh
12-06-2011 18:33:26.698 INFO  AggregatorMiningProcessor - Setting up line merging apparatus for: source::fschangemonitor|host::flos-MacBook-Pro.local|fs_notification|
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/testing2/genRootCA.sh
12-06-2011 18:33:26.698 DEBUG FileClassifierManager - filename="/testing2/genRootCA.sh" invalidCharCount="0" TotalCharCount="2367" PercentInvalid="0.000000"
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Pattern 'genRootCA-too_small' matches with priority 100
12-06-2011 18:33:26.698 INFO  HotDBManager - no hot found for event ts=1323192806, closest match=null [expanded span=0]
12-06-2011 18:33:26.698 DEBUG FSChangeManagerProcessor - CLASSIFIED /testing2/genRootCA.sh as genRootCA-too_small
12-06-2011 18:33:26.698 DEBUG HotDBManager - dir does not exist, creating: /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.698 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|
12-06-2011 18:33:26.698 INFO  databasePartitionPolicy - creating new DB /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.698 INFO  timeinvertedIndex - Opening /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.699 INFO  timeinvertedIndex - No files to decompress on create
12-06-2011 18:33:26.699 DEBUG PropertiesMapConfig - Pattern 'genRootCA-too_small' matches with priority 100
12-06-2011 18:33:26.699 INFO  timeinvertedIndex - create by dirname /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.699 INFO  UTF8Processor - Converting using CHARSET="UTF-8" for conf "source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|"
12-06-2011 18:33:26.699 INFO  FSChangeMonitor - Finished generating notifications on /testing2 addCount=1 updateCount=0 deleteCount=0
12-06-2011 18:33:26.699 DEBUG databasePartitionPolicy - opening datafile for newly created TEDB: /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23
12-06-2011 18:33:26.699 INFO  LineBreakingProcessor - Using truncation length 10000 for conf "source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|"
12-06-2011 18:33:26.699 INFO  LineBreakingProcessor - Using lookbehind 100 for conf "source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|"
12-06-2011 18:33:26.699 INFO  databasePartitionPolicy - lazy loading database for: /Applications/splunk/var/lib/splunk/testing/db/hot_v1_23, id=23, ts=1323192806 dirMgr::nextId=23]
12-06-2011 18:33:26.699 INFO  HotDBManager - creating new hot (id=23, time=1323192806)]
12-06-2011 18:33:26.699 INFO  AggregatorMiningProcessor - Setting up line merging apparatus for: source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|
12-06-2011 18:33:26.699 DEBUG UTF8Processor - Done key received for: source::/testing2/genRootCA.sh|host::flos-MacBook-Pro.local|genRootCA-too_small|

lguinn2
Legend

Or you could say [too_small*], which would apply the transformation only to those inputs. This is probably closer to what you want.

lguinn2
Legend

Your stanza [shellscript] says "only apply the following transform to inputs that are ALREADY assigned the sourcetype of shellscript".

If you said [*], then it would apply the transformation to inputs from ALL sourcetypes, on a line-by-line basis.

flo_cognosec
Communicator

It seems that I cannot get Splunk to assign a sourcetype based on actual file characteristics, something it is evidently able to do itself (see the too_small sourcetype assignment).

Why doesn't it ever pick up my [shellscript] sourcetype? Where is the error in my thinking?

flo_cognosec
Communicator

Hi

Thanks for all the suggestions, but as I need to do full-scale file integrity monitoring with Splunk, I need to check complete Linux systems.

Unfortunately, there are a lot of files in different directories, and not all of them have a useful extension, let alone all the applications that I need to monitor.

I used to do that with another tool, but am now "limited" to Splunk, which is why I came up with the idea of checking the first line of each file with a regex and then assigning the sourcetype.

Might this work with some kind of transformation rule?

flo_cognosec
Communicator

Hi

Thanks for all your answers.

I thought about being able to determine the source-type based on the first line of a file as I neither have a known file extension nor a fixed / known directory where the shell scripts exist.

Actually I could find all the directories but that would cost a significant amount of time and end in a very inefficient and hard-to-maintain config.
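One way to avoid maintaining such a list by hand might be to generate it. Here is a minimal sketch, assuming Python is available on the monitored host: it walks a directory tree, checks the first line of each regular file against the shebang regex used in this thread, and builds one props.conf [source::...] stanza per hit. The function names and the default sourcetype name are illustrative only, and the generated text would still have to be placed in the forwarder's props.conf by some other means.

```python
import os
import re

# Same pattern as in the thread, applied to the raw first line of the file.
SHEBANG = re.compile(rb"^#!/bin/(bash|sh)")

def is_shell_script(path):
    """True if the first line starts with #!/bin/bash or #!/bin/sh."""
    try:
        with open(path, "rb") as fh:
            return SHEBANG.match(fh.readline()) is not None
    except OSError:
        return False  # unreadable files are skipped

def find_shell_scripts(root):
    """Yield absolute paths of shell scripts below root, in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # isfile() skips FIFOs and device nodes, which could block on read
            if os.path.isfile(path) and is_shell_script(path):
                yield os.path.abspath(path)

def props_stanzas(paths, sourcetype="shellscript"):
    """Build props.conf text with one [source::<path>] stanza per file."""
    return "\n".join(
        f"[source::{p}]\nsourcetype = {sourcetype}\n" for p in paths
    )

# Example call, using the directory from the debug output in this thread:
# print(props_stanzas(find_shell_scripts("/testing2")))
```

This trades Splunk-side detection for a periodic scan on the host, so newly added scripts are only picked up on the next run. Paths containing characters that have a special meaning in [source::...] patterns, such as *, would need extra care.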

Drainy
Champion

But surely you do know where the directories are to define your fschange/monitor stanza in the first place?
What Lisa is suggesting is to define the sourcetype of all files within those directories that have a certain file extension. Even if there is a range of them, surely they adhere to some form of best practice, e.g. extension and location?

lguinn2
Legend

So the scripts aren't named with an extension of .sh? You could also put a series of extensions in the spec - such as (.sh|.bat|.py|.bsh) or whatever... Or, perhaps the scripts reside under a particular directory name?
Ultimately, you have to have some way of identifying the file:

  • by a pattern in the file name or the directory path
  • by the host that it comes from
  • by sourcetype

As an alternative, you could rename all sourcetypes that contain "too_small", but you run a significant risk of mislabeling an actual log file that is small. Here's how in props.conf:

[(?:::){0}too_small*]
sourcetype=shell_script

Again, this will apply only to new data. Earlier comments about re-indexing/renaming sourcetypes still apply.

flo_cognosec
Communicator

Yes, I knew that, but it does not solve my problem.
How can I flag an event with a certain sourcetype if the source (file) name is unknown and only the first line is known?
All the events get flagged as too_small because the shell scripts tend to be rather small.

kristian_kolb
Ultra Champion

Also, don't forget that you cannot use [monitor] and [fschange] on the same set of files/directories. Check the documentation for inputs.conf:

http://docs.splunk.com/Documentation/Splunk/latest/Admin/inputsconf

/k

lguinn2
Legend

First, I am a bit confused. With fschange, you usually monitor directories for changes. Splunk creates an event whenever a file in the directory is changed, added or deleted. The sourcetype of these events is set to fs_notification by default; it should not show up as "too_small...". The stanza for setting up fschange monitoring looks like this:

[fschange:/absolute/path/to/my/directory]

You can't do this via the user interface. You have to add the fschange stanza to inputs.conf manually, on your forwarders.

You can have Splunk index the contents of the shell scripts, though I don't think that is very useful. If you do that, you could get the "too_small..." sourcetype. And you could fix it by putting this into props.conf:

[source::/.../*.sh]
sourcetype=shell_script

This says "if the file name ends in .sh, set the sourcetype to 'shell_script'".
This would go into a props.conf file on your forwarders. If you have a lot of forwarders, you might consider using Splunk's Deployment Server to distribute the config files (inputs.conf, props.conf, etc.). (The Deployment Server is part of Splunk.)

However, you can only change the sourcetype of new events. You can't change events that have already been indexed. You have a few choices for existing events:

  1. Clean the indexes (on the Splunk indexers). Reset the "fishbucket" on the Universal Forwarders. This will cause ALL of your data to be re-indexed. Fine if you are pre-production or testing, but probably not acceptable if you are working with a production Splunk environment.

  2. Use Sourcetype Renaming. (Find it under Manager -> Fields -> Sourcetype renaming.) This will logically rename the sourcetype, although it doesn't change the actual data in the index.

I hope this helps!