
Can I turn off the data is too_small sourcetype behavior?

marksedam
New Member

I have a set of log files whose sourcetypes are correctly assigned by rules defined in props.conf, but only when the files contain more than 99 events. When a log contains 99 or fewer events, it is assigned a "[filename]-too_small" sourcetype instead. Even after the files grow to 100 or more events, they keep the incorrect sourcetype.

Is there any way to stop this default behavior other than padding the logs with dummy events until they contain at least 100? Basically, I would like Splunk to consult the rule stanzas in props.conf before resorting to the default action for small files.

Thanks
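
To make the reported behavior concrete, here is a minimal Python model of it. It is not Splunk's code: the function name `observed_sourcetype`, the `TOO_SMALL_THRESHOLD` constant, and the detail that the file extension is dropped (inferred from the controller.log to controller-too_small example later in this thread) are all assumptions. It also does not model the wrong sourcetype persisting after the file grows.

```python
import os
from typing import Optional

# Assumption: files with 99 or fewer events count as "too small" (per the question).
TOO_SMALL_THRESHOLD = 100


def observed_sourcetype(path: str, event_count: int, rule_sourcetype: Optional[str]) -> Optional[str]:
    """Simplified model of the behavior reported in this thread, not Splunk's code."""
    if event_count < TOO_SMALL_THRESHOLD:
        # Small file: the props.conf rule is never consulted; the sourcetype is
        # built from the file name with its extension dropped.
        stem = os.path.splitext(os.path.basename(path))[0]
        return f"{stem}-too_small"
    # 100 or more events: whatever the matching rule assigns is used.
    return rule_sourcetype


print(observed_sourcetype("/var/log/kafka/controller.log", 42, "test_controller"))   # controller-too_small
print(observed_sourcetype("/var/log/kafka/controller.log", 150, "test_controller"))  # test_controller
```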


beatus
Communicator

marksedam,
It seems there's no way to turn off the too_small behavior, so we can deal with it at index time instead. This won't be the cheapest approach in terms of CPU, but it should work for you.

props.conf:

[(?::){0}*-too_small]
TRANSFORMS-remove_too_small = remove_too_small

transforms.conf:

[remove_too_small]
SOURCE_KEY = MetaData:Sourcetype
DEST_KEY = MetaData:Sourcetype
REGEX = sourcetype::(.*)-too_small
FORMAT = sourcetype::$1
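
The transform above can be traced in Python as a sketch. Python's `re` stands in for Splunk's PCRE engine (they behave the same on this simple pattern), and the `sourcetype::` prefix carried by the MetaData:Sourcetype value is why both REGEX and FORMAT include it. The function name `remove_too_small` only mirrors the stanza name, and leaving the value untouched when REGEX does not match is an assumption about Splunk's behavior.

```python
import re

# Python's re stands in for Splunk's PCRE; both treat this pattern the same way.
REGEX = re.compile(r"sourcetype::(.*)-too_small")
FORMAT = r"sourcetype::\1"  # Splunk's $1 is written \1 in a Python template


def remove_too_small(metadata_sourcetype: str) -> str:
    """Apply the [remove_too_small] REGEX/FORMAT pair to a MetaData:Sourcetype value."""
    match = REGEX.search(metadata_sourcetype)
    if match is None:
        # Assumed: when REGEX does not match, the key is left unchanged.
        return metadata_sourcetype
    # DEST_KEY receives the expanded FORMAT string, replacing the old value.
    return match.expand(FORMAT)


print(remove_too_small("sourcetype::controller-too_small"))  # sourcetype::controller
print(remove_too_small("sourcetype::kafka_server"))          # sourcetype::kafka_server
```

Because `.*` is greedy, only the last "-too_small" in the value is stripped, which is all this use case needs.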

Michael
Contributor

Beatus, (hope you're still monitoring this thread...)

I love your solution. Our cluster has hundreds of these <sourcetype>-too_small sourcetypes, and they're inconsistent, so any report based on them, or on their root sourcetype, often fails. It floors me that the sourcetype is almost always set to the same thing as the source, and then, when the file fails this "too small" check, Splunk appends -too_small to the source name (and to the existing sourcetype name). It would be better to simply leave the sourcetype name the same as the source when Splunk can't guess it, rather than creating a completely new (and wrong) sourcetype. Oh Great Splunkin, are you listening?

So, I've done the above: I put these settings on my heavy forwarders (didn't work) and then on my indexers (deployed in a bundle from my cluster master). It did not fix the problem.

Would you happen to know whether this fix still works, or ever did?
Forgive my questioning, but I often see people answer here with "this should fix it..." without ever having tried it and seen the problem actually fixed afterwards...


rbojja
New Member

I am not able to get this working. In my case, I have Universal Forwarder 6.0, and I see in the docs that structured data parsing is done on the universal forwarder side. I made the changes to props.conf and transforms.conf as above, but I still see too_small sourcetypes on the Splunk Enterprise side.

My inputs.conf is something like this:

[monitor:///var/log/]
whitelist=.log$
recurse=true

This monitors everything that ends with .log.

My props.conf settings are something like this:

[source::/var/log/kafka/server.log]
sourcetype = kafka_server
[source::/var/log/kafka/state-change.log]
sourcetype = kafka_state
[source::/var/log/kakfa/controller.log]
sourcetype = test_controller
[[(?::){0}*-too_small]]
TRANSFORMS-remove_too_small = remove_too_small

transforms.conf is the same as above.

On the Splunk Enterprise side I see controller-too_small, which is the sourcetype Splunk automatically assigned to /var/log/kafka/controller.log.

Any help would be appreciated


marksedam
New Member

beatus,

Thanks for all the effort. Your answers have been helpful, but I think I'm going to kludge my log files to ensure they contain a minimum of 101 events. I believe the solution above uses the filename (stripped of the '-too_small' text) for the sourcetype. My filenames are [hostname]_[type].log, so additional work is needed. And I believe it will break if the files start out small and later grow, so that they are no longer caught up in this problem. This all seems extremely hacky just to work around the 'feature' of ignoring all the input-time rules for small files. I wonder whether this behavior is a bug or an orphaned feature from an old version. I can't find any documentation explaining why this substitution occurs, beyond the fact that -too_small is appended to the filename when the log file contains 100 or fewer events. As if that were a reason.

I'll mark this as the answer as we have a couple of possible workarounds for this in here.

mark


beatus
Communicator

Marksedam,
The answer above takes whatever sourcetype your rule sets and just removes the "-too_small" suffix. So it's completely independent of the file name (unless your sourcetype is actually based on the file name).

Glad I could help either way; if there's more I can do, please feel free to comment.


marksedam
New Member

I did some testing in a standalone test environment (v6.4.6) and found that the learned app isn't controlled by enabling or disabling it from the Managing Apps web UI.

With the learned app disabled and the index cleaned, when new logs are added the C:\Program Files\Splunk\etc\apps\learned\local\props.conf file still gets updated with [filename]-too_small stanzas. So disabling doesn't work, even though C:\Program Files\Splunk\etc\apps\learned\local\app.conf looks like:

[install]
state = disabled

So maybe disabling the learned app might work, but how?


beatus
Communicator

It won't fix it with the way you're doing things (the rule parsing). Typically when I see "*-too_small" it's from the learned app, but not in this case. Based on my testing, though, this will work for you:

It seems there's no way to turn off the too_small behavior, so we can deal with it at index time instead. This won't be the cheapest approach in terms of CPU, but it should work for you.

props.conf:

[(?::){0}*-too_small]
TRANSFORMS-remove_too_small = remove_too_small

transforms.conf:

[remove_too_small]
SOURCE_KEY = MetaData:Sourcetype
DEST_KEY = MetaData:Sourcetype
REGEX = sourcetype::(.*)-too_small
FORMAT = sourcetype::$1

As for the difference between props and transforms: transforms.conf exposes additional options that props.conf doesn't.

I've edited my original answer with this as well, so others can see what works without digging through our comments. Please accept it if you feel it was helpful!


marksedam
New Member

Thank you for your reply!

I tried the props.conf entry, but that simply set the sourcetype of all of the small logs to "too_small" rather than "[filename]-too_small".

I disabled the learned app, stopped Splunk, cleaned the index, and restarted Splunk, and I still get all the small logs assigned a sourcetype of "too_small".

I also removed the "-too_small" stanzas from the learned app's local props.conf file with same result.

Any other ideas?


beatus
Communicator

Was the learned app disabled on the UF? That may be worth a shot.

Another option is to set a sourcetype on this monitor and be done with it, if that's possible.

The last option is to do it dynamically based on something in the log or the path. Something like:

props.conf:

[source::/my/log/path]
TRANSFORMS-fix_st = fix_st

transforms.conf:

[fix_st]
REGEX = event_regex_here
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::my_new_st

You could make use of "SOURCE_KEY = MetaData:Source" if you'd like your regex to match on the file path.
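
As a rough model of that path-based variant, the sketch below treats the event's metadata as a Python dict. The `source::` prefix on the MetaData:Source value, the dict layout, and the function name `apply_fix_st` are assumptions for illustration; the path pattern and "my_new_st" remain placeholders, just as in the post.

```python
import re
from typing import Dict

# Placeholders mirroring the [fix_st] example; the path pattern and "my_new_st"
# are hypothetical, as they are in the post.
FIX_ST_REGEX = re.compile(r"^source::/my/log/path")
FIX_ST_FORMAT = "sourcetype::my_new_st"


def apply_fix_st(event_meta: Dict[str, str]) -> Dict[str, str]:
    """Model of a transform with SOURCE_KEY = MetaData:Source and
    DEST_KEY = MetaData:Sourcetype: if the source path matches, overwrite
    the sourcetype; otherwise return the metadata unchanged."""
    if FIX_ST_REGEX.search(event_meta["MetaData:Source"]):
        return {**event_meta, "MetaData:Sourcetype": FIX_ST_FORMAT}
    return dict(event_meta)


meta = {"MetaData:Source": "source::/my/log/path/app.log",
        "MetaData:Sourcetype": "sourcetype::app-too_small"}
print(apply_fix_st(meta)["MetaData:Sourcetype"])  # sourcetype::my_new_st
```

Matching on the path rather than the event text means every event from that file gets the same sourcetype, regardless of how many events the file holds.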


marksedam
New Member

I'm currently assigning sourcetypes dynamically using rules in the C:\Program Files\Splunk\etc\system\local\props.conf (my test environment), e.g.:

[rule::MySourceType1]
sourcetype=my_sourcetype1
MORE_THAN_80=[regex here]
. . .

This works great for logs with more than 100 events; with fewer, the rule is ignored and the sourcetype is set to either "too_small" or "[filename]-too_small". Also, once a file is set to too_small, Splunk never uses the rule to assign its sourcetype again, even after the file grows past 100 events.
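
The ordering described here can be modeled in a few lines of Python. This is a simplified reading, not Splunk's implementation: `more_than` treats MORE_THAN_80 as "strictly more than 80% of lines match the regex", the 100-event threshold comes from this thread's observations, and the function names are hypothetical. What happens when no rule matches is deliberately left unmodeled.

```python
import re
from typing import List, Optional

TOO_SMALL_THRESHOLD = 100  # assumption drawn from the thread's observations


def more_than(lines: List[str], pattern: str, pct: int) -> bool:
    """Simplified MORE_THAN_<pct> check: strictly more than pct% of lines match."""
    if not lines:
        return False
    regex = re.compile(pattern)
    matching = sum(1 for line in lines if regex.search(line))
    return matching * 100 > pct * len(lines)


def assign_sourcetype(file_stem: str, lines: List[str], pattern: str,
                      rule_sourcetype: str) -> Optional[str]:
    """Observed ordering: the too_small fallback is applied before any rule."""
    if len(lines) < TOO_SMALL_THRESHOLD:
        return f"{file_stem}-too_small"
    if more_than(lines, pattern, 80):  # MORE_THAN_80 = <pattern>
        return rule_sourcetype
    return None  # no rule matched; Splunk's next step is not modeled here


events = ["ERROR disk full"] * 150
print(assign_sourcetype("app", events, r"^ERROR", "my_sourcetype1"))       # my_sourcetype1
print(assign_sourcetype("app", events[:50], r"^ERROR", "my_sourcetype1"))  # app-too_small
```

Because the size check runs first, no rule regex can rescue a small file, which matches what is described above.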

I'm not sure what transforms.conf offers over props.conf.
