Getting Data In

What is the best way to match more than 3000 patterns used to classify events into multiple sourcetypes?

vganjare
Builder

Hi All,

We have more than 3000 patterns which are used to classify events into multiple sourcetypes. What is the best way to implement this use case?

Thanks,
Vishal

1 Solution

dmaislin_splunk
Splunk Employee

Try reading this blog; it will help you understand how to rewrite sourcetypes based on pattern matching using transforms.conf and props.conf in Splunk:

http://blogs.splunk.com/2010/02/11/sourcetypes-gone-wild/


meenal901
Communicator

Adding more details:

We have 4 sourcetypes: .log, .out, .debug and .err.

Phase 1: On the heavy forwarder, apply 1000 patterns to filter out unwanted data by routing events to nullQueue or indexQueue.

In props.conf:

[log]
TRANSFORMS-set = setnull, setraw

In transforms.conf (sketched; setraw stands for the 1000 routing patterns):

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setraw]
REGEX = <one of the 1000 patterns>
DEST_KEY = queue
FORMAT = indexQueue

Phase 2: Apply the escalate and non-escalate patterns to this output at search time, using EXTRACT in props.conf to populate the respective fields:

EXTRACT-esc = <1000 patterns>
EXTRACT-nonesc = <2000 patterns>

In this case, will there be any performance issue on the heavy forwarder and indexer? Can Splunk handle 3000 patterns at search time and 1000 at parse time?
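One way to keep the search-time stanza count down is to combine related patterns into a single EXTRACT using alternation. A rough sketch, in which the exception class names are hypothetical stand-ins for the real patterns:

In props.conf:

[log]
EXTRACT-esc = (?<esc_exception>NullPointerException|OutOfMemoryError)
EXTRACT-nonesc = (?<nonesc_exception>NumberFormatException|ParseException)

Whether a few large alternations outperform thousands of small stanzas depends on the patterns themselves; testing against representative data is the only reliable way to answer the performance question.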


vganjare
Builder

We are evaluating regex pattern matching and sourcetype configuration (props.conf + transforms.conf), but we are not sure whether 3000+ patterns will be supported and, if so, what the downside will be for the forwarder/indexer. Will there be any performance impact from matching 3000+ patterns?


vganjare
Builder

We want to set the metadata sourcetype at index time.


dmaislin_splunk
Splunk Employee

Do you want to set the metadata sourcetype at index time, or are you OK with using an eval statement to set a field such as escalated_exceptions=true? Sorry I am not quicker to respond; I am in all-day meetings off site this week.



Runals
Motivator

I second the idea of using eventtypes.


dmaislin_splunk
Splunk Employee

If I were you, I would just put everything into a few sourcetypes and use eventtypes rather than spending all those resources rewriting the sourcetype.
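A minimal sketch of that approach, assuming a hypothetical sourcetype app_logs and illustrative search strings:

In eventtypes.conf:

[escalated_exception]
search = sourcetype=app_logs "NullPointerException"

[non_escalated_exception]
search = sourcetype=app_logs "NumberFormatException"

Searches can then filter with eventtype=escalated_exception, and the definitions can be changed at any time without re-indexing the data.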


dmaislin_splunk
Splunk Employee

Correct. Follow the example and blog above; at index time you can dynamically rewrite the sourcetype based on the patterns you define as the REGEX in transforms.conf.


dmaislin_splunk
Splunk Employee

The blog I referenced above gives a great example on how to rewrite sourcetypes based on regex patterns:

In props.conf:

[source::/path/to/sample.log]
TRANSFORMS-yummy = setCPSourcetype, setSyslogSourcetype

In transforms.conf:

[setCPSourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = %PIX-
FORMAT = sourcetype::cisco-pix

[setSyslogSourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = \w+ \d+ \d+:\d+:\d+ \S+ \w+[\d+]:
FORMAT = sourcetype::syslog
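Adapted to the exception-classification case in this thread (the source path, stanza name, and target sourcetype are illustrative assumptions), the same mechanism would look like:

In props.conf:

[source::/path/to/app.log]
TRANSFORMS-exceptions = setEscalatedSourcetype

In transforms.conf:

[setEscalatedSourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = NullPointerException
FORMAT = sourcetype::escalated_exceptions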


vganjare
Builder

We have an application that generates different logs. The logs contain different exceptions from a Java application, and these exceptions are classified into four categories:
1. Escalate exceptions
2. Non-escalate exceptions
3. Ignore exceptions
4. Unmatched exceptions
e.g. anything starting with NullPointerException should go to escalate exceptions (and the escalated-exceptions sourcetype).


dmaislin_splunk
Splunk Employee
Splunk Employee

Can you elaborate and provide some examples? I am unclear about what the issue is and what has been tried to date.
