Forgive me if this has been answered before but my googling has failed me -
I have a forwarder that batches log files to our indexer. The sourcetypes are set on the forwarder in the inputs.conf file. I need to drastically change this and split one sourcetype into many based on log file name. This will require me to make around 6 sourcetype entries PER INDEX (we have about 20 indexes) in the inputs.conf file.
Before I make any big changes, I was wondering if there is an easier or better way of doing this. I simply do not understand all of the places where a sourcetype can be created, and why there are so many. For instance, when I google how to make a sourcetype, it tells me to edit props.conf... what?
The data in question is very large. I think it's much too large for an index-time transform on the indexer side, but I do not understand how much strain the transforms add, if any, in the first place. What are my options here?
In inputs.conf you tell the data what sourcetype it should take; in props.conf you define settings for that sourcetype, such as event breaking and timestamp extraction.
The sourcetype set in inputs.conf can be overridden during parsing. Those overrides live in props.conf and transforms.conf. For example, consider this:
props.conf
[source::.../my_awesome_file.log*]
TRANSFORMS-set_awesome_sourcetype = set_awesome_sourcetype
transforms.conf
[set_awesome_sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::awesome
That'll set sourcetype=awesome for any file name starting with my_awesome_file.log. Depending on your situation, setting this during parsing can be an option, but I'd recommend fully exploring setting things properly at input time first.
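For comparison, here is a rough sketch of what setting the sourcetype at input time could look like on the forwarder. The paths, index names and sourcetype names are made up for illustration, and I'm assuming monitor inputs; adjust the stanza type if you actually use batch inputs.

```ini
# inputs.conf on the forwarder (hypothetical paths and names)
# One stanza per file name pattern per index.

[monitor:///var/log/app_a/access*.log]
index = app_a
sourcetype = app_access

[monitor:///var/log/app_a/error*.log]
index = app_a
sourcetype = app_error

# ...and the same pair again for the next index
[monitor:///var/log/app_b/access*.log]
index = app_b
sourcetype = app_access

[monitor:///var/log/app_b/error*.log]
index = app_b
sourcetype = app_error
```

This is the "6 entries per index" layout from the question: verbose, but the sourcetype is already correct when the data leaves the forwarder.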
If you're worried about indexing performance, don't be. First, a single reference machine can easily sustain a 20 MB/s indexing rate. Search load is what kills you down the line, rarely indexing. Second, if props.conf is new to you, there's a lot of performance to be gained from defining sourcetype settings such as event breaking efficiently, much more than some sourcetype rewriting would usually cost. Here's a good overview: http://docs.splunk.com/Documentation/Splunk/6.3.3/Data/Overviewofeventprocessing
Remember, already-indexed data won't change.
Yeah, the sheer number of possibilities in Splunk can be overwhelming.
The default unit for indexing performance is GB per day per indexer. 500 GB/day usually overwhelms one indexer but usually bores ten indexers.
However, the biggest impact on how much any given hardware can take is search load. Each event is indexed once but searched, or at least considered, countless times by the searches you run. Doing a bit more at index time is rarely an issue.
That being said, it's of course possible to shoot yourself in the foot at any time, for example with less-than-ideal regular expressions running over large sets of data. Matching on your source path isn't one of those cases, so you'll be fine (until proven otherwise by actually trying it). After making the change, check the indexing performance dashboards in the Distributed Management Console for changes in CPU usage by the various processes.
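To make the source-path approach concrete, here is a minimal sketch of splitting one sourcetype into several by file name at parsing time. The file name patterns and sourcetype names are hypothetical. I'm also assuming the file names follow the same convention across all your indexes, in which case one stanza pair per pattern should cover every index.

```ini
# props.conf on the indexer (or wherever parsing happens)
# Hypothetical file name patterns; one stanza per pattern.
[source::.../access*.log]
TRANSFORMS-set_sourcetype = set_sourcetype_app_access

[source::.../error*.log]
TRANSFORMS-set_sourcetype = set_sourcetype_app_error

# transforms.conf
# REGEX = . matches any non-empty event, so the rewrite applies to
# every event coming from a matching source.
[set_sourcetype_app_access]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::app_access

[set_sourcetype_app_error]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::app_error
```

Because the stanzas key on the source path rather than on the index, you'd be looking at roughly 6 stanza pairs in total instead of 6 per index.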
Thanks again for the extra clarification! I'll keep an eye on the performance dashboards.
Thanks. I have added line breaking rules and transforms before. I was mostly confused about all of the places where sourcetypes can be set, since there seem to be many different ways.
If I'm hearing you correctly, you're saying index-time transforms are really not an issue for about 500 GB a day? If that is the case, it would be easiest for me to add the new, chopped-up sourcetypes in props.conf and transforms.conf like you provided. Would this have any effect on search-time performance one way vs. the other?