Getting Data In

What is the best way to use sourcetype with HTTP Event Collector to categorize data?

Contributor

From the HTTP Event Collector setting page:

Source type
The source type is one of the default fields that Splunk assigns to all incoming data. It tells Splunk what kind of data you've got, so that Splunk can format the data intelligently during indexing. *And it's a way to categorize your data, so that you can search it easily.*

We are inputting key/value pairs via HTTP Event Collector. We are currently using sourcetype as a way to categorize the type of data associated with the key/value pairs. We could also add a key with the type of data.

Is using sourcetype to categorize data a good practice? Or should we not set the sourcetype for our HTTP Events and set a key value?

1 Solution

Splunk Employee

The main value of sourcetype is that you can associate different processing rules with it, which run at either index time or search time based on the sourcetype. So in your case, if you think you might want to associate different rules with different categories, then different sourcetypes make sense rather than a single sourcetype. Having a single sourcetype and using a category field, for example, will allow you to have one set of specific rules for all your data.

If there are no rules period, then it really doesn't matter which way you go.
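
To make the two options concrete, here is a minimal sketch of what the HEC JSON envelope could look like either way. The helper `hec_event`, the sourcetype names (`acme:metrics`, `acme:kv`) and the `category` key are made up for illustration; only the top-level `sourcetype` and `event` keys come from the HEC event format.

```python
import json


def hec_event(data, sourcetype, category=None):
    """Build one HEC JSON envelope (hypothetical helper, not a Splunk API).

    `sourcetype` goes in the envelope metadata; `category`, if given,
    is added as an ordinary key inside the event body.
    """
    event = dict(data)  # copy so the caller's dict is not modified
    if category is not None:
        event["category"] = category
    return {"sourcetype": sourcetype, "event": event}


sample = {"cpu": 0.93, "host_id": "web01"}

# Option A: one sourcetype per category of data.
per_category = hec_event(sample, sourcetype="acme:metrics")

# Option B: a single sourcetype, with the category carried as a key.
single = hec_event(sample, sourcetype="acme:kv", category="metrics")

print(json.dumps(per_category))
print(json.dumps(single))
```

With option A, any rules keyed on sourcetype can differ per category; with option B, every event shares whatever rules are attached to the one sourcetype.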

Path Finder

@simpkins1958 would you mind sharing your httpevent stream code? We are trying to push events via a stream, and we are not able to set the sourcetype and source. They are taking default values such as http-stream-too_small or http-stream.
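
For reference, a minimal sketch of setting `sourcetype` and `source` explicitly in the JSON envelope sent to the HEC event endpoint. This is not the original poster's code and may not match the streaming client in question; the host, token, and the `acme:auth` / `auth-service` values are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype, source):
    """Build (but do not send) a POST to the HEC event endpoint.

    Hypothetical helper: sourcetype and source are set per event in
    the envelope rather than left to the token's defaults.
    """
    body = json.dumps(
        {"event": event, "sourcetype": sourcetype, "source": source}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=body,
        headers={
            "Authorization": "Splunk " + token,  # HEC token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_hec_request(
    "https://splunk.example.com:8088",       # placeholder host and port
    "00000000-0000-0000-0000-000000000000",  # placeholder token
    {"action": "login", "user": "alice"},
    sourcetype="acme:auth",
    source="auth-service",
)
# urllib.request.urlopen(req) would send it; omitted in this sketch.
```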

Splunk Employee

I would qualify that by saying that sourcetype is an indexed field, so if you have many different sourcetypes, filtering on that field when searching will improve search performance compared to filtering on an event-level key/value pair that is extracted at search time.

Splunk Employee

It's a common misconception that indexed fields have notably different performance characteristics from text tokens. They don't. We look them up the same way. Indexed fields only behave notably differently when the field name and value together are drastically less common than the value alone.

However, the fields source, sourcetype, and host are afforded a fairly special place in Splunk and offer much more powerful abilities to apply implicit processing by data category, among other things. sourcetype is best thought of as "a type of data", such as the kind of data produced by a particular application or, for complex applications, one type of data stream it produces. It is something you can build a rich configuration around, to automatically extract further data based on its format and structure.

Splunk Employee

@ssievert that's a good point!

Contributor

Thanks. We will be using sourcetypes for our categories.

Contributor

Thanks Glenn.
