I am wondering if one can relate extracted fields to a specific source rather than a source type. The event structure that comes in changes based on the source even though they are the same source type and so one extracted field will get the correct value and another will not if I don't specify the source I am looking for. I also have the problem where events of the same host, source, and source type will have different event structures from event to event. I am not sure how to extract the fields from this situation either.
Any suggestions are much appreciated.
If you look into the props.conf documentation you'll see that you can modify Splunk's behaviour (both index- and search-time) based on sourcetype, source and host matching. So technically - yes, you could do that.
Having said that...
In general, if there are multiple sources producing events of the same sourcetype, the events should have a common format. That's what sourcetypes are for.
If you just have optional fields which some sources/hosts produce and some don't - you should rather reconsider your field extractions as a whole and account for those fields, possibly with some fancy conditional regex branches and so on.
But if your sources produce completely different event formats... Hey, that's a different sourcetype. Maybe just define some sources with one sourcetype and others with another.
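Splitting sources across sourcetypes is usually done at input time. A minimal sketch in inputs.conf, assuming the data comes from file monitor inputs; the paths and sourcetype names are made up for illustration:

```
# inputs.conf (paths and sourcetype names are hypothetical)
[monitor:///var/log/myapp/memory.log]
sourcetype = myapp:memory

[monitor:///var/log/myapp/service.log]
sourcetype = myapp:service
```

Each sourcetype can then carry its own extractions in props.conf without any source or host matching.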
If you start extracting by source or host you're gonna quickly find yourself in a pitfall of props.conf priority issues, with some settings overlapping and overriding others. You don't want that if you can avoid it 😉
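To illustrate the kind of overlap meant here: when two pattern stanzas match the same source, the priority setting in props.conf decides which one wins for settings they both define. A sketch only, with hypothetical paths, class names and regexes:

```
# props.conf (paths, class names and regexes are hypothetical)
[source::.../myapp/*.log]
EXTRACT-value = value=(?<value>\d+)
priority = 5

# This pattern also matches /var/log/myapp/monitor.log.
# Because its priority is higher, its EXTRACT-value is the one applied there.
[source::.../myapp/monitor*.log]
EXTRACT-value = Value=(?<value>\S+)
priority = 10
```

Every additional overlapping stanza is one more thing to keep track of, which is why sticking to sourcetypes tends to be easier to maintain.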
Thank you for your reply. One source produces over 150 events per minute. Each of these events has a timestamp and only a single field that is shared between them. Depending on the value of this shared field, the rest of the event's fields and values change. Do you have any resources or suggestions on how to tackle this problem with a regex? I am not sure I can change how the events are structured, because other systems rely on them.
Events do not have to contain the exact same fields to be of the same sourcetype. A sourcetype refers to a "kind" of event - usually those with the same structure, which can be parsed with a given set of rules/expressions.
Just so you don't put too much effort into assigning sourcetypes/sources/etc. 😀
150 events per minute is just 2.5 eps (events per second). Nothing to write home about 😉
It's hard to give you a precise answer without knowing the events and their structure but it's not unusual for a sourcetype definition to include transforms "recasting" events to another sourcetype.
So that even though you're initially ingesting all events in one stream as - let's say - syslog:myapp during indexing, depending on some regex the sourcetype is being rewritten to syslog:myapp:network, syslog:myapp:perf, syslog:myapp:whatever...
Then you can have a separate search-time set of extractions for each of those "subsourcetypes".
It's a very typical approach - many TA's use it.
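A rough sketch of what such a setup can look like, reusing the sourcetype names from above. The transform names, the category=... markers and the extraction regexes are hypothetical, since the real event format isn't known. Note that the rewrite happens at index time, so those parts have to live on the indexer or heavy forwarder:

```
# props.conf
[syslog:myapp]
TRANSFORMS-recast = myapp_to_network, myapp_to_perf

# Search-time extractions attach to the rewritten sourcetypes
[syslog:myapp:network]
EXTRACT-iface = iface=(?<iface>\S+)

[syslog:myapp:perf]
EXTRACT-cpu = cpu=(?<cpu_pct>\d+)

# transforms.conf
[myapp_to_network]
REGEX = category=network
FORMAT = sourcetype::syslog:myapp:network
DEST_KEY = MetaData:Sourcetype

[myapp_to_perf]
REGEX = category=perf
FORMAT = sourcetype::syslog:myapp:perf
DEST_KEY = MetaData:Sourcetype
```

Events that match neither regex simply keep the original syslog:myapp sourcetype.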
@PickleRick Thank you again for the response. I know 2.5 eps isn't much, I was more just meaning that this one source has a lot of different variations for being the same source and source type. Speaking of which as an example of the more exact layout of the event I am working with:
(As seen in the raw form)
Event 1:
2022-06-30 15:21:10,002 Monitor=Memory TotalMemory=123456 MaxMemory=1234567
Event 2:
2022-06-30 15:21:10,001 Monitor=Service State=Running Min=0 Max=220
Where "Monitor" is the only field that always appears in the events and the rest of the fields change based on the value of "Monitor" . Also the host, source, and source type are all the same for the events.
With this example how would you go about setting up a regex so that the fields are properly separated?
Firstly, that's a simple key-value format which Splunk should detect on its own.
But if it doesn't, there is even an example of extracting key=value pairs in Splunk's transforms.conf documentation, at the end of the REGEX option description.
I have only been using Splunk for a few days. I am normally a quick learner but I must say, Splunk is really testing that idea. Haha
Maybe because I just don't understand REGEX's all that well yet, I just wanted to restate my issue in a different way to confirm that delving into the transforms.conf doc is what I need to do.
My main problem is that when I try to generate a field and use a regular expression to do so, the first field, which is the same across all of the events, works as expected and has the correct values and just as importantly the correct field name. But the following fields may be correct or may not be because the regular expression made through the field generator is just looking for the "next" key=value pair instead of changing which field or key it is looking for based on the first key=value pair it read. So it will read the next key=value pair but it won't have the correct field name.
So is there a way I can change which fields are being "parsed" based on the first key value pair with REGEX's?
Additionally, do I just need to be thinking about this problem in a different way entirely, and am I making things more complicated than they need to be? My apologies if I am being dense here.
I really appreciate the help though @PickleRick @richgalloway @m_pham
Given your example events and as @PickleRick has already said, you don't have to do anything to parse the fields in your data. Splunk will parse key=value data automatically.
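Automatic key=value extraction at search time is controlled by KV_MODE in props.conf, which defaults to auto, so normally nothing has to be configured. A minimal sketch, assuming the sourcetype is called mysourcetype, in case it was switched off somewhere:

```
# props.conf on the search head (sourcetype name is a placeholder)
[mysourcetype]
KV_MODE = auto
```

With that in place, Monitor, TotalMemory, State and so on should show up as fields named after their keys, whichever combination a given event contains.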
If you insist on using a regex on the data then you don't need to know what fields are present in the event. The following transform will parse any number of key=value pairs, create fields with the key name, and give that field the matching value.
[mytransform]
REGEX = (\w+)=(\S+)
FORMAT = $1::$2
REPEAT_MATCH = true
Invoke the transform from props.conf. Use REPORT- for a search-time extraction; TRANSFORMS- is the index-time equivalent:
[mysourcetype]
REPORT-extract_fields = mytransform
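If you really did want extractions that depend on the value of Monitor, one option is a separate inline extraction per Monitor value. This is only a sketch built from the two sample events above; the sourcetype name is a placeholder and the regexes assume the fields always appear in that order:

```
# props.conf (sourcetype name is a placeholder)
[mysourcetype]
# Only matches events where Monitor=Memory
EXTRACT-monitor_memory = Monitor=Memory\s+TotalMemory=(?<TotalMemory>\d+)\s+MaxMemory=(?<MaxMemory>\d+)
# Only matches events where Monitor=Service
EXTRACT-monitor_service = Monitor=Service\s+State=(?<State>\S+)\s+Min=(?<Min>\d+)\s+Max=(?<Max>\d+)
```

Each regex is anchored on its own Monitor value, so it simply doesn't match the other event types and never assigns a value to the wrong field name.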
I believe I understand now. Thank you @richgalloway and @PickleRick
If your problem is resolved, then please click the "Accept as Solution" button to help future readers.
Hi - you just need to use props.conf to specify the source and your field extraction configurations.
For settings that are specified in multiple categories of matching [<spec>] stanzas, [host::<host>] settings override [<sourcetype>] settings. Additionally, [source::<source>] settings override both [host::<host>] and [<sourcetype>] settings.
https://docs.splunk.com/Documentation/Splunk/latest/Admin/propsconf
Example:
[source::<your_source_here>]
<field_extraction_stuff_here>
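Filled in, that could look something like the following. The path and the extraction are hypothetical, just to show the shape:

```
# props.conf (the source path is hypothetical)
[source::/var/log/myapp/monitor.log]
EXTRACT-monitor_type = Monitor=(?<Monitor>\w+)
```

The extraction then applies only to events whose source is exactly that path, regardless of their sourcetype.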
Typically, settings such as field extractions are listed in a props.conf file stanza by sourcetype like this:
[mysourcetype]
TIME_PREFIX = ^
LINE_BREAKER = ([\r\n]+)
...
But you can specify them by source or host, instead.
[source::/some/file/path]
TIME_PREFIX = ^
LINE_BREAKER = ([\r\n]+)
sourcetype = foo
...
[host::<hostname>]
TIME_PREFIX = ^
LINE_BREAKER = ([\r\n]+)
# (sourcetype can only be set in a [source::...] stanza, so it is omitted here)
...
See props.conf.spec for more information.