Hello all, I'm new to Splunk, so please bear with me as I ask a really n00bish question.
Is it necessary to define your fields ahead of time?
For example, say I have a collection of attributes that I want to log. Let's say the attributes are arbitrary in number, anywhere from 20 - 40, and I don't know what they will be ahead of time. So, for example, sometimes the key "foo" will be part of the collection, and sometimes it won't be.
The attributes take the form of key/value pairs. What I want to do is write this collection to a log, and then be able to use all of Splunk's standard features to search my logs based on these keys.
From the little that I know about Splunk, I get the feeling that I want these keys to be fields. But if I have to know what these fields are ahead of time, and they have to be present in every log line, that fouls up my whole plan.
And so I ask unto you, oh wise Splunkfolk, what's the best way of going about this?
I guess I could just put the whole collection into one big freetext field, but if I did that, wouldn't I miss out on a bunch of Splunk's best features?
Thanks for the help and patience.
There are two types of fields in Splunk, "indexed" and "extracted".
Indexed fields are stored in the index when the data is stored into Splunk - these do have to be defined in advance.
Extracted fields are pulled from the event "on the fly" during a search operation - these do not have to be defined in advance, and can be added to / altered for data already stored in Splunk's indexes.
These are often referred to as "index time" and "search time" extractions.
For the vast majority of operations, extracted fields are recommended because they are far more flexible and perform well enough. The Splunk docs have specific recommendations on when to use extracted vs. indexed fields.
Key/value pairs are extracted automatically by Splunk. Basically, anything in the logs of the form username=user123 role=role1 will be picked up as fields (i.e., username and role become the fields, and user123 and role1 become their values).
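For the writing side, here is a minimal Python sketch, assuming the attributes arrive as a dict with whatever keys happen to be present. The helper name `format_kv` and the quoting rule for values with spaces are my own illustrative assumptions, not something Splunk mandates beyond the key=value shape described above.

```python
def format_kv(attributes):
    """Render a dict of arbitrary attributes as space-separated key=value pairs."""
    parts = []
    for key, value in attributes.items():
        text = str(value)
        # Simplification (assumption): swap embedded double quotes for single
        # quotes so that the quoting below never needs escaping.
        text = text.replace('"', "'")
        # Wrap empty values or values containing whitespace in double quotes
        # so that each pair stays a single key=value token.
        if text == "" or any(ch.isspace() for ch in text):
            text = '"' + text + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


# Keys can differ from one event to the next; nothing is declared up front.
line = format_kv({"username": "user123", "role": "role1", "note": "first login"})
print(line)  # username=user123 role=role1 note="first login"
```

Each event carries only the keys it actually has, so "foo" can appear in one line and be absent from the next.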
If, however, your delimiters are different (a double colon :: or something else), you might want to define them before you start indexing.
Another option is to index a few sample logs and then teach Splunk to recognize the fields you want, either with regexes in props.conf/transforms.conf or through Splunk Web.
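To illustrate what a regex-based extraction does conceptually with a non-default delimiter, here is a rough Python sketch. This is not Splunk configuration (the real thing lives in props.conf/transforms.conf with its own syntax); the pattern, the `extract_fields` name, and the sample event are all hypothetical.

```python
import re

# Hypothetical pattern for pairs such as username::user123.
# Assumes keys are word characters and values run to the next whitespace.
PAIR = re.compile(r"(?P<key>\w+)::(?P<value>\S+)")


def extract_fields(event):
    """Return every key::value pair found in the event text as a dict."""
    return {m.group("key"): m.group("value") for m in PAIR.finditer(event)}


fields = extract_fields("login ok username::user123 role::role1")
print(fields)  # {'username': 'user123', 'role': 'role1'}
```

Text that does not match the pattern is simply ignored, which is why the set of keys does not have to be known ahead of time.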
Thank you both for your suggestions! I wish I could pick both as "right answers." I'm going to try and use "key=value" so they get picked up automatically, but I'll definitely keep extracted fields in mind.