Some background to this question:
For EC, I specify the timestamp in the "time" key in the event metadata. For TCP, I believe I'll have to configure timestamp recognition in props.conf as described in Splunk docs.
For EC, I specify the source type in the "sourcetype" key in the event metadata. For TCP, I believe I'll have to override source types on a per-event basis as described in Splunk docs. I do not want to use a different TCP port for each sourcetype.
So, I plan to create a stanza in transforms.conf that gets a field value from the JSON-format data received via TCP, and uses it to set the sourcetype, like this:
[set_source_type_my_log_type]
REGEX = \"somefieldname\"\:\"([^\"]+)\"
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype
(I've tested this regular expression using the rex command, but not yet in the context of overriding sourcetype; I don't yet know for sure whether I'll have to escape the double quotes, as done here.)
where the JSON received via TCP contains a field like this:
"somefieldname":"xyz_123"
and "xyz_123" is the sourcetype I want the event to have.
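One way to sanity-check the regular expression outside Splunk is a small Python sketch like the one below. It assumes Python's re module behaves like Splunk's PCRE engine for this simple pattern, uses a plain capture group, and invents a "message" field purely for illustration. It also shows a detail that is easy to miss: the pattern only matches compact JSON, with no whitespace after the colon.

```python
import json
import re

# Same pattern as the transforms.conf stanza, with a plain capture group.
# Assumption: Python's re treats this pattern the same way PCRE does.
PATTERN = re.compile(r'\"somefieldname\"\:\"([^\"]+)\"')


def extract_sourcetype(raw_event):
    """Return the captured field value, or None if the pattern does not match."""
    match = PATTERN.search(raw_event)
    return match.group(1) if match else None


# Hypothetical event; only "somefieldname" comes from the question.
event = {"somefieldname": "xyz_123", "message": "hello"}

# Compact separators produce "somefieldname":"xyz_123" (no space after the colon).
compact = json.dumps(event, separators=(",", ":"))
# Default separators produce "somefieldname": "xyz_123" (space after the colon).
spaced = json.dumps(event)

print(extract_sourcetype(compact))
print(extract_sourcetype(spaced))
```

If the sending tool emits whitespace around the colon, the pattern would need something like \s* on either side of it.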
All of the above (thanks for reading this far) boils down to one simple question: what field name should I use in place of somefieldname (as per the example above)?
It occurs to me that I probably shouldn't use the default field name sourcetype.
"Anything you like, except for one of the default fields (so, not sourcetype)" might be a valid answer, but I'd prefer a more specific answer: an actual field name that other users in the same situation might also choose to use, as an informal convention (rather than the formalized EC protocol).
It also occurs to me that, given that I can supply both the sourcetype and the _time (expressed as Unix time) as fields in the JSON data, is there some better, more direct way than using regexes to configure the timestamp recognition and override the sourcetype? Specifying a regex to extract a JSON key value seems a bit like... inserting a key into a car door and turning the key, when you've got a remote unlock button on the same keychain. (Someone is going to lecture me on the data pipeline, and parsing versus search-time field extraction, and I probably deserve it.)
I'm not thrilled by having to pass through, as a field that will appear in the _raw field of each event, a value that will also be represented in the sourcetype field. That strikes me as inelegant. My EC-ingested events don't have such a field, and I'm hoping for my EC-ingested and TCP-ingested events to cohabit in the same indexes, so I'd prefer them to be as similar as possible. I'd appreciate advice on that, too.
I would first try sourcetype; you may find that this does exactly what you would like it to do. If not, it will create a field called something like something_sourcetype, with whatever name the current Splunk code (i.e. the developers) thinks it should have. Please do report back here which way it works and what the renamed/new field name is (if it does override the
sourcetype field, it should rename the original field as something like
Well, that was interesting. I defined a TCP input in inputs.conf:
[tcp://:6666]
index = test
sourcetype = xyz
with a corresponding stanza in props.conf:
[source::tcp:6666]
INDEXED_EXTRACTIONS = JSON
As an initial test, I used a Windows PowerShell script to send a few events in JSON format, and confirmed in Splunk Web that the following search:
displayed the events, with the field names and values extracted from the JSON. So far so good.
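For readers without PowerShell, a rough Python equivalent of such a test sender might look like the sketch below. It is not the script used above: the function names are made up for illustration, and it assumes one compact JSON object per line over a single TCP connection to the input defined earlier.

```python
import json
import socket


def encode_events(events):
    """Serialize events as compact JSON, one event per line, UTF-8 encoded."""
    lines = [json.dumps(event, separators=(",", ":")) for event in events]
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_events(host, port, events):
    """Open one TCP connection, send all events, then close the connection."""
    payload = encode_events(events)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
    return len(payload)


# Example call, assuming a Splunk TCP input is listening on localhost:6666
# (the event fields are hypothetical):
# send_events("localhost", 6666, [{"message": "hello", "severity": "info"}])
```

Whether each line becomes a separate event depends on the line-breaking settings for the source, so treat the one-event-per-line framing as an assumption to verify.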
Then I added a "sourcetype" field, with the value "xyz_123", to the JSON, and sent that.
The event appears in the Splunk Web Events tab with two values for the sourcetype field: xyz and xyz_123. There's no new or renamed field: just the one sourcetype field with two values.
That event appears if I use the search cited above, but if I change the search to:
I get no results.
I'm now about to try overriding the sourcetype as described in Splunk docs. Just for fun, I might try doing that using a sourcetype field, and see what happens: I wonder whether Splunk will "collapse" the two (now identical) values into one, or show them as separate values. Probably safer, though (since I don't understand the underlying code), to use a different field name.
I'm now overriding the sourcetype. Here's my working transforms.conf stanza:
[set_sourcetype_xyz]
REGEX = \x22event_sourcetype\x22:\x22([^\x22]+)\x22
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype
(\x22 is an escaped double quote)
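As a quick cross-check of the \x22 form, the Python sketch below compares it with a pattern that uses literal double quotes. This only shows how Python's re module treats the two spellings; it says nothing about how Splunk parses quotes in a .conf file. The helper name and the sample event are made up for illustration.

```python
import re

# The pattern from the stanza, using \x22 for each double quote.
HEX_ESCAPED = re.compile(r'\x22event_sourcetype\x22:\x22([^\x22]+)\x22')
# The same pattern written with literal double quotes.
LITERAL = re.compile(r'"event_sourcetype":"([^"]+)"')


def sourcetype_metadata(raw_event, pattern=HEX_ESCAPED):
    """Mimic FORMAT = sourcetype::$1; return None when the pattern does not match."""
    match = pattern.search(raw_event)
    return ("sourcetype::" + match.group(1)) if match else None


# Hypothetical raw event in compact JSON.
raw = '{"event_sourcetype":"xyz_123","message":"hello"}'

print(sourcetype_metadata(raw))
print(sourcetype_metadata(raw, LITERAL))
```

Both calls give the same result in Python, which suggests the two spellings are interchangeable as regular expressions; the \x22 form simply sidesteps any question about quoting in the configuration file.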
Using sourcetype instead of event_sourcetype as a field name in the JSON input data also works, but you end up with an ingested event with a
sourcetype field that has two identical values (for example,
I'm torn, and would appreciate advice on this. On the one hand, I'd prefer not to coin my own field name; on the other, I'm not comfortable with sourcetype having two values. That just looks weird to me, and I don't have enough experience with Splunk to know whether this will bite me in the a...
I realize that I could save myself a heap of trouble here by using a single sourcetype value for all of the different types of log records (all of which have different record structures) that are extracted by the platform-specific log extraction tool I referred to in my original question. And I could coin some new field with the unique values that would have been in sourcetype.
But I think that would be a "cop out"; an un-Splunk-y thing to do; in neither the spirit nor the letter of the Splexicon definition of source type.
There is also
sourcetype renaming that you might exploit. When you rename a sourcetype, the original value is moved to
Yeah, I read the Splunk docs topic on that ("Rename source types at search time") before asking this question. Problem is, that functionality is too limited to be useful in this situation: it only offers a one-to-one renaming, from the original sourcetype value to a different literal string value. Thanks for the suggestion, though.
Thanks also for prodding me to try sourcetype and see what happens. I think that your answer, combined with this trail of comments, will prove useful to users with the same question, so I'm going to accept it.
I'm considering asking a new question, spawned by the testing I've done here, to ask about (re)using the EC protocol for TCP inputs.
What I'd really like is to use the same JSON for TCP input as I use for the HTTP Event Collector. That is, to specify time and sourcetype as metadata keys, rather than having to write stanzas to configure timestamp recognition and override the source type per-event.
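To make the difference between the two payload shapes concrete, here is a small Python sketch that flattens an EC-style envelope into the kind of JSON the TCP workaround needs. The function name and the event_time field are hypothetical, event_sourcetype is the field name from the working stanza above, and the sketch assumes the "event" value is a JSON object rather than a string.

```python
import json


def hec_to_tcp_event(envelope, sourcetype_field="event_sourcetype",
                     time_field="event_time"):
    """Copy the EC metadata keys into the event body as ordinary fields."""
    flat = dict(envelope.get("event", {}))
    if "sourcetype" in envelope:
        flat[sourcetype_field] = envelope["sourcetype"]
    if "time" in envelope:
        flat[time_field] = envelope["time"]
    return flat


# EC-style envelope: metadata keys sit beside the event, not inside it.
hec_envelope = {
    "time": 1437522387,
    "sourcetype": "xyz_123",
    "event": {"message": "hello"},
}

tcp_event = hec_to_tcp_event(hec_envelope)
tcp_line = json.dumps(tcp_event, separators=(",", ":"))
print(tcp_line)
```

The flattened form is what ends up in _raw, which is exactly the duplication of sourcetype that the question describes as inelegant; the envelope form keeps the metadata out of the event body.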