Getting Data In

How do I use requireHeader to override indexing settings for a TCP input?

The inputs.conf documentation describes a requireHeader setting for TCP inputs:

requireHeader = bool
Require a header be present at the beginning of every stream.
This header may be used to override indexing settings.
Defaults to false.

Where can I find more information, preferably with examples, on specifying a header for a TCP input, and using it to override indexing settings?

In particular:

  • Which indexing settings can I override, and how? For example, can I use requireHeader = true with some other setting(s) to override which index events (in subsequent lines in the stream) get stored in?

  • I'm already using a stanza in transforms.conf to override sourcetype per event. I haven't tried, but I think (based on the documentation I have read) that I can use the same technique to override the index per event, using DEST_KEY = _MetaData:Index. Typically, though, for my purposes, I'm more likely to want to override the index for an entire stream rather than per event, which is why a header-based override appeals to me. Will such per-event overrides take precedence over index settings overridden via whatever method(s) requireHeader = true involves?

1 Solution

SplunkTrust

With requireHeader = true, Splunk looks for a header line like this at the start of each incoming stream:

     ***SPLUNK*** host=... sourcetype=... source=...

So, to use this to change indexing behavior, assume you have two apps sending data to the same port.

app1: sends 
Line 1: ***SPLUNK*** sourcetype=app1
Line x: data

app2: sends 
Line 1: ***SPLUNK*** sourcetype=app2
Line x: data

And so on for any other apps. The header tells your indexer or heavy forwarder (HF) which sourcetype to apply to each app's data.
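
As a rough illustration of the two streams above, here is a minimal Python sketch that builds the bytes each app would send. The helper name build_stream and the default line terminator are assumptions made for this example, not part of Splunk; only the ***SPLUNK*** header layout comes from this answer.

```python
def build_stream(metadata, events, newline="\n"):
    """Return the bytes for one TCP stream: a ***SPLUNK*** header line, then the events.

    build_stream is a hypothetical helper for illustration, not a Splunk API.
    """
    # Header fields are space-separated key=value pairs after the marker.
    fields = " ".join(f"{key}={value}" for key, value in metadata.items())
    lines = [f"***SPLUNK*** {fields}"] + list(events)
    # Terminate every line, including the last one.
    return "".join(line + newline for line in lines).encode("utf-8")


app1_stream = build_stream({"sourcetype": "app1"}, ["data"])
app2_stream = build_stream({"sourcetype": "app2"}, ["data"])
print(app1_stream)
print(app2_stream)
```

Each app would then write its bytes to its own TCP connection, for example with socket.create_connection((host, port)).sendall(app1_stream), where host and port are whatever your input stanza listens on.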

Then, in props.conf (with a matching transforms.conf stanza), you can use key overrides to change the destination index based on the sourcetype. You might even be able to put index=indexName in the header.

This will not work so well on UDP, since everything arrives connectionless on the same port, but it should work well for TCP inputs, since each "stream" will have its own header and the data won't get "mashed" together.

It's the parsing "header processor" that handles this, so maybe you can find documentation if you search for "header processor" and "parsing pipeline", etc.

Cheers,
Jkat54

Yes, that works. Thanks, Michael!

Re:

You might even be able to put index=indexName in the header.

Yes, that works, too.

Re:

maybe you can find documentation if you search for "header processor"

I found related documentation in the Splunk docs topic "Assign default fields dynamically".

I've added a comment to that topic questioning the use of the word "superseded".

Nit: that topic refers to ***SPLUNK*** rather than *** SPLUNK ***. That is, with no spaces between the word and the surrounding asterisks.


I had some issues implementing this that turned out to be my fault. I thought I'd describe those issues here, laying bare my mistakes :-), in case it saves anyone else making the same mistakes.

I added requireHeader = true to my existing inputs.conf stanza:

[tcp-ssl:6071]
index = test
sourcetype = abc
requireHeader = true
disabled = 0

(Yes, SSL. Before adding the requireHeader setting, I was successfully ingesting events using this input.)

I used the following openssl s_client command (again, I was using this successfully before adding requireHeader):

openssl s_client -crlf -no_ign_eof -connect localhost:6071 -CAfile C:\PROGRA~1\Splunk\etc\auth\cacert.pem < tcp_request.json

to send the following two lines (the contents of tcp_request.json) to that TCP port:

***SPLUNK*** sourcetype=xyz_1234
{"time":"2016-06-28 12:00:00.000000","type":"ABC","code":"0000","data1":"payload"}

(I've previously successfully ingested events without that first "header" line.)

The splunkd log reported the following warning message:

Discarding 118 bytes of incomplete header data: ***SPLUNK*** sourcetype=xyz_1234\r\n{"time":"2016...

*** SPLUNK *** - that is, with spaces - resulted in the same warning, but for "120 bytes" - those extra two spaces.
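
The byte counts in those warnings are consistent with Splunk discarding the entire stream rather than just the header line. Here is a quick Python check, assuming both lines ended in CRLF (as the -crlf option would produce) and that the file ended with a newline; stream_size is a hypothetical helper name.

```python
header = "***SPLUNK*** sourcetype=xyz_1234"
spaced_header = "*** SPLUNK *** sourcetype=xyz_1234"
event = '{"time":"2016-06-28 12:00:00.000000","type":"ABC","code":"0000","data1":"payload"}'


def stream_size(header_line, event_line, newline="\r\n"):
    """Size in bytes of a two-line stream with the given line terminator."""
    return len((header_line + newline + event_line + newline).encode("utf-8"))


print(stream_size(header, event))         # 118
print(stream_size(spaced_header, event))  # 120
```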

Here's my mistake: I had forgotten to add a props.conf stanza for the xyz_1234 source type; in particular, with SHOULD_LINEMERGE = false. Here's the stanza:

[xyz_1234]
KV_MODE = json
SHOULD_LINEMERGE = false
TIME_PREFIX = {\"time\":\"
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%6N%:z

(I'm looking into using "|" as an "or" operator in the stanza to consolidate identical stanzas for different sourcetypes.)

SplunkTrust

That's awesome news and research. I updated my answer to remove the spaces too. That will reduce confusion for folks who skim answers and don't read the whole page! Folks like me!!!

From my question:

I haven't tried, but I think [...] that I can use the same technique to override the index per event

I've tried. Yes, you can. The following transforms.conf stanza overrides the index per event to match the value of a type property in the input JSON-format event:

[set_index_mysourcetype]
# Route events to type-specific index, or fall back to misc index
REGEX = \"type\":\"(abc|def)\"
FORMAT = $1
DEST_KEY = _MetaData:Index
DEFAULT_VALUE = misc

That is, if the input event contains the type property value "abc" or "def", the event is routed to the abc or def index, respectively. Otherwise, the event is routed to the misc index.
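
To make the routing logic concrete, here is a small Python sketch that mimics what the stanza expresses: apply the regex, use the captured group as the index name, and otherwise fall back to the default. It only simulates the configuration; route_index is a hypothetical name, and this is not a description of how Splunk implements transforms internally.

```python
import re

# Same pattern as REGEX in the stanza (escaping the quotes is optional in Python).
TYPE_PATTERN = re.compile(r'"type":"(abc|def)"')


def route_index(raw_event, default="misc"):
    """Return the index name the stanza would pick for a raw event (simulation only)."""
    match = TYPE_PATTERN.search(raw_event)
    return match.group(1) if match else default


print(route_index('{"type":"abc","code":"0000"}'))  # abc
print(route_index('{"type":"xyz","code":"0000"}'))  # misc
print(route_index('{"type":"ABC","code":"0000"}'))  # misc: the pattern is case-sensitive
```

Note that the pattern is case-sensitive as written, so an event with "type":"ABC" would fall through to the misc index unless the regex is adjusted, for example with a (?i) prefix.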

I'm still interested in an answer to the original question, though.

Splunk Employee

Hi @Graham_Hannington,

I’m not 100% sure this will work, but it seems that you might be able to use a SEDCMD regex replacement in props.conf to navigate through the header and locate/update the indexing settings that you want to override.

Here is the props.conf spec file:
http://docs.splunk.com/Documentation/Splunk/latest/admin/Propsconf

And here is some related documentation on using a SEDCMD to anonymize data. I realize that this isn’t your exact use case, but I thought it might be helpfully similar.

http://docs.splunk.com/Documentation/Splunk/6.4.1/Data/Anonymizedata#Anonymize_data_through_a_sed_sc...

If it’s helpful, there is a related Answers post discussing the SEDCMD and regex here:
https://answers.splunk.com/answers/56186/sedcmd-to-strip-http-headers-from-raw-tcp-input-json-submit...

The accepted answer and comments might be useful to you.

Hope this helps! Please let me know either way and we can continue discussing.

Hi @frobinson,

Thanks very much for the suggestion. I did follow those links you supplied, but I ended up using the ***SPLUNK*** header as described in the answer by @jkat54.

Cheers,
Graham
