Hi.
I have a business requirement where I need to index data from several of our vendors who also use Splunk.
The vendors have added a _TCP_ROUTING setting to send data both to our Heavy Forwarders and to their own infrastructure.
I have a dedicated port for each vendor in my inputs.conf on the Heavy Forwarder:
[splunktcp-ssl:9997]
disabled = 0
_meta = userindex::splunk_test
My idea was to have a different userindex for each input stanza.
Next step is a generic props.conf:
[host::*]
TRANSFORMS-force_index = force_index
Finally I was hoping it would be possible to do the magic in my transforms.conf:
[force_index]
DEST_KEY = MetaData:Sourcetype
REGEX = (.+)
FORMAT = $1
SOURCE_KEY = _meta:userindex
WRITE_META = true
I know I'm not rewriting the index here, but the sourcetype is easier to check once the events are indexed, and it should be a small change to rewrite the index instead of the sourcetype.
Long story... so to the question.
Is it possible to reference the _meta variable I have set in the input stanza in the regex of the transform on the same Heavy Forwarder?
Kind regards
Lars
P.S.
I agree it is a bad idea to rewrite the index; it should be set at the source. But I think it is necessary here, because our indexes do not match those of our vendors, and I want each vendor's data to be indexed in a single index.
I'm facing the exact same issue for the exact same requirement. Did you ever find a solution over the last 6 years?
Hi @lorenzoromio .
The underlying problem is that a UF normally sends cooked data, and since the data is already cooked, your HF takes whatever is sent to it as gospel truth.
What I ended up doing was using ingest eval and building some rather long case() expressions to ensure the data is stored in the right indexes.
Ingest actions could also be a possibility now.
If I had to do this again, I would also look at Edge Processor.
Kind regards
las
@las Your response is not correct. Or, to be more precise, it's partially correct but you draw the wrong conclusions from it. Yes, a UF by default sends cooked data, but it is only cooked, not cooked and parsed (unless you're using indexed extractions, but that's a story for another day). Cooked data _does_ go through all the normal phases of processing: line breaking, timestamp recognition and so on.
So your ingest evals will only work if you're indeed getting unparsed data from UFs. It will _not_ work if you're getting parsed data from other HFs (or SHs, or a copy from indexers) or parsed data from UFs which has undergone indexed extractions.
BTW, ingest actions are RULESETs just with added fancy UI.
This is quite an old thread, but some things can still be said.
In general, I believe that yes, you should be able to reference a field you're setting on input. But you might need to just use SOURCE_KEY=userindex instead of referencing whole _meta.
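A minimal sketch of that idea for the transform in the original question, assuming the events still pass through the typing pipeline (that is, they are not already parsed) and that each input sets _meta = userindex::<index name>. The _meta key is assumed to hold the input's indexed fields as space-separated name::value pairs, so the regex extracts the userindex value and writes it to the index key instead of the sourcetype. This is untested against real vendor data; if _meta does not match, the SOURCE_KEY = userindex variant suggested above is the alternative to try.

transforms.conf
[force_index]
# Read the indexed fields that inputs.conf attached via _meta
SOURCE_KEY = _meta
# Capture the userindex value (pairs are separated by spaces)
REGEX = userindex::(\S+)
# Overwrite the destination index with the captured value
DEST_KEY = _MetaData:Index
FORMAT = $1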
The main problem here is that if your data comes in already parsed, it won't be touched by TRANSFORMS. You need to use RULESET. And you need to define the ruleset for the [default] stanza (OK, you might use a wildcarded host stanza as well, but somehow I find it less readable; YMMV) because you have no control over what the source is sending. That's just one of the assumptions with s2s: you implicitly trust the sender and trust that the metadata is correct.
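For data that arrives already parsed, a minimal sketch of the RULESET variant could look like this. It assumes the same userindex field set through _meta on each vendor's input, and that indexed fields from _meta are visible by name in INGEST_EVAL. The stanza name force_index_from_meta is hypothetical. Events without a userindex field keep whatever index the sender chose.

props.conf
[default]
RULESET-force_index = force_index_from_meta

transforms.conf
[force_index_from_meta]
# Hypothetical stanza: only overwrite the index when the input tagged the event
INGEST_EVAL = index:=if(isnotnull(userindex), userindex, index)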
Thank you very much, the ruleset worked. Currently, I have configured it as follows:
inputs.conf
[splunktcp://9996]
disabled = false
connection_host = ip
#acceptFrom = 10.128.21.11, 10.128.21.12, 10.128.21.13
_meta = company::smartcity splunktcp::splunktcp://9996
props.conf
[default]
RULESET-ruleset_test_syslog_firewall = _rule:ruleset_test_syslog_firewall:set_index:smartcity
I created an ingest eval that saves the original index into an original_index field and then changes the index field, acting only if the splunktcp meta field exists and is set to "splunktcp://9996" (which is set in inputs.conf). Line breaks were added for readability:
transforms.conf
[_rule:ruleset_test_syslog_firewall:set_index:smartcity]
INGEST_EVAL =
original_index:=index,
index:=case(
index="axonius" AND splunktcp="splunktcp://9996","main",
index="traffic_a2a" AND splunktcp="splunktcp://9996", "main",
splunktcp="splunktcp://9996","custom_support_index",
true(), index
)
I was wondering, is it absolutely mandatory to set the stanza as [default] in props.conf, or is there a way to narrow the scope and have that ruleset apply only to events received via the splunktcp protocol?
I would like to avoid overloading the HF (Heavy Forwarder) by applying the rule to all incoming events.
Thank you very much!
The stanzas in props.conf can be defined for sourcetype, host or source only, so there is no "this input" stanza. That's why the [default] stanza. If you are sure that the data will only come with a specific sourcetype, host or source (and can trust the other side not to send anything else), you could narrow it down to those.
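As a sketch of narrowing the scope, assuming the vendor only ever sends a sourcetype named test_syslog_firewall (as in the later loopback test), the ruleset from the previous post could be attached to that sourcetype instead of [default]. The host pattern in the second stanza is hypothetical; it matches the host value carried in the events, not the IP address of the sending forwarder.

props.conf
# Applies only to events whose sourcetype is test_syslog_firewall
[test_syslog_firewall]
RULESET-ruleset_test_syslog_firewall = _rule:ruleset_test_syslog_firewall:set_index:smartcity

# Hypothetical host pattern; host:: stanzas accept wildcards
[host::vendor-fw-*]
RULESET-ruleset_test_syslog_firewall = _rule:ruleset_test_syslog_firewall:set_index:smartcity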
If you are already on version 10, you could look into Edge Processor as an alternative solution (but that will require at least one additional, separate worker machine).
Thank you for the response, appreciated.
Actually, my situation is slightly different.
As mentioned before, I need to receive data from a Splunk instance that we do not control, which is already indexing the data in its own on-prem Splunk environment.
The issue is that the index names they send their logs to are generic, whereas I need to name the indices using a specific nomenclature in my Splunk instance.
Therefore, I need to intercept their logs and modify the index name that arrives via the cooked data.
To test that configuration, I performed a local test.
I created a TCP input on port 19999 configured as follows:
[tcp://19999]
disabled = false
connection_host = ip
index = axonius
sourcetype = test_syslog_firewall
_TCP_ROUTING = test_loopback_9996
A second input configured like this:
[splunktcp://9996]
disabled = false
connection_host = ip
_meta = provenienza::smartcity
An outputs.conf configured to forward the logs back to itself on another port, with sendCookedData set to true:
[tcpout:test_loopback_9996]
server = 127.0.0.1:9996
# Forcing transmission in standard Splunk ("cooked") mode
sendCookedData = true
After that, I configured a props.conf:
[splunktcp]
TRANSFORMS-force_index_main = check_tag_and_set_main_index
And the corresponding transforms.conf:
[check_tag_and_set_main_index]
SOURCE_KEY = _meta
REGEX = provenienza::smartcity
DEST_KEY = _MetaData:Index
FORMAT = main
Then I sent a log to port 19999. The log is successfully routed to test_loopback_9996, which is confirmed by the fact that it gets _meta = provenienza::smartcity assigned, but I cannot get the index to change. It gets ingested into the axonius index instead of main.
I believe the stanza [splunktcp] set in props.conf is not working as expected. Could you give me some tips?
Furthermore, while this approach might work for changing the index arbitrarily, what I actually need is to be able to intercept the logs arriving on splunktcp:9996 and dynamically route them.
For example:
If the incoming index is windows, change it to windows_smartcity.
If the incoming index is linux, change it to linux_smartcity, and so on.
Right now, I am not sure how to achieve this, because my current transform tries to match a metadata field (_meta) to change the index. Instead, I need to intercept the incoming index name itself and modify it based on its original value, but only for the events arriving through that specific splunktcp port.
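A minimal sketch of the dynamic rename, assuming the data arrives parsed (so it has to go through RULESET rather than TRANSFORMS) and that the provenienza field set by _meta on splunktcp://9996 is visible by name in INGEST_EVAL, as the splunktcp field was in the earlier ingest eval. The stanza and class names are hypothetical. String concatenation with "." turns windows into windows_smartcity and linux into linux_smartcity; for events from other inputs the case() condition is not true, so they keep their index.

transforms.conf
[set_index_smartcity_suffix]
# Hypothetical stanza: append _smartcity only for events tagged on splunktcp://9996.
# The resulting indexes (windows_smartcity, linux_smartcity, ...) must exist on the indexers.
INGEST_EVAL = index:=case(provenienza="smartcity", index."_smartcity", true(), index)

props.conf
[default]
RULESET-smartcity_suffix = set_index_smartcity_suffix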
Hi @lorenzoromio,
Expanding on @PickleRick's suggestions, you can brute force events through parsingQueue to open up not only typingQueue behavior but all functions you would expect a heavy forwarder or receiver to perform, irrespective of the event disposition. This is my preferred solution for handling events from an external forwarder in a pure Splunk environment.
In an app or in $SPLUNK_HOME/etc/system/local/inputs.conf, override the [splunktcp] stanza route setting, and change has_key:_linebreaker:rulesetQueue to has_key:_linebreaker:parsingQueue:
[splunktcp]
route=has_key:_replicationBucketUUID:replicationQueue;has_key:_dstrx:typingQueue;has_key:_linebreaker:parsingQueue;absent_key:_linebreaker:parsingQueue
You must still be mindful of the structure of received events, but all props.conf and transforms.conf settings will be available to you.
Better practices using EVENT_BREAKER etc. at the forwarder do not change, but since the forwarder is outside your control, you'll have more flexibility in vetting and parsing events.
You're touching an interesting topic here.
Have you actually tried moving the processing as far back as parsingQueue? Intuition hints that since the parsed data is already in UTF-8 and the input stream has already been split into single events, parsingQueue shouldn't have much left to do at this point. Maybe some metrics manipulation.
I also wonder how re-merging events in aggQueue would affect already created indexed fields...
As I already wrote (albeit in a shorter form) - if you're receiving data which has already gone through a full Splunk Enterprise instance (indexer, HF...) or comes from a UF but has been ingested with indexed extractions configured, it is already parsed and will _not_ go through your transforms. You need to use RULESET instead of TRANSFORMS to process them.
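Applied to the loopback test above, a sketch of the change would keep the check_tag_and_set_main_index transform exactly as it is and only change how props.conf invokes it. This assumes RULESET accepts the same REGEX/DEST_KEY style of transform that TRANSFORMS does. Note also that a bare [splunktcp] stanza is read as a sourcetype name, so it would only match events whose sourcetype is literally splunktcp; it is replaced here with [default].

props.conf
# Parsed data skips TRANSFORMS, so call the existing transform from the ruleset pipeline
[default]
RULESET-force_index_main = check_tag_and_set_main_index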
EDIT: Alternatively, you could probably use the "route" option in your input and send all data to typingQueue again, but this is not a very well documented option and it might not be easy to debug should anything go wrong.
Anyway, each of those methods has its pros and cons. If you use RULESET, your data will only hit that ruleset (and maybe other ones you have defined). If you send the data to typingQueue your data will get affected by index-time operations defined by your add-ons which might or might not be what you want.