On our HFs we have routing rules in transforms.conf that take a long time to evaluate and are creating a bottleneck for us.
We have the following numbers of routing entries:
~2000 entries for index routing
~200 entries for sourcetype routing
Can you please suggest how we can route the events faster and more efficiently?
Sample from transforms.conf:
[route_sentinel_to_index]
INGEST_EVAL = index:=case(\
match(_raw, "\"TENANT\":\"xxxxxx-b589-c11a968d4876\""), "nacoak_mil", \
... <1997 entries> ... \
match(_raw, "\"EVENT_TIME\":\"\d{13}\""), "unknown_events", \
true(), "unknownsentinel")
[apply_sourcetype_to_sentinel]
INGEST_EVAL = sourcetype:=case(\
match(_raw, "\"SYSTEM\":\"xxxx-b3a7-xxxxxx\""), "cs:fhir:prod:audit:json", \
match(_raw, "\"SYSTEM\":\"xxxxxxx-d424c20xxxx\""), "cs:railsapp_server:ambulatory:audit:json", \
... <198 entries> ... \
true(), "cs:sentinel:unknown:audit:json")
Why such large case statements? Can the assignment of index and sourcetype be moved to inputs.conf?
If not, then can you add more HFs and partition the input among them?
This is the way it was written originally, when there were only a few case statements, but the list keeps growing.
We have set up a batch input on the UF to read the files, and each event is evaluated by the case() statements to send it to the right index with the right sourcetype.
Do you mean moving this logic to inputs.conf on the UF? Won't that make for a huge inputs.conf?
We had 5 UFs and 2 HFs. When I saw the issue I added 2 more HFs, so now it's 5 UFs and 4 HFs, but even then it's not ingesting fast enough.
Having UFs specify the index and sourcetype of each input is standard practice. Will it make for huge inputs.conf files? Maybe. I don't know what your current inputs.conf looks like, but there's little harm in having large ones. The UF will be monitoring the same files, anyway, only now a lot of work will be shifted off the HFs.
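For illustration, per-input assignment looks something like this (purely a sketch; the paths, indexes, and sourcetypes here are invented, and it assumes each tenant's files could land in their own directory):

[monitor:///data/tenant_a/...]
index = tenant_a_index
sourcetype = tenant_a:audit:json

[monitor:///data/tenant_b/...]
index = tenant_b_index
sourcetype = tenant_b:audit:json

With the metadata set at the input, the HF has nothing left to compute per event; it just forwards.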
Below is the inputs.conf on the UF:
[batch:///flume/rollingfiles/process]
_TCP_ROUTING = prod-a_hf
crcSalt = <SOURCE>
disabled = false
index = unknownsentinel
move_policy = sinkhole
recursive = false
sourcetype = cs:sentinel:unknown:audit:json
whitelist = \.rd$
The UF won't do any transformation of events, so they still have to go through the HF/indexer, which would hit the same issue we are seeing now, right? Please let me know if I'm missing anything.
The way it's set up now is:
UF ---> HF ---> Indexer cluster
UF --> batch input to read the files, each containing multiple events
HF --> routes each event to a specific index and sourcetype based on its tenant ID
Hmm... So all monitored files are in one place and it's up to the HF to figure out where everything goes? That's a non-scalable solution.
Is there a faster way to decide where the data goes? Perhaps by examining the host or source rather than the contents?
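For example, if the tenant ID appeared in the file path, you could route on the source metadata instead of scanning _raw. A rough sketch (the path pattern and target index are invented):

props.conf:
[cs:sentinel:unknown:audit:json]
TRANSFORMS-routing = route_nacoak_by_source

transforms.conf:
[route_nacoak_by_source]
SOURCE_KEY = MetaData:Source
REGEX = /nacoak/
DEST_KEY = _MetaData:Index
FORMAT = nacoak_mil

Matching a short source string is far cheaper than running 2000 regexes against every event body.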
If that won't work and you don't want to add more HFs (maybe 4 isn't enough for the job) then consider installing Cribl.
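If you truly have to inspect the contents, one change that might cut the per-event cost: extract the tenant ID once, then compare that short field with string equality instead of running ~2000 match() calls over the entire _raw. An untested sketch (it assumes TENANT appears once per event, and that a field created by an earlier transform in the TRANSFORMS- list is visible to later ones, which is how INGEST_EVAL chains are documented to behave):

transforms.conf:
[extract_sentinel_tenant]
INGEST_EVAL = tenant_id:=replace(_raw, "^.*\"TENANT\":\"([^\"]+)\".*$", "\1")

[route_sentinel_to_index]
INGEST_EVAL = index:=case(\
tenant_id=="xxxxxx-b589-c11a968d4876", "nacoak_mil", \
... <remaining tenants> ... \
true(), "unknownsentinel")

props.conf:
[cs:sentinel:unknown:audit:json]
TRANSFORMS-routing = extract_sentinel_tenant, route_sentinel_to_index

Each branch then compares a short string instead of regex-scanning the whole event.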
Yes, all the data is in one place and the HF does the routing.
We are trying to see if moving the routing to the indexers helps, and we're also trying to find a scalable solution.
The only way, as of now, is to examine the contents. I will add a couple more HFs and see if that helps. Thank you!
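One more thing worth checking before adding more hardware: a forwarder runs a single ingestion pipeline by default. If the HFs have idle cores, this server.conf setting (verify against the docs for your version) lets each one run multiple pipelines in parallel:

[general]
parallelIngestionPipelines = 2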