Good day, sirs!
Which system resource should I increase to speed up parsing on my Heavy Forwarder? My instance uses 'batch' inputs to monitor 21 folders for ingestion, but it can't seem to keep up: files are piling up in their respective folders, waiting for their turn to be parsed and ingested.
The process is:
A cron job extracts the files from archives and sends them to 21 folders (different sourcetypes) -> the heavy forwarder monitors those folders -> parse -> send to the indexer.
Please enlighten me. Thank you!
First, I would like to point out that you can achieve this with a UF. If you do not need masking or any other parsing at the source, an HF is not required. An HF will also increase network traffic because it adds metadata after parsing, so I suggest using a UF.
How many files are present in those 21 folders? If there are thousands of files, I suggest reducing the number of files in those folders.
Are you using any props/transforms to parse the data? If yes, writing efficient regexes will improve parsing.
Thank you for your inputs.
The number of files in those folders depends on how fast the HF parses each file (around 4~6 MB each). I used 'batch' instead of 'monitor' in my inputs.conf to monitor those folders, sir.
Sir, do you have any suggestion on which system resource I need to increase to speed up parsing on the HF? Is it CPU, memory, or IOPS?
Please enlighten me more. Thank you!
Since you are using a batch stanza, Splunk will remove each file from its folder once it has read the file's content. Still, my question remains the same: when the cron job extracts files from the archive, how many files (hundreds? thousands?) does it extract into those folders?
Are you doing any parsing on the data before forwarding it to the indexer?
Have you monitored CPU and RAM usage on the HF? Have you checked for errors or warnings in splunkd.log?
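For reference, a batch input of the kind discussed here might look like the stanza below. This is only a sketch: the path, sourcetype, and index names are made up for illustration, and one such stanza would be needed per folder. The `move_policy = sinkhole` line is what makes Splunk delete each file after reading it.

```ini
# inputs.conf on the forwarder (hypothetical path, sourcetype, and index)
[batch:///data/incoming/folder01]
# Required for batch inputs: delete each file once it has been read
move_policy = sinkhole
sourcetype = folder01_logs
index = main
disabled = false
```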
Oh, I see. At best, it should be 5000 files (4~6 MB per file) per folder per day. I checked, and the HF uses RAM heavily during parsing before sending the data to the indexer. I'll see if I can send it raw and have the indexers do the parsing instead. But sir, does increasing the RAM make parsing faster?
There have been no warnings in splunkd.log so far.
I tested with 5000 files per folder, and it took the heavy forwarder almost an hour to send all of them to the indexers.
Looking forward to your further input, sir!
With 5000 files per folder, the total is about 105,000 files per day across 21 folders; with a batch input that should not be a problem.
Based on file size, though, assuming an average of 5 MB per file, that comes to roughly 512 GB/day in total. That is a reasonable amount for one Heavy Forwarder to process, but it depends entirely on the Heavy Forwarder's server specification. I suggest switching to a UF and letting the indexers do the parsing. Even then, if you have only 1 or 2 indexers with the recommended server specification, indexing will be delayed, because one indexer can generally index 100-150 GB/day if you are not using premium applications like ES and ITSI.
This can easily be achieved by installing a Universal Forwarder. If you need a real-life example: we had a 12-core blade with 24 GB RAM and 800 IOPS monitoring approximately 3000+ folders/directories and sending to the indexer. The average CPU/memory usage is about 20-30%.
There will be a hiccup at the start, but after that it will be quite smooth.
We have an HF with similar hardware specs, but the data load on it is smaller, so I can't really give true values. Still, the CPU/memory usage is within limits.
It also depends on how many indexers you have. Investing more in indexers and sending to multiple indexers makes searching/indexing much faster.
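To illustrate sending to multiple indexers: a forwarder can load-balance across them when several receivers are listed in one output group in outputs.conf. The group name and host names below are placeholders, and 9997 is just the conventional receiving port, so treat this as a sketch rather than a tuned configuration.

```ini
# outputs.conf on the forwarder (hypothetical group name and hosts)
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
# Listing several receivers enables automatic load balancing across them
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
# Seconds before the forwarder switches to another indexer (30 is the default)
autoLBFrequency = 30
```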
I'll add a bit more to this. As already mentioned, if you're just batching files and not doing parsing / masking / editing of the data, then you should look at using a UF instead of an HF.
Additionally, assuming that your hardware does have CPU and memory capacity available, you can increase the ingestion pipelines on the forwarder to 2, 3, or even more. Read here for more details on this: https://docs.splunk.com/Documentation/Splunk/7.2.6/Indexer/Pipelinesets
It's important to understand that adding a second pipeline effectively doubles your ingestion capacity, and further pipelines raise it again. This could have an upstream impact if your intermediate and indexing tiers are experiencing any type of latency or resource-related issues.
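As a rough sketch of the setting that page describes, additional pipeline sets are enabled in server.conf on the forwarder. The value 2 is only an example; choose it based on the spare CPU and memory you actually have, and restart Splunk for the change to take effect.

```ini
# server.conf on the forwarder
[general]
# Number of ingestion pipeline sets; the default is 1
parallelIngestionPipelines = 2
```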