Hi developers, I am trying to analyse some logs by extracting them in JSON format and feeding them to Splunk.
I have millions of these logs, each resulting in a JSON file of 4-5 KB.
How can I monitor these files effectively so that Splunk picks up each file?
Thanks.
A major issue can be the ulimit
for open files. Please read the great post by @yannk: how to tune ulimit on my server?
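If you want a quick way to check this from a script, something like the following (Python, just as an illustration) prints the current soft and hard limits for open files and raises the soft limit:

```python
import resource

# Current per-process limits for open file descriptors (Linux/macOS).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")

# The soft limit can usually be raised up to the hard limit without root;
# raising the hard limit itself is done via limits.conf / the ulimit shell builtin.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```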
I see two main options: keep writing the files to disk and index them with a regular file monitor or batch input, or skip the intermediate files and send the events straight to Splunk via the HTTP Event Collector.
I don't have experience myself with such huge amounts of files, but unless you get some specific recommendations here, I'd suggest just giving it a try (ideally in a test setup, of course) and seeing what issues you run into. Then you can always post back here to get help resolving those issues.
Hi @FrankVl, I tried the HTTP Event Collector method and found it useful.
Now the issue is that I have to run a curl command for each file. I get millions of files to process every day, so would running curl that many times be too much overhead?
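For context, each call is essentially the equivalent of this (shown in Python instead of curl; the URL and token are placeholders):

```python
import json

import requests

# Placeholder HEC endpoint and token - replace with your own.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEADERS = {"Authorization": "Splunk 00000000-0000-0000-0000-000000000000"}

def send_file(path):
    # One HTTP POST per JSON file, i.e. the same work each curl invocation does.
    with open(path) as f:
        event = json.load(f)
    resp = requests.post(HEC_URL, headers=HEADERS,
                         json={"event": event, "sourcetype": "_json"},
                         timeout=30)
    resp.raise_for_status()
```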
I also have an idea of merging all the JSON records into one file, separated by [EOF], sending that file across to Splunk, and breaking events on [EOF].
But it is not getting indexed by Splunk, because [EOF] is not valid JSON.
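To make the idea concrete, the merge step I have in mind looks roughly like this (the paths are placeholders):

```python
import glob

SEPARATOR = "[EOF]\n"  # the marker I want Splunk to break events on

# Concatenate the individual JSON files into one file, with [EOF] between records.
with open("/data/merged/all_events.log", "w") as out:    # placeholder output path
    for path in sorted(glob.glob("/data/logs/*.json")):  # placeholder input directory
        with open(path) as f:
            out.write(f.read().rstrip("\n") + "\n")
        out.write(SEPARATOR)
```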
Any other solutions?
I don't think curl itself should add too much overhead, but you should be able to see for yourself whether it causes problematic load.
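If the per-file calls ever do become a bottleneck, one thing to try is batching several events into a single HEC request instead of one request per file. A rough sketch in Python (URL, token, directory and batch size are placeholders; I haven't tested this at your scale):

```python
import glob
import json

import requests

# Placeholder endpoint, token and directory - adjust to your environment.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEADERS = {"Authorization": "Splunk 00000000-0000-0000-0000-000000000000"}
BATCH_SIZE = 500  # number of files to combine into one HTTP request

def send_batch(events):
    # HEC accepts multiple event objects concatenated in one request body,
    # so a single POST can carry a whole batch of events.
    payload = "\n".join(json.dumps({"event": e, "sourcetype": "_json"}) for e in events)
    resp = requests.post(HEC_URL, headers=HEADERS, data=payload, timeout=30)
    resp.raise_for_status()

batch = []
for path in sorted(glob.glob("/data/logs/*.json")):
    with open(path) as f:
        batch.append(json.load(f))
    if len(batch) >= BATCH_SIZE:
        send_batch(batch)
        batch = []
if batch:
    send_batch(batch)  # flush the remainder
```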
As for your other idea: I don't completely follow what you tried and what exactly is failing.