About mik_cox

mik_cox · ‎08-10-2016

Answering my own question: my major problem was that I had put the following properties...

vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?

...on the provider, NOT on the virtual index as they should've been (in indexes.conf). Setting these properties up through the Hunk web interface on the Virtual Index editing page would've configured this properly.
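For anyone hitting the same thing, here is a rough sketch of where the properties end up in indexes.conf. The provider name s3-emr is inferred from the ERP.s3-emr entries in search.log; the virtual index name and the vix.input.1.path value are made up for illustration, so treat this as an outline rather than a drop-in config.

```ini
# indexes.conf (sketch; stanza names and the input path are assumptions)

[provider:s3-emr]
# Provider-level settings only (Hadoop/EMR environment, filesystem, etc.).
# The vix.input.* time-extraction properties do NOT belong here.

[my_virtual_index]
vix.provider = s3-emr
# Hypothetical path; the trailing "..." is meant as Hunk's recursive-directory
# notation, not an elision.
vix.input.1.path = s3://my-bucket/data/appname/...
vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
```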

mik_cox · ‎08-09-2016

I have a Hunk installation that is successfully (albeit slowly) pulling data from an s3:// filesystem. However, I'm having problems getting Hunk to search only the relevant directories in s3. I see the correct results when running a search over a specific time range in the Hunk UI, but Hunk still scans every file in Hadoop to produce them, which is slow and wasteful.

For instance, I have my data in directories in s3 that follow this format:

s3://my-bucket/data/appname/2016/08/09/22/appname_22_30.log

which would correspond to the logs from my app that were collected on August 9th, 2016 for the minute of 22:30.

I have correspondingly set up my provider with the following properties:

vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?

When running searches, I've noticed in my search.log that I get lines like this...

DEBUG ERP.s3-emr - VirtualIndex - File meets time heuristic path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log, search.et=1470009600, search.lt=1470268800, file.et=0, file.lt=9223372036854775807, file.mtime=1470766383
08-09-2016 20:24:02.879 DEBUG ERP.s3-emr - VirtualIndex - File meets the search criteria. Will consider it, path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log

...which indicate to me that the regex isn't doing its job, as file.et and file.lt are not set properly (they are still 0 and the maximum long value, so every file passes the time check).

Does anyone have any idea as to why this might be happening? Thanks in advance!!
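To sanity-check the regex outside of Hunk, the extraction can be approximated in a few lines of Node. This is only a sketch of my understanding that the capture groups are joined and read with the yyyyMMddHHmm format; it assumes UTC, and the helper name extractEpochSeconds is made up for this example, not anything Hunk provides.

```javascript
// Regex copied from the vix.input.1.et.regex / lt.regex settings; the only
// change is escaping "/" so it can be written as a JavaScript literal.
const TIME_REGEX = /.*?\/appname\/(\d+)?\/?(\d+)?\/?(\d+)?\/?(\d+)?.*_?(\d{2}).*?/;

// Hypothetical helper: join the capture groups, read them as yyyyMMddHHmm,
// and add the configured offset. Assumption: the timestamp is UTC.
function extractEpochSeconds(path, offsetSeconds) {
  const match = TIME_REGEX.exec(path);
  if (match === null) return null; // regex did not match this path at all
  const stamp = match.slice(1).map((group) => group || '').join('');
  if (!/^\d{12}$/.test(stamp)) return null; // not a full yyyyMMddHHmm value
  const year = Number(stamp.slice(0, 4));
  const month = Number(stamp.slice(4, 6));
  const day = Number(stamp.slice(6, 8));
  const hour = Number(stamp.slice(8, 10));
  const minute = Number(stamp.slice(10, 12));
  return Date.UTC(year, month - 1, day, hour, minute) / 1000 + offsetSeconds;
}

const samplePath = 's3://my-bucket/data/appname/2016/08/09/22/appname_22_30.log';
const fileEt = extractEpochSeconds(samplePath, 0);  // et.offset = 0
const fileLt = extractEpochSeconds(samplePath, 60); // lt.offset = 60
console.log({ fileEt, fileLt });
```

For the sample path the groups join to 201608092230, so the regex itself looks fine for paths containing /appname/. A path that does not contain /appname/ yields null here, which is worth keeping in mind when comparing against the paths that show up in search.log.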

mik_cox · ‎08-09-2016

Based on @sjohnson's answer, @sk4l's comments, and some research of my own, I've mostly figured out what's going on in my environment:

The setting indexed_realtime_use_by_default in limits.conf was true. In addition, the related property indexed_realtime_disk_sync_delay was not set, so it was using its default of 60 seconds (which is why I saw the 60-second delay I mentioned in the original post).

That delay applies to indexed real-time searches only; it exists to make sure the data gets fully processed, indexed, and synced to disk. That means I can still do non-indexed real-time searches with no delay, whereas an indexed real-time search will always lag by the configured delay.

It's been pointed out to me that non-indexed real-time searches carry a significant performance hit, so there's a trade-off between that hit and the delay. My use case is to see data from the past couple of seconds as it comes in, so a non-indexed search could be an acceptable solution in my case.
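For reference, this is roughly what the relevant part of limits.conf looks like in the state described above. It is a sketch, assuming both settings sit under the [realtime] stanza; check the limits.conf spec for your version before changing anything.

```ini
# limits.conf (sketch of the state described above)
[realtime]
# true means real-time searches run as indexed real-time searches by default
indexed_realtime_use_by_default = true
# Not set in my environment, so the default of 60 seconds applied:
# indexed_realtime_disk_sync_delay = 60
```

Setting indexed_realtime_use_by_default to false should make real-time searches non-indexed by default, at the cost of the performance hit mentioned above.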

mik_cox · ‎08-08-2016

As a temporary workaround, I'm no longer using a real-time search and am instead running a one-off query over the last 5 seconds and re-querying every 5 seconds. This feels "hacky", but it's a band-aid solution until this gets sorted out in the JavaScript SDK.
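The workaround is basically a polling loop like the sketch below. The runSearch argument is a hypothetical function supplied by the caller (for example, a thin wrapper around whatever one-off search call you use) that takes a query and time-range parameters and returns a Promise of results; it is not part of the SDK.

```javascript
// Build time-range parameters for a one-off search over the last N seconds.
function buildWindowParams(windowSeconds) {
  return { earliest_time: `-${windowSeconds}s`, latest_time: 'now' };
}

// Run the query immediately, then again every windowSeconds seconds.
// runSearch is a hypothetical (query, params) => Promise<results> function.
// Returns a function that stops the polling.
function startPolling(runSearch, query, windowSeconds, onResults, onError) {
  const tick = () => {
    runSearch(query, buildWindowParams(windowSeconds))
      .then(onResults)
      .catch(onError);
  };
  tick();
  const timer = setInterval(tick, windowSeconds * 1000);
  return () => clearInterval(timer);
}
```

Because each window is relative to the moment the query runs, consecutive windows can overlap slightly or leave small gaps, which is part of why this is only a band-aid.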

mik_cox · ‎10-07-2015

Hey all, I've set up a real-time search using NodeJS and the JavaScript SDK as outlined in the example at:

https://github.com/splunk/splunk-sdk-javascript/blob/master/examples/node/helloworld/search_realtime.js

I've set earliest_time: 'rt-7s' and latest_time: 'rt' and am polling the job's preview regularly for updates. The problem is that the results I'm getting back are all delayed by about 60 seconds, which seems to defeat the purpose of a real-time search. Results still come in at the correct rate, but they're always around a minute behind.

Am I doing something obviously wrong here, or might there be something else going on with the JavaScript SDK that's causing this delay? Thanks in advance!

Posts	5
Solutions	2
Karma Given	0
Karma Received	2
Member Since	‎10-07-2015


Hunk is not filtering files based on timestamp

NodeJS JavaScript SDK real-time results are delaye...
