I'm trying to figure out the delay in reading from an Azure Storage table. According to the logs, the event happened at 14:24, it was written to the storage table about 5 minutes later, and Splunk indexed it roughly 75 minutes after that.
Example Event
_time: 2016-10-24 14:24:58.395 (_time taken from EventTime)
_indextime: 2016-10-24 15:44:34 (Indexed Time)
Timestamp: 2016-10-24T14:29:35.0328389Z (Time the event was written to the storage table)
EventTime: 2016-10-24T14:24:58.395405Z (Time the event occurred)
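To make the gaps explicit, here is a quick calculation of the three lags from the example event above (a minimal Python sketch; all times UTC, and _indextime is only given to the second):

from datetime import datetime

def parse_iso_z(ts):
    # Trim fractional seconds to 6 digits so %f can parse them
    # (the Azure Timestamp above has 7 fractional digits).
    head, frac = ts.rstrip("Z").split(".")
    return datetime.strptime("%s.%s" % (head, frac[:6]), "%Y-%m-%dT%H:%M:%S.%f")

event_time = parse_iso_z("2016-10-24T14:24:58.395405Z")   # EventTime
table_time = parse_iso_z("2016-10-24T14:29:35.0328389Z")  # Azure Timestamp
index_time = datetime(2016, 10, 24, 15, 44, 34)            # _indextime

print("event -> table write : %s" % (table_time - event_time))  # ~4 min 37 s
print("table write -> index : %s" % (index_time - table_time))  # ~1 h 15 min
print("event -> index       : %s" % (index_time - event_time))  # ~1 h 20 min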
inputs.conf
[mscs_storage_table://LinuxsyslogVer2v0]
account = REDACTED
collection_interval = 120
index = os
sourcetype = mscs:storage:linux
start_time = 2016-10-13T00:00:00+00:00
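For reference, collection_interval = 120 is what drives the two-minute polling cadence in the log excerpt below. A rough, hypothetical sketch of that kind of scheduling loop (not the add-on's actual code, just an illustration of what the setting configures):

import time

COLLECTION_INTERVAL = 120  # seconds, from collection_interval in the stanza above

def collect_once():
    # Placeholder for one collection cycle; in the add-on the real work
    # is done by mscs_storage_table_data_collector.py.
    pass

# Hypothetical loop approximating the cadence the scheduler.py lines show
# (one ready job roughly every collection_interval seconds).
while True:
    started = time.time()
    collect_once()
    time.sleep(max(0, COLLECTION_INTERVAL - (time.time() - started)))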
These scheduler.py entries repeat every 120 seconds until 16:18:44, when mscs_storage_table_data_collector.py runs again.
2016-10-24 16:18:44,214 +0000 log_level=INFO, pid=60713, tid=Thread-126, file=mscs_storage_table_data_collector.py, func_name=do_collect_data, code_line_no=79 | [stanza_name="LinuxsyslogVer2v0" account_name="jetprdeast2audit" table_name="LinuxsyslogVer2v0"] The filter is "(Timestamp ge datetime'2016-10-24T14:29:44Z') and (Timestamp lt datetime'2016-10-24T16:17:44Z')".
2016-10-24 14:40:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-7, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:40:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999464, and there are 1 jobs scheduling
2016-10-24 14:38:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-6, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:38:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999281, and there are 1 jobs scheduling
2016-10-24 14:36:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-4, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:36:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999461, and there are 1 jobs scheduling
2016-10-24 14:34:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-7, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:34:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999312, and there are 1 jobs scheduling
2016-10-24 14:32:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-6, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:32:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.998499, and there are 1 jobs scheduling
2016-10-24 14:30:44,216 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_do_collect_data, code_line_no=84 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] query_entities_page_size=1000 event_cnt_per_item=100 query_end_time_offset=60.0
2016-10-24 14:30:44,216 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_do_collect_data, code_line_no=79 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] The filter is "(Timestamp ge datetime'2016-10-24T12:41:44Z') and (Timestamp lt datetime'2016-10-24T14:29:44Z')".
2016-10-24 14:30:44,216 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_service_factory.py, func_name=_set_proxy, code_line_no=68 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] proxy is disabled.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_do_collect_data, code_line_no=68 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] Finishing pre-process checkpoint.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_pre_process_ckpt, code_line_no=152 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] The entities between 2016-10-24T10:51:44Z and 2016-10-24T12:41:44Z are all collected, updating the checkpoint.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_pre_process_ckpt, code_line_no=145 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] The checkpoint status=all_done.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_do_collect_data, code_line_no=65 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] Starting to pre-process checkpoint.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=collect_data, code_line_no=54 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] Starting to collect data.
2016-10-24 14:30:44,214 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_dispatcher.py, func_name=_dispatch_tasks, code_line_no=122 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] The number of qualified_storage is 1
2016-10-24 14:30:44,214 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_table_dispatcher.py, func_name=_get_storage_info_list, code_line_no=63 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] The qualified table_list=[u'LinuxsyslogVer2v0']
2016-10-24 14:30:44,214 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_table_dispatcher.py, func_name=_get_storage_info_list, code_line_no=57 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] The table_list=[u'LinuxCpuVer2v0', u'LinuxDiskVer2v0', u'LinuxMemoryVer2v0', u'LinuxsyslogVer2v0', u'MemoryVer2v0', u'SchemasTable', u'WADDiagnosticInfrastructureLogsTable', u'WADMetrics', u'WADMetrics', u'WADMetrics', u'WADMetrics', u'WADPerformanceCountersTable', u'WADWindowsEventLogsTable']
2016-10-24 14:30:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_service_factory.py, func_name=_set_proxy, code_line_no=68 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] proxy is disabled.
2016-10-24 14:30:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_table_dispatcher.py, func_name=_log_compiled_table_names, code_line_no=42 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] The patterns=[u'LinuxsyslogVer2v0$']
2016-10-24 14:30:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_dispatcher.py, func_name=_dispatch_storage_list, code_line_no=85 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Starting to dispatch storage list.
2016-10-24 14:30:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-5, file=mscs_storage_table_list_data_collector.py, func_name=collect_data, code_line_no=31 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Starting to get data from data_writer.
2016-10-24 14:30:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-5, file=mscs_storage_table_list_data_collector.py, func_name=collect_data, code_line_no=27 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Starting to collect data.
2016-10-24 14:30:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-5, file=mscs_storage_dispatcher.py, func_name=init, code_line_no=40 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] worker_threads_num=10
2016-10-24 14:30:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-5, file=ta_data_collector.py, func_name=index_data, code_line_no=122 | [stanza_name="LinuxsyslogVer2v0" account="acctname"] Start indexing data for checkpoint_key=LinuxsyslogVer2v0acctname
2016-10-24 14:30:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999494, and there are 1 jobs scheduling
2016-10-24 14:29:52,377 +0000 log_level=INFO, pid=60713, tid=Thread-4, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:29:52,377 +0000 log_level=INFO, pid=60713, tid=Thread-4, file=ta_data_collector.py, func_name=index_data, code_line_no=130 | [stanza_name="LinuxsyslogVer2v0" account="acctname"] End of indexing data for checkpoint_key=LinuxsyslogVer2v0_acctname
2016-10-24 14:29:52,376 +0000 log_level=INFO, pid=60713, tid=Thread-4, file=mscs_storage_table_list_data_collector.py, func_name=collect_data, code_line_no=61 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Retrieve the remain data from data_writer.
2016-10-24 14:29:49,375 +0000 log_level=INFO, pid=60713, tid=Thread-121, file=mscs_storage_dispatcher.py, func_name=_dispatch_storage_list, code_line_no=88 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Finished dispatching storage list.
2016-10-24 14:29:49,340 +0000 log_level=INFO, pid=60713, tid=Thread-122, file=mscs_storage_table_data_collector.py, func_name=collect_data, code_line_no=57 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] Finish collecting data.
2016-10-24 14:28:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-7, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:28:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999480, and there are 1 jobs scheduling
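Putting the log lines together, each collector run appears to query a window whose start is the previous checkpoint end and whose end is the run time minus query_end_time_offset (60.0 seconds per the _do_collect_data line). This is only my reconstruction from the logs, not the add-on's source, but it reproduces the filters exactly:

from datetime import datetime, timedelta

QUERY_END_TIME_OFFSET = 60.0  # seconds, from the _do_collect_data log line

def build_timestamp_filter(checkpoint_end, run_time):
    # Window start = previous checkpoint end,
    # window end   = run time minus query_end_time_offset.
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    end = run_time - timedelta(seconds=QUERY_END_TIME_OFFSET)
    return ("(Timestamp ge datetime'%s') and (Timestamp lt datetime'%s')"
            % (checkpoint_end.strftime(fmt), end.strftime(fmt)))

# Reproduces the 14:30:44 run from the excerpt:
print(build_timestamp_filter(datetime(2016, 10, 24, 12, 41, 44),
                             datetime(2016, 10, 24, 14, 30, 44)))
# (Timestamp ge datetime'2016-10-24T12:41:44Z') and (Timestamp lt datetime'2016-10-24T14:29:44Z')

The example event's Timestamp (14:29:35) falls inside that 14:30:44 window, yet it did not show up in the index until 15:44:34, which is the gap I am trying to explain.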
Apparently this is by design, way to go Splunk!
The update on this issue is that the add-on is currently working as intended. An enhancement request has been filed to have this addressed in a future release.
I have a similar issue with the add-on, except mine is with O365 audit logs. Following this question; please let us know here if you figure out the reason.
It might not be the same issue, but I will keep everyone updated. I also have a support case open for my issue.