All Apps and Add-ons
Highlighted

Splunk Add-on for Microsoft Cloud Services: Storage Table Indexing Delay

Path Finder

Trying to figure out the delay in reading from an Azure Storage table. According to the logs, the event happened at 14:24, 5mins later it wrote to storage table, 75mins later Splunk indexed the event.

Example Event
time: 2016-10-24 14:24:58.395 *(time taken from EventTime)*
indextime: 2016-10-24 15:44:34 (Indexed Time)
Timestamp: 2016-10-24T14:29:35.0328389Z (Time event wrote to storage table)
EventTime: 2016-10-24T14:24:58.395405Z (Time event occured)

inputs.conf
[mscs
storagetable://LinuxsyslogVer2v0]
account = REDACTED
collection
interval = 120
index = os
sourcetype = mscs:storage:linux
start_time = 2016-10-13T00:00:00+00:00

table_list = LinuxsyslogVer2v0

These scheduler.py happen every 120 seconds, until 16:18:44 when the file=mscsstoragetabledatacollector.py runs again.

2016-10-24 16:18:44,214 +0000 loglevel=INFO, pid=60713, tid=Thread-126, file=mscsstoragetabledatacollector.py, funcname=docollectdata, codelineno=79 | [stanzaname="LinuxsyslogVer2v0" accountname="jetprdeast2audit" tablename="LinuxsyslogVer2v0"] The filter is "(Timestamp ge datetime'2016-10-24T14:29:44Z') and (Timestamp lt datetime'2016-10-24T16:17:44Z')".
2016-10-24 14:40:44,179 +0000 loglevel=INFO, pid=60713, tid=Thread-7, file=threadpool.py, funcname=run, codelineno=261 | Thread workqueuesize=0
2016-10-24 14:40:44,178 +0000 loglevel=INFO, pid=60713, tid=Thread-2, file=scheduler.py, funcname=getreadyjobs, codelineno=100 | Get 1 ready jobs, next duration is 119.999464, and there are 1 jobs scheduling
2016-10-24 14:38:44,179 +0000 loglevel=INFO, pid=60713, tid=Thread-6, file=threadpool.py, funcname=run, codelineno=261 | Thread workqueuesize=0
2016-10-24 14:38:44,179 +0000 loglevel=INFO, pid=60713, tid=Thread-2, file=scheduler.py, funcname=getreadyjobs, codelineno=100 | Get 1 ready jobs, next duration is 119.999281, and there are 1 jobs scheduling
2016-10-24 14:36:44,179 +0000 loglevel=INFO, pid=60713, tid=Thread-4, file=threadpool.py, funcname=run, codelineno=261 | Thread workqueuesize=0
2016-10-24 14:36:44,178 +0000 loglevel=INFO, pid=60713, tid=Thread-2, file=scheduler.py, funcname=getreadyjobs, codelineno=100 | Get 1 ready jobs, next duration is 119.999461, and there are 1 jobs scheduling
2016-10-24 14:34:44,179 +0000 loglevel=INFO, pid=60713, tid=Thread-7, file=threadpool.py, funcname=run, codelineno=261 | Thread workqueuesize=0
2016-10-24 14:34:44,178 +0000 loglevel=INFO, pid=60713, tid=Thread-2, file=scheduler.py, funcname=getreadyjobs, codelineno=100 | Get 1 ready jobs, next duration is 119.999312, and there are 1 jobs scheduling
2016-10-24 14:32:44,180 +0000 loglevel=INFO, pid=60713, tid=Thread-6, file=threadpool.py, funcname=run, codelineno=261 | Thread workqueuesize=0
2016-10-24 14:32:44,179 +0000 loglevel=INFO, pid=60713, tid=Thread-2, file=scheduler.py, funcname=getreadyjobs, codelineno=100 | Get 1 ready jobs, next duration is 119.998499, and there are 1 jobs scheduling
2016-10-24 14:30:44,216 +0000 loglevel=INFO, pid=60713, tid=Thread-124, file=mscsstoragetabledatacollector.py, funcname=docollectdata, codelineno=84 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] queryentitiespagesize=1000 eventcntperitem=100 queryendtimeoffset=60.0
2016-10-24 14:30:44,216 +0000 log
level=INFO, pid=60713, tid=Thread-124, file=mscsstoragetabledatacollector.py, funcname=docollectdata, codelineno=79 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] The filter is "(Timestamp ge datetime'2016-10-24T12:41:44Z') and (Timestamp lt datetime'2016-10-24T14:29:44Z')".
2016-10-24 14:30:44,216 +0000 log
level=INFO, pid=60713, tid=Thread-124, file=mscsstorageservicefactory.py, funcname=setproxy, codelineno=68 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] proxy is disabled.
2016-10-24 14:30:44,215 +0000 log
level=INFO, pid=60713, tid=Thread-124, file=mscsstoragetabledatacollector.py, funcname=docollectdata, codelineno=68 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] Finishing pre-process checkpoint.
2016-10-24 14:30:44,215 +0000 log
level=INFO, pid=60713, tid=Thread-124, file=mscsstoragetabledatacollector.py, funcname=preprocessckpt, codelineno=152 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] The entities between 2016-10-24T10:51:44Z and 2016-10-24T12:41:44Z are all collected, updating the checkpoint.
2016-10-24 14:30:44,215 +0000 log
level=INFO, pid=60713, tid=Thread-124, file=mscsstoragetabledatacollector.py, funcname=preprocessckpt, codelineno=145 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] The checkpoint status=alldone.
2016-10-24 14:30:44,215 +0000 loglevel=INFO, pid=60713, tid=Thread-124, file=mscsstoragetabledatacollector.py, funcname=docollectdata, codelineno=65 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] Starting to pre-process checkpoint.
2016-10-24 14:30:44,215 +0000 loglevel=INFO, pid=60713, tid=Thread-124, file=mscsstoragetabledatacollector.py, funcname=collectdata, codelineno=54 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] Starting to collect data.
2016-10-24 14:30:44,214 +0000 loglevel=INFO, pid=60713, tid=Thread-123, file=mscsstoragedispatcher.py, funcname=dispatchtasks, codelineno=122 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] The number of qualifiedstorage is 1
2016-10-24 14:30:44,214 +0000 loglevel=INFO, pid=60713, tid=Thread-123, file=mscsstoragetabledispatcher.py, funcname=getstorageinfolist, codelineno=63 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] The qualified tablelist=[u'LinuxsyslogVer2v0']
2016-10-24 14:30:44,214 +0000 log
level=INFO, pid=60713, tid=Thread-123, file=mscsstoragetabledispatcher.py, funcname=getstorageinfolist, codelineno=57 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] The tablelist=[u'LinuxCpuVer2v0', u'LinuxDiskVer2v0', u'LinuxMemoryVer2v0', u'LinuxsyslogVer2v0', u'MemoryVer2v0', u'SchemasTable', u'WADDiagnosticInfrastructureLogsTable', u'WADMetrics', u'WADMetrics', u'WADMetrics', u'WADMetrics', u'WADPerformanceCountersTable', u'WADWindowsEventLogsTable']
2016-10-24 14:30:44,180 +0000 loglevel=INFO, pid=60713, tid=Thread-123, file=mscsstorageservicefactory.py, funcname=setproxy, codelineno=68 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] proxy is disabled.
2016-10-24 14:30:44,180 +0000 loglevel=INFO, pid=60713, tid=Thread-123, file=mscsstoragetabledispatcher.py, funcname=logcompiledtablenames, codelineno=42 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] The patterns=[u'LinuxsyslogVer2v0$']
2016-10-24 14:30:44,180 +0000 loglevel=INFO, pid=60713, tid=Thread-123, file=mscsstoragedispatcher.py, funcname=dispatchstoragelist, codelineno=85 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] Starting to dispatch storage list.
2016-10-24 14:30:44,180 +0000 loglevel=INFO, pid=60713, tid=Thread-5, file=mscsstoragetablelistdatacollector.py, funcname=collectdata, codelineno=31 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] Starting to get data from datawriter.
2016-10-24 14:30:44,179 +0000 loglevel=INFO, pid=60713, tid=Thread-5, file=mscsstoragetablelistdatacollector.py, funcname=collectdata, codelineno=27 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] Starting to collect data.
2016-10-24 14:30:44,179 +0000 log
level=INFO, pid=60713, tid=Thread-5, file=mscsstoragedispatcher.py, funcname=init, codelineno=40 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] workerthreadsnum=10
2016-10-24 14:30:44,179 +0000 loglevel=INFO, pid=60713, tid=Thread-5, file=tadatacollector.py, funcname=indexdata, codelineno=122 | [stanzaname="LinuxsyslogVer2v0" account="acctname"] Start indexing data for checkpointkey=LinuxsyslogVer2v0acctname
2016-10-24 14:30:44,178 +0000 loglevel=INFO, pid=60713, tid=Thread-2, file=scheduler.py, funcname=getreadyjobs, codelineno=100 | Get 1 ready jobs, next duration is 119.999494, and there are 1 jobs scheduling
2016-10-24 14:29:52,377 +0000 loglevel=INFO, pid=60713, tid=Thread-4, file=threadpool.py, funcname=run, codelineno=261 | Thread workqueuesize=0
2016-10-24 14:29:52,377 +0000 loglevel=INFO, pid=60713, tid=Thread-4, file=tadatacollector.py, funcname=indexdata, codelineno=130 | [stanzaname="LinuxsyslogVer2v0" account="acctname"] End of indexing data for checkpoint_key=LinuxsyslogVer2v0
acctname
2016-10-24 14:29:52,376 +0000 log
level=INFO, pid=60713, tid=Thread-4, file=mscsstoragetablelistdatacollector.py, funcname=collectdata, codelineno=61 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] Retrieve the remain data from datawriter.
2016-10-24 14:29:49,375 +0000 log
level=INFO, pid=60713, tid=Thread-121, file=mscsstoragedispatcher.py, funcname=dispatchstoragelist, codelineno=88 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablelist="LinuxsyslogVer2v0"] Finished dispatching storage list.
2016-10-24 14:29:49,340 +0000 log
level=INFO, pid=60713, tid=Thread-122, file=mscsstoragetabledatacollector.py, funcname=collectdata, codelineno=57 | [stanzaname="LinuxsyslogVer2v0" accountname="acctname" tablename="LinuxsyslogVer2v0"] Finish collecting data.
2016-10-24 14:28:44,179 +0000 log
level=INFO, pid=60713, tid=Thread-7, file=threadpool.py, funcname=run, codelineno=261 | Thread workqueuesize=0
2016-10-24 14:28:44,178 +0000 log
level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, funcname=getreadyjobs, codeline_no=100 | Get 1 ready jobs, next duration is 119.999480, and there are 1 jobs scheduling

0 Karma
Highlighted

Re: Splunk Add-on for Microsoft Cloud Services: Storage Table Indexing Delay

Builder

I have a similar issue with the add-on - however I have it for o365 audit logs. Following this question - Please let us know here if you figure out the reason.

0 Karma
Highlighted

Re: Splunk Add-on for Microsoft Cloud Services: Storage Table Indexing Delay

Path Finder

It might not be the same issue, but I will keep everyone updated. I also have support case opened for my issue as well.

0 Karma
Highlighted

Re: Splunk Add-on for Microsoft Cloud Services: Storage Table Indexing Delay

Path Finder

Apparently this is by design, way to go Splunk!

The updated information for this issue is that the Add-On is currently working as intended. There has been an Enhancement request entered to have this addressed in a future development timeline.

0 Karma