Hi,
Does anyone have experience using the Splunk Add-on for Azure app and retrieving Azure Table storage data?
The problem is getting the table data when the Azure Table has no valid DateTime field. Instead, the table has a column containing DateTime.Ticks values (64-bit integers, e.g. 635999855693046079). The table is provided by a 3rd party, so I'm not able to change this.
I've tried using the ticks field as the Date/Time Column in the input setup; no error message is shown when saving the input, but nothing gets indexed. As a test I made a new table with a standard DateTime column, and that was indexed as expected. So far the only solution I see is to mess around in the AzureStorageTable.py script and parse the field correctly there, but that's something I'd rather not do.
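(For reference, ticks count 100-nanosecond intervals since 0001-01-01 00:00:00, so the value above decodes roughly like this; a quick sketch of my own, assuming the value is UTC:)

import datetime

ticks = 635999855693046079
ticks_at_epoch = 621355968000000000                 # ticks at the Unix epoch (1970-01-01 UTC)
seconds = (ticks - ticks_at_epoch) / 10000000.0     # 100-nanosecond intervals -> seconds
print(datetime.datetime.utcfromtimestamp(seconds))  # roughly 2016-05-27 22:39:29 UTC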
The generic Azure Storage Table input does not support this. However, the Azure Diagnostics input does use ticks in some of its queries; the PartitionKey for some of those tables is a tick value. Querying on the PartitionKey is preferable when querying Azure Table data, since that field is indexed by Azure (which means faster queries). Check out the code in AzureDiagnostics.py.
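For illustration, here is a rough sketch (my own, not taken from AzureDiagnostics.py) of how a tick-based PartitionKey filter can be built; the zero-prefixed key format used by the WAD* diagnostics tables is an assumption on my part:

import datetime

def partition_key_filter(start_dt):
    # .NET ticks = 100-ns intervals since 0001-01-01; 621355968000000000 ticks at the Unix epoch
    ticks = 621355968000000000 + long((start_dt - datetime.datetime.utcfromtimestamp(0)).total_seconds() * 10000000)
    # WAD* tables store the PartitionKey as "0" followed by the ticks (assumed format)
    return "PartitionKey gt '0%d'" % ticks

# e.g. only pull diagnostics entities newer than two days ago
print(partition_key_filter(datetime.datetime.utcnow() - datetime.timedelta(days=2)))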
Ended up editing AzureStorageTable.py to make it work. I edited/added the following:
import hashlib  # the md5 module import is no longer needed after switching to hashlib
.
.
def get_encoded_csv_file_path(checkpoint_dir, file_name):
    # Build a filesystem-safe checkpoint file name from the table/file name
    name = ""
    for i in range(len(file_name)):
        if file_name[i].isalnum():
            name += file_name[i]
        else:
            name += "_"
    name = name[:100]
    m = hashlib.md5()  # updated to hashlib since md5.new() is deprecated and slower
    m.update(file_name)
    name += "_" + m.hexdigest() + ".csv.gz"
    return os.path.join(checkpoint_dir, name)
# This is used to convert a datetime object to ticks
def dateTimetoTicks(dt):
    # .NET ticks are 100-nanosecond intervals since 0001-01-01;
    # 621355968000000000 is the tick count at the Unix epoch (1970-01-01 UTC)
    ticksBeforeEpoch = 621355968000000000
    ticksSinceEpoch = (dt - datetime.datetime.utcfromtimestamp(0)).total_seconds() * 10000000
    return long(ticksBeforeEpoch + ticksSinceEpoch)
.
.
# Default the start date to 2 days ago
dateTimeStart = datetime.datetime.today() - datetime.timedelta(days=2)
dateTimeStart = dateTimetoTicks(dateTimeStart)
if date_time_start not in ['', None]:
    # if a start time was specified in the config, use that value instead of the default
    TempdateTimeStart = dateutil.parser.parse(date_time_start)
    dateTimeStart = dateTimetoTicks(TempdateTimeStart)
.
.
if marker is not None:
    dateTimeStart = marker
# NEW filter_string with ticks as the filter - appending L to specify an Int64 in the query
filter_string = "%s gt %sL" % (date_time_column, dateTimeStart)
.
.
# Updated to check whether the entity value is already a long (ticks)
if not isinstance(entity[date_time_column], long):
    # if the entity column is not already ticks (e.g. it is a datetime), try to convert it
    try:
        entity[date_time_column] = dateTimetoTicks(entity[date_time_column])
.
.
if isinstance(entity[date_time_column], long) and entity[date_time_column] > last_dateTime:
    # compare this entity's ticks value to the last_dateTime variable
    last_dateTime = entity[date_time_column]
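To show how the pieces above fit together, here is a minimal standalone sketch of my own; the names are placeholders, and the query_entities call is commented out because the exact client and signature depend on the Azure SDK version the add-on ships:

# Illustrative only -- date_time_column, table_service and table_name are placeholders
date_time_column = "Ticks"                # assumed name of the 3rd-party ticks column
dateTimeStart = 635999855693046079        # ticks checkpoint from the previous run
filter_string = "%s gt %sL" % (date_time_column, dateTimeStart)

# entities = table_service.query_entities(table_name, filter=filter_string)
entities = []                             # placeholder so the sketch runs on its own

last_dateTime = dateTimeStart
for entity in entities:
    if isinstance(entity[date_time_column], long) and entity[date_time_column] > last_dateTime:
        last_dateTime = entity[date_time_column]
# last_dateTime is then written back as the checkpoint marker for the next run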
Querying on the PartitionKey (ticks) is much faster than querying on datetime properties, as explained here: http://www.codeproject.com/Tips/671361/Some-tips-and-tricks-for-azure-table-storage
Cheers!
The best thing is to convert Azure ticks to time_t before you import the data. If that cannot be done, then the next best thing is to use DATETIME_CONFIG = CURRENT to force Splunk to use now as each event's timestamp; then, for every search, use All time, convert the Ticks field to a time_t value, and reassign that value to _time before doing any work. The performance of this will be terrible, but it will work. Here is how to convert: http://stackoverflow.com/questions/1613853/convert-ticks-to-time-t-or-filetime
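For reference, a minimal Python sketch of that conversion (my own, following the approach in the StackOverflow link above; the offset 621355968000000000 is the tick count at the Unix epoch):

TICKS_AT_EPOCH = 621355968000000000   # .NET ticks at 1970-01-01 00:00:00 UTC
TICKS_PER_SECOND = 10000000           # one tick = 100 nanoseconds

def ticks_to_time_t(ticks):
    # drop the pre-epoch ticks, then scale 100-ns intervals down to whole seconds
    return (ticks - TICKS_AT_EPOCH) // TICKS_PER_SECOND

print(ticks_to_time_t(635999855693046079))   # -> 1464388769 (late May 2016)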