I was trying out data model acceleration with Hunk (latest version). This is how my datamodels.conf looks:
cat etc/apps/search/local/datamodels.conf
[LVSMC]
acceleration = 1
acceleration.earliest_time = -1d
acceleration.hunk.compression_codec = snappy
acceleration.hunk.dfs_block_size = 134217728
acceleration.hunk.file_format = parquet
acceleration.manual_rebuilds = 0
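To rule out another config file overriding these values, I also checked the effective settings with btool (assuming a standard Splunk/Hunk install with $SPLUNK_HOME set; LVSMC is my stanza name):
$SPLUNK_HOME/bin/splunk btool datamodels list LVSMC --debug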
Acceleration starts and parquet-snappy files are written. But after collecting for around 10-20 minutes, the parquet files suddenly disappear. Maybe summary creation is dropping these newly created files.
$ date;/usr/bin/hadoop fs -du -h /abcd/SplunkMR/datamodel
Wed Jul 6 08:46:56 PDT 2016
0 0 /abcd/SplunkMR/datamodel/70F888CB-CA73-4A97-B54F-6B0ACA9A4E7E_DM_search_test
$ date;/usr/bin/hadoop fs -du -h /abcd/SplunkMR/datamodel
Wed Jul 6 09:04:40 PDT 2016
2.5 G 7.4 G /abcd/SplunkMR/datamodel/70F888CB-CA73-4A97-B54F-6B0ACA9A4E7E_DM_search_test
$ date;/usr/bin/hadoop fs -du -h /abcd/SplunkMR/datamodel
Wed Jul 6 09:05:47 PDT 2016
75.4 M 226.2 M /abcd/SplunkMR/datamodel/70F888CB-CA73-4A97-B54F-6B0ACA9A4E7E_DM_search_test
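To pin down exactly when the files vanish, a simple once-a-minute poll of the same directory can be used (plain shell loop around the same command as above; the path is from my environment):
while true; do
    date
    /usr/bin/hadoop fs -du -h /abcd/SplunkMR/datamodel
    sleep 60
done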
I also tried playing around with the options below; it did not help (a sample variation is sketched after the list):
acceleration.max_time
acceleration.backfill_time
acceleration.manual_rebuilds
acceleration.max_concurrent
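For example, one variation looked like this (the values are illustrative, not a recommendation; backfill_time is deliberately much shorter than earliest_time):
[LVSMC]
acceleration = 1
acceleration.earliest_time = -1d
acceleration.backfill_time = -4h
acceleration.max_time = 0
acceleration.manual_rebuilds = 1
acceleration.max_concurrent = 1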
Please note that our Hunk deployment needs around 8 hours to process a full day's data when no other queries are running, so I don't know how to make the acceleration catch up for one day of data. Is there a switch I can use to retain the parquet-snappy files? Adjusting earliest_time and backfill_time (with backfill_time much shorter than earliest_time) did not help either.
Please let me know where this could be going wrong.