
Why is the indextest script hanging and never completes for the SplunkIt app?

anandhim
Path Finder

I'm trying to use SplunkIt to benchmark a new hardware configuration we are trying out for the indexers. I generated 75GB of data using the gendata script in the app, but the indextest script never completes. Its percent-complete value decreases and then hangs at the same value every time:

[2015-01-06 13:32:54,707] IdxTestSetup: Indexing progress: 43.07%
[2015-01-06 13:33:24,958] IdxTestSetup: Indexing progress: 44.39%
[2015-01-06 13:33:55,230] IdxTestSetup: Indexing progress: 47.01%
[2015-01-06 13:34:25,566] IdxTestSetup: Indexing progress: 48.30%
[2015-01-06 13:34:55,818] IdxTestSetup: Indexing progress: 49.59%
[2015-01-06 13:35:26,072] IdxTestSetup: Indexing progress: 50.93%
[2015-01-06 13:35:56,327] IdxTestSetup: Indexing progress: 52.28%
[2015-01-06 13:36:26,583] IdxTestSetup: Indexing progress: 36.64%
[2015-01-06 13:36:56,844] IdxTestSetup: Indexing progress: 39.29%
[2015-01-06 13:37:27,077] IdxTestSetup: Indexing progress: 39.49%
[2015-01-06 13:37:57,311] IdxTestSetup: Indexing progress: 39.49%
[2015-01-06 13:38:27,544] IdxTestSetup: Indexing progress: 39.49%
[2015-01-06 13:38:57,777] IdxTestSetup: Indexing progress: 39.49%
[2015-01-06 13:39:28,011] IdxTestSetup: Indexing progress: 39.49%
[2015-01-06 13:39:58,242] IdxTestSetup: Indexing progress: 39.49%

I have tried multiple times to delete the index and source as well and start with a clean splunk instance but it has not helped. The instance reports to our license master and has 100GB quota allocated to it. I have tried with other sizes of data as well but same result.

I'm using version 6.1.3 for Splunk and version 3.2 of the splunkit app.

Any pointers to what else I can check?

1 Solution

jlin
Splunk Employee

The script basically runs a Splunk search to check whether the event count in Splunk matches the expected count (the number of lines in the generated log file).

It seems weird that the percentage goes down, though. It looks like events are getting deleted somehow, since the test script only checks the expected event count once, when it starts.

A couple of things you could check:

  • Before you start another run of the indexing test, make sure that events from the previous run are cleaned up in Splunk. That covers both the test index that is used (splunkit_idxtest) and the monitored directory (/data/static).

  • If it hangs again, check your Splunk instance to see how many events are currently indexed (index=splunkit_idxtest).


schoi
Splunk Employee

Basically, the indexing test will:
1. generate a static file and put it under splunkit-server/data/static/syslog_#g.log
2. put the count of the lines (events) in splunkit_server/bin/datagenRecord.log
3. monitor splunkit-server/data/static/
4. figure out that indexing is complete when the # of events in the splunkit_idxtest index matches the value of #2.
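As a rough illustration of step 4, here is a minimal sketch of how a percent-complete value could be computed from the two counts. The function name and the sample counts are hypothetical and are not taken from the SplunkIt source. It only shows that, with a fixed expected count, the percentage can drop only if the number of events in the index drops.

```python
def indexing_progress(indexed_events: int, expected_events: int) -> float:
    """Return percent complete, rounded to two decimals."""
    if expected_events <= 0:
        raise ValueError("expected_events must be positive")
    return round(100.0 * indexed_events / expected_events, 2)


# Hypothetical values: the expected total is fixed, the indexed count is polled.
expected = 1_000_000
polled_counts = [430_700, 522_800, 366_400, 394_900, 394_900]

for count in polled_counts:
    print(f"Indexing progress: {indexing_progress(count, expected):.2f}%")
```

A falling value in the output therefore points at events leaving the index, not at the expected count changing.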

So things can go wrong if:
- you have python gendata.py script(s) still running (i.e., data is still being written to splunkit-server/data/static/*)
- you have a wrong # in splunkit_server/bin/datagenRecord.log

Check to make sure that (1) no extraneous gendata scripts are running, (2) splunkit-server/data/static/ only contains the generated data file you want, (3) splunkit_server/bin/datagenRecord.log has the # of lines of the single file in splunkit-server/data/static/

You can also check what is in these logs:
- splunkit_server/bin/datagenRecord.log
- splunkit_server/bin/indexRecord.log
- splunkit_server/log/indextest.log
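Checks (2) and (3) above can be scripted. The following is a sketch only, assuming that datagenRecord.log contains nothing but a single integer line count; the helper names are made up for illustration and are not part of SplunkIt.

```python
from pathlib import Path


def count_lines(path: Path, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes in binary chunks, similar to `wc -l`."""
    total = 0
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            total += chunk.count(b"\n")
    return total


def check_static_dir(static_dir: Path, record_file: Path) -> list:
    """Return a list of problems; an empty list means the setup looks consistent."""
    problems = []
    files = sorted(p for p in static_dir.iterdir() if p.is_file())
    if len(files) != 1:
        problems.append(f"expected exactly 1 data file, found {len(files)}")
        return problems
    # Assumption: the record file holds only the expected event count.
    expected = int(record_file.read_text().strip())
    actual = count_lines(files[0])
    if actual != expected:
        problems.append(f"{files[0].name}: {actual} lines, record says {expected}")
    return problems
```

Call check_static_dir with the data/static directory and the datagenRecord.log path of your own install. Reading in chunks keeps memory use flat even for a file of tens of gigabytes.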

anandhim
Path Finder

I did confirm that gendata was not running when I initiated indextest, and also confirmed that the line counts are the same:

$ cat datagenRecord.log
259130435
$ wc -l ../data/static/syslog_75g.log
259130435 ../data/static/syslog_75g.log
$ ls -ltr ../data/static/
total 78643208
-rw-rw-r-- 1 anandhim cc7547 80530636489 Jan 5 11:54 syslog_75g.log

I'll try to do one last clean install of splunk and data generation again today.


schoi
Splunk Employee

Actually, based on @jlin's comment "It seems like events are getting deleted somehow" ... the datagen by default starts the log file with data from 1 year ago.

Can you check your default/system value for "frozenTimePeriodInSec"?

If it has been changed from the out-of-the-box default to something lower than 1 year, your data will be frozen as you index it, which could account for the decreasing % indexed value.

splunkd.log should tell you if data is getting frozen
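To make the arithmetic concrete, here is a small sketch of the age comparison. It is a simplification: Splunk freezes whole buckets once the newest event in the bucket is older than frozenTimePeriodInSec, whereas this checks a single event age. The function name and the 14-day example value are illustrative assumptions.

```python
SECONDS_PER_DAY = 86_400
DEFAULT_FROZEN_SECS = 188_697_600  # Splunk's shipped default, roughly 6 years


def older_than_retention(event_age_days: float, frozen_time_period_secs: int) -> bool:
    """True if an event of this age exceeds the retention period."""
    return event_age_days * SECONDS_PER_DAY > frozen_time_period_secs


one_year_old = 365  # the datagen starts the file with data from 1 year ago

print(older_than_retention(one_year_old, DEFAULT_FROZEN_SECS))   # False
print(older_than_retention(one_year_old, 14 * SECONDS_PER_DAY))  # True
```

With the default, year-old test data is well inside the retention window. With a short retention such as 14 days, the oldest part of the generated file is already past it when it is indexed.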

anandhim
Path Finder

Bang on!
Our deployment template was adding a 14-day retention period by default in etc/system/local/indexes.conf, which I had overlooked.

The indextest script is completing now.

Thanks guys.
