Does Hunk create an index?

Path Finder

I have created a virtual index with CDH5 and Hunk 6.1. A simple query like the following:

index=tomnetflow destination_address="71.214.56.38"

takes about 28 minutes on our small 5-node lab cluster with 32 GB of memory and 3 HDFS nodes. There are some 35 million netflow records. My question is twofold:

1) When I run the same query over and over, the run time stays about the same every time. Should I expect an index to be created somewhere so that subsequent queries run faster?

2) In terms of improving Splunk/Hunk/Hadoop performance, if I segregate the data into directories in HDFS based on date (for example, 2014-05-26 and 2014-05-27), will performance increase, provided I narrow my search to, say, the last 24 hours?

Thank You.

Splunk Employee

If you want to run the same query over and over, it is better to make it a saved search and accelerate it in Hunk 6.1.

Splunk Employee

Just to clarify, Hunk does not build an index from data just because it has searched that data once. And yes, we do recommend that you partition your data based on time and on any other fields that you search frequently.

Splunk Employee

1) For Hunk to launch a MapReduce (MR) job, you will need to change your Splunk query:
From this - index=tomnetflow destination_address="71.214.56.38"
To something like this - index=tomnetflow destination_address="71.214.56.38" | top destination_address

In addition, make sure that you are in Smart Mode and not in Verbose Mode.

2) Hunk uses a virtual index (VIX). No actual index is created, so repeating the same search will not make it any faster.

3) To make Hunk run faster: make sure your searches run as MR jobs (see the answer to #1); define the VIX with a regex that extracts the time from the file name or the HDFS directory name (as you mentioned, that lets Hunk read less data per MR job); and use Report Acceleration, which caches the results.
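
To illustrate why extracting the time from the directory name helps, here is a minimal Python sketch of the pruning idea. It is not Hunk's implementation or configuration. The /data/netflow/YYYY-MM-DD/ layout, the file names, and the helper names partition_range and prune are all assumptions made up for the example.

```python
import re
from datetime import datetime, timedelta

# Hypothetical layout: one HDFS directory per day, e.g. /data/netflow/2014-05-26/
DATE_IN_PATH = re.compile(r"/(\d{4}-\d{2}-\d{2})/")


def partition_range(path):
    """Return the (earliest, latest) time a path can contain, or None if it has no date."""
    match = DATE_IN_PATH.search(path)
    if match is None:
        return None
    earliest = datetime.strptime(match.group(1), "%Y-%m-%d")
    return earliest, earliest + timedelta(days=1)


def prune(paths, search_start, search_end):
    """Keep only the paths whose day overlaps the window [search_start, search_end)."""
    kept = []
    for path in paths:
        span = partition_range(path)
        if span is None:
            kept.append(path)  # no date in the path: cannot rule it out, so keep it
            continue
        earliest, latest = span
        if earliest < search_end and latest > search_start:
            kept.append(path)
    return kept


# Hypothetical file listing, one file per daily directory.
paths = [
    "/data/netflow/2014-05-25/part-0000.csv",
    "/data/netflow/2014-05-26/part-0000.csv",
    "/data/netflow/2014-05-27/part-0000.csv",
]

# A search restricted to 2014-05-27 only needs to read one of the three directories.
selected = prune(paths, datetime(2014, 5, 27), datetime(2014, 5, 28))
print(selected)
```

With a one-day search window, only the 2014-05-27 directory survives, so the other days never have to be read. That skipped input is where the savings come from when the data is partitioned by date and the search is narrowed by time.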
