Hunk is taking too much time to process Hive ORC data. How do I improve the performance?

toabhishek16
New Member

Hi Team,

I have set up Hunk with Apache Hadoop 2.26, and my data is stored in a Hive 0.13 table with ORC compression. The data size is around 2 TB.

When I try to execute any query through Hunk, it takes too much time. The equivalent query in Hive takes only 80 seconds, and it runs against the same Hive table.

Please help me improve the performance of Hunk. How can I achieve fast data processing through Hunk?

Thanks
Abhishek

rdagan_splunk
Splunk Employee

Try this combination:
index=idxtmgorc cs_username="anyname" | stats count(cs_username) as username
Run it in Smart Mode.
In addition, here is a link that describes the search mode options:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Aboutsearchactionsandmodes
and a link that describes the search commands:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Usethesearchlanguage

rdagan_splunk
Splunk Employee

I am happy to see that you were able to improve the performance when running Hadoop MapReduce jobs.
The rule is that Hunk triggers an MR job if:
1. the search is not run in verbose mode AND
2. the search contains any filtering predicates in the first search command
OR
3. the search contains any reporting commands
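
For anyone who finds it easier to read as code, here is a minimal Python sketch of the rule above. It is only an illustration, not anything from Hunk itself: the function name and its three boolean parameters are made up for this example. The post does not say how the three conditions group, so the sketch assumes AND binds tighter than OR, that is, (1 AND 2) OR 3. If the intended grouping is 1 AND (2 OR 3), the parentheses would move accordingly.

```python
def triggers_mr_job(verbose_mode: bool,
                    first_command_filters: bool,
                    has_reporting_command: bool) -> bool:
    """Hypothetical helper restating the rule from this post.

    Assumed grouping (not confirmed by the post): (1 AND 2) OR 3.
    """
    # Conditions 1 and 2: not in verbose mode, and the first search
    # command contains a filtering predicate.
    filtered_non_verbose = (not verbose_mode) and first_command_filters
    # Condition 3: the search contains a reporting command (e.g. stats).
    return filtered_non_verbose or has_reporting_command


# Example: a Smart Mode search with a filter in the first command.
print(triggers_mr_job(verbose_mode=False,
                      first_command_filters=True,
                      has_reporting_command=False))
```

Under this reading, a search with no filter and no reporting command would not trigger an MR job in any mode.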

toabhishek16
New Member

Thanks, rdagan_splunk, for clarifying the problem.

toabhishek16
New Member

Hi rdagan_splunk,

I am in Smart mode, as you suggested.

I tried the queries you suggested and observed that a query involving the stats command runs fast, but generic queries such as index=idxtmg take too much time to return results.

Please suggest how I can improve the system's performance.

rdagan_splunk
Splunk Employee

Can you share the Hunk query? Also, please make sure you are in Smart search mode (not Verbose mode).

toabhishek16
New Member

Hi rdagan_splunk,

I am using the queries below in Hunk and Hive, respectively:

index=idxtmgorc cs_username="anyname" - this takes more than 30 minutes.

select cs_username from tmg_orc_table where cs_username='anyname' - the Hive query takes only 80 seconds on the same data and the same cluster.

Give me some time; I will let you know about the mode.
Thanks & Regards
Abhishek Soni

burwell
SplunkTrust

Abhishek, did you ever solve the performance issue? I have the same issue.
