
Hunk is taking too much time to process Hive ORC data. How do I improve the performance?

toabhishek16
New Member

Hi Team,

I have set up Hunk with Apache Hadoop 2.26, and my data is stored in a Hive 0.13 table in compressed ORC format. The data size is around 2 TB.

When I execute any query through Hunk, it takes far too long, while the equivalent query in Hive takes only 80 seconds. Both queries run against the same Hive table.

Please help me improve the performance of Hunk. How can I achieve fast data processing through Hunk?

Thanks
Abhishek

rdagan_splunk
Splunk Employee

Try this search, run in Smart Mode:
index=idxtmgorc cs_username="anyname" | stats count(cs_username) as username
In addition, here is a link that describes the search mode options:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Aboutsearchactionsandmodes
and a link that describes the search commands:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Usethesearchlanguage

rdagan_splunk
Splunk Employee

I am happy to see that performance improved once the search ran as Hadoop MapReduce jobs.
The rule is that Hunk triggers an MR job if:
1. the search is not run in Verbose Mode, AND
2. the first search command contains filtering predicates
OR
3. the search contains reporting commands.
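As a rough illustration, the rule above can be sketched in Python. This is a toy model, not Hunk's actual logic: it assumes the rule reads as "condition 1 AND (condition 2 OR condition 3)", treats any term in the first search command other than `index=...` as a filtering predicate, and uses a hypothetical, incomplete set of reporting commands. The function name `would_trigger_mr` is made up for this sketch.

```python
import shlex

# Hypothetical, incomplete set of SPL reporting (transforming) commands,
# chosen for illustration only.
REPORTING_COMMANDS = {"stats", "chart", "timechart", "top", "rare"}


def would_trigger_mr(search: str, mode: str) -> bool:
    """Toy model of the rule: not Verbose AND (filter OR reporting command)."""
    # Condition 1: searches run in Verbose Mode never trigger MR here.
    if mode.lower() == "verbose":
        return False

    # Naive pipe split; pipes inside quoted strings are not handled.
    segments = [seg.strip() for seg in search.split("|")]

    # Condition 2: any term in the first command besides index=... counts
    # as a filtering predicate (shlex strips the quotes around values).
    first_terms = shlex.split(segments[0])
    has_filter = any(not t.lower().startswith("index=") for t in first_terms)

    # Condition 3: a later pipeline stage starts with a reporting command.
    has_reporting = any(
        seg.split()[0].lower() in REPORTING_COMMANDS
        for seg in segments[1:]
        if seg
    )
    return has_filter or has_reporting
```

Under this reading, a bare `index=idxtmg` search in Smart Mode returns `False` (no MR job), while the suggested `stats` search returns `True`. The same filtered search returns `False` in Verbose Mode, which is why the search mode matters.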

toabhishek16
New Member

Thanks, rdagan_splunk, for clarifying the problem.

toabhishek16
New Member

Hi rdagan_splunk,

I am in Smart Mode, as you suggested.

I tried the queries you suggested and observed that queries involving the stats command run fast, but generic queries like index=idxtmg take far too long to return results.

Please suggest how I can improve the system's performance.

rdagan_splunk
Splunk Employee

Can you share the Hunk query? Also, please make sure you are in Smart Mode (not Verbose Mode).

toabhishek16
New Member

Hi rdagan_splunk,

I am using the following queries in Hunk and Hive, respectively:

index=idxtmgorc cs_username="anyname" - takes more than 30 minutes.

select cs_username from tmg_orc_table where cs_username='anyname' - the Hive query takes only 80 seconds on the same data and the same cluster.

Give me some time; I will let you know about the mode.
Thanks & Regards
Abhishek Soni

burwell
SplunkTrust

Abhishek, did you ever solve the performance issue? I have the same issue.