Hunk is taking too much time to process Hive ORC data. How do I improve the performance?

toabhishek16
New Member

Hi Team,

I have set up Hunk with Apache Hadoop 2.26, and my data is stored in a Hive 0.13 table with ORC compression. The data size is around 2 TB.

When I run any query through Hunk, it takes too much time. The equivalent query in Hive takes only 80 seconds. Both queries run against the same Hive table.

Please help me improve Hunk's performance. How can I achieve fast data processing through Hunk?

Thanks
Abhishek

rdagan_splunk
Splunk Employee

Try this combo:
index=idxtmgorc cs_username="anyname" | stats count(cs_username) as username
Run it in Smart Mode.
In addition, here is a link that describes the search mode options:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Aboutsearchactionsandmodes
and a link that shows you the Search commands:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Usethesearchlanguage

rdagan_splunk
Splunk Employee

I am happy to see that you were able to improve the performance when running Hadoop MapReduce jobs.
The rule is that Hunk triggers a MapReduce (MR) job if:
1. the search is not run in verbose mode AND
2. the search contains any filtering predicates in the first search command
OR
3. the search contains any reporting commands
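The rule above can be sketched as a small predicate. This is only an illustration, not Hunk code: the function name and parameters are hypothetical, and the grouping is an assumption, since the rule as written does not say whether OR applies to condition 2 alone or to conditions 1 and 2 together. The sketch reads it literally, with AND binding tighter than OR.

```python
def triggers_mapreduce(verbose_mode: bool,
                       has_filter_in_first_command: bool,
                       has_reporting_command: bool) -> bool:
    """Hypothetical helper: does a search trigger an MR job under the rule above?

    Assumed grouping: (condition 1 AND condition 2) OR condition 3.
    """
    filtered_non_verbose = (not verbose_mode) and has_filter_in_first_command
    return filtered_non_verbose or has_reporting_command


# Smart Mode, filter in the first command, no reporting command,
# e.g. index=idxtmgorc cs_username="anyname"
print(triggers_mapreduce(False, True, False))

# Smart Mode, no filter and no reporting command, e.g. index=idxtmg
print(triggers_mapreduce(False, False, False))

# Search ending in a reporting command, e.g. ... | stats count(cs_username)
print(triggers_mapreduce(False, True, True))
```

Under this reading, a bare search such as index=idxtmg meets none of the conditions, so it would not be handed to MapReduce.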

toabhishek16
New Member

Thanks, rdagan_splunk, for clarifying the problem.

toabhishek16
New Member

Hi rdagan_splunk,

I am in Smart Mode, as you suggested.

I tried the queries you suggested and observed that a query involving the stats command runs fast. But when I run generic queries like index=idxtmg, it takes too long to return results.

Please suggest how I can improve the system's performance.

rdagan_splunk
Splunk Employee

Can you share the Hunk query? Also, please make sure you are in Smart search mode (not in Verbose mode).

toabhishek16
New Member

Hi rdagan_splunk,

I am using the queries below in Hunk and Hive, respectively:

index=idxtmgorc cs_username="anyname" - takes too long, more than 30 minutes.

select cs_username from tmg_orc_table where cs_username='anyname' - the Hive query takes only 80 seconds on the same data and the same cluster.

Give me some time; I will let you know about the mode.
Thanks & Regards
Abhishek Soni

burwell
SplunkTrust

Abhishek, did you ever solve the performance issue? I have the same issue.
