Monitoring Splunk

Hunk is taking too much time to process Hive ORC data. How do I improve the performance?

toabhishek16
New Member

Hi Team,

I have set up Hunk with Apache Hadoop 2.26, and my data is stored in a Hive 0.13 table with ORC compression. The data size is around 2 TB.

When I try to execute any query through Hunk, it takes too much time. The equivalent query in Hive takes only 80 seconds. I am running both against the same Hive table.

Please help me improve the performance of Hunk. How can I achieve fast data processing through Hunk?

Thanks
Abhishek

0 Karma

rdagan_splunk
Splunk Employee

Try this combination:
index=idxtmgorc cs_username="anyname" | stats count(cs_username) as username
Make sure you are in Smart Mode.
In addition, here is a link that shows the search mode options:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Aboutsearchactionsandmodes
and a link that shows the search commands:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Usethesearchlanguage

rdagan_splunk
Splunk Employee

I am happy to see that you are able to improve the performance when running Hadoop MapReduce Jobs.
The rule is that Hunk triggers an MR job if:
1. the search is not run in verbose mode AND
2. the search contains any filtering predicates in the first search command
OR
3. the search contains any reporting commands
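
As a rough illustration only, the rule above can be written as a boolean check. This is a sketch, not Hunk's actual implementation: the function name and parameter names are hypothetical, and the grouping `(1 AND 2) OR 3` is an assumption based on reading the list top to bottom. The post does not state the grouping explicitly, so `1 AND (2 OR 3)` is also a possible reading.

```python
def triggers_mr_job(verbose_mode: bool,
                    has_filter_in_first_command: bool,
                    has_reporting_command: bool) -> bool:
    """Hypothetical helper: one reading of the rule, (1 AND 2) OR 3.

    ASSUMPTION: AND binds tighter than OR here. The original rule
    does not say how the three conditions are grouped.
    """
    return (not verbose_mode and has_filter_in_first_command) or has_reporting_command


if __name__ == "__main__":
    # Each case: (verbose_mode, has_filter_in_first_command, has_reporting_command)
    cases = {
        "filter only, Smart mode": (False, True, False),
        "filter plus stats, Smart mode": (False, True, True),
        "no filter, no reporting command": (False, False, False),
    }
    for label, args in cases.items():
        print(label, "->", triggers_mr_job(*args))
```

Under this reading, a search with no filtering predicate and no reporting command would not start an MR job.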

0 Karma

toabhishek16
New Member

Thanks, rdagan_splunk, for clarifying the problem.

0 Karma

toabhishek16
New Member

Hi rdagan_splunk,

I am in Smart mode, as you suggested.

I tried the queries you suggested and observed that a query involving the stats command runs fast. But when I run generic queries like index=idxtmg, it takes too much time to return results.

Please suggest how I can improve the system's performance.

0 Karma

rdagan_splunk
Splunk Employee

Can you share the Hunk query? Also, please make sure you are in Smart search mode (not Verbose mode).

0 Karma

toabhishek16
New Member

Hi rdagan_splunk,

I am using the queries below in Hunk and Hive respectively:

index=idxtmgorc cs_username="anyname" - takes too much time, more than 30 minutes.

select cs_username from tmg_orc_table where cs_username='anyname' - the Hive query takes only 80 seconds on the same data and the same cluster.

Give me some time; I will let you know about the mode.
Thanks & Regards
Abhishek Soni

0 Karma

burwell
SplunkTrust

Abhishek, did you ever solve the performance issue? I have the same issue.

0 Karma