Hunk is taking too much time for processing hive ORC data. How do I improve the performance?

toabhishek16 · 06-25-2015

Hi Team,

I have set up Hunk with Apache Hadoop 2.26, and my data is stored in a Hive 0.13 table with ORC compression. The data size is around 2 TB.

When I execute any query through Hunk, it takes too much time. The equivalent query in Hive takes only 80 seconds. I am running both against the same Hive table.

Please help me improve the performance of Hunk. How can I achieve fast data processing through Hunk?

Thanks
Abhishek

rdagan_splunk · 06-26-2015

Try this combo:
index=idxtmgorc cs_username="anyname" | stats count(cs_username) as username
Make sure you are in Smart mode.
In addition, here is a link that shows the search mode options:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Aboutsearchactionsandmodes
and a link that shows the search commands:
http://docs.splunk.com/Documentation/Splunk/6.2.4/SearchTutorial/Usethesearchlanguage

rdagan_splunk · 06-29-2015

I am happy to see that you were able to improve the performance when running Hadoop MapReduce jobs.
The rule is that Hunk triggers a MapReduce (MR) job if:
1. the search is not run in Verbose mode AND
2. the search contains any filtering predicates in the first search command
OR
3. the search contains any reporting commands
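
As a rough illustration of the rule above, here is a minimal Python sketch. It is not Hunk code or a Splunk API: the names SearchTraits and triggers_mapreduce are hypothetical, and the grouping "(1 AND 2) OR 3" is an assumption, since the rule as written does not say how AND and OR bind.

```python
from dataclasses import dataclass


@dataclass
class SearchTraits:
    """Hypothetical summary of a search; not a Hunk or Splunk API."""
    verbose_mode: bool            # True if the search runs in Verbose mode
    first_command_filters: bool   # filtering predicate in the first search command
    has_reporting_command: bool   # reporting command present, e.g. stats


def triggers_mapreduce(s: SearchTraits) -> bool:
    # Assumed grouping: (1 AND 2) OR 3, i.e. AND binds tighter than OR.
    return (not s.verbose_mode and s.first_command_filters) or s.has_reporting_command


# Smart mode: index=idxtmgorc cs_username="anyname" | stats count(cs_username) as username
filtered_stats = SearchTraits(verbose_mode=False, first_command_filters=True,
                              has_reporting_command=True)
# Smart mode: index=idxtmg (no filter, no reporting command)
bare_index = SearchTraits(verbose_mode=False, first_command_filters=False,
                          has_reporting_command=False)

print(triggers_mapreduce(filtered_stats))  # True
print(triggers_mapreduce(bare_index))      # False
```

Both example results hold whichever way the AND/OR is grouped. A bare index=idxtmg search in Smart mode meets none of the conditions, so under this rule it would not run as an MR job.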

toabhishek16 · 07-01-2015

Thanks, rdagan_splunk, for clarifying the problem.

toabhishek16 · 06-29-2015

Hi rdagan_splunk,

I am in Smart mode, as you suggested.

I tried the queries you suggested and observed that a query involving the stats command runs fast. But when I run a generic query like index=idxtmg, it takes too much time to return results.

Please suggest how I can improve the system's performance.

rdagan_splunk · 06-25-2015

Can you share the Hunk query? Also, please make sure you are in Smart search mode (not Verbose mode).

toabhishek16 · 06-26-2015

Hi rdagan_splunk,

I am using the queries below in Hunk and Hive respectively:

index=idxtmgorc cs_username="anyname" - this takes too much time, more than 30 minutes.

select cs_username from tmg_orc_table where cs_username='anyname' - the Hive query takes only 80 seconds on the same data and the same cluster.

Give me some time and I will let you know about the mode.
Thanks & Regards
Abhishek Soni

burwell · 05-01-2016

Abhishek, did you ever solve the performance issue? I have the same issue.