
Splunk Analytics for Hadoop: Why does a streaming search over a short time range work, but not complete over a longer time range?

suarezry
Builder

We are running Splunk v6.5.1 with Hortonworks HDP v2.5. I can run a streaming search (not MapReduce) over a short time range and it completes fine. Over a longer time range (e.g., year to date), it does not complete. If I send the job to the background and ask to be notified by email when it completes, I get this report:

The search has generated the following messages:

ERROR MESSAGES:

[myProvider] Error while running external process, return_code=255. See search.log for more info
[myProvider] Exception - java.io.IOException: Error while waiting for MapReduce job to complete, job_id=[!http://mynn2.internal:8088/proxy/application_1484675915702_1256/job_1484675915702_1256], state=FAILED, reason=Job commit failed: java.io.IOException: Failed to rename FileStatus{path=hdfs://mynn1.internal:8020/user/splunk/splunk-srch/dispatch/1485179844.151_27F72EA7-317D-46DE-934A-9C1C824073F5/2/_temporary/1/task_1484675915702_1256_m_001792; isDirectory=true; modification_time=1485181150047; access_time=0; owner=root; group=hdfs; permission=rwxr-xr-x; isSymlink=false} to hdfs://mynn1.internal:8020/user/splunk/splunk-srch/dispatch/1485179844.151_27F72EA7-317D-46DE-934A-9C1C824073F5/2
I also see this warning in hadoop-hdfs-namenode-mynn1.log:

2017-01-23 09:21:25,704 WARN hdfs.StateChange (FSDirRenameOp.java:validateRenameSource(541)) - DIR* FSDirectory.unprotectedRenameTo: rename source /user/splunk/splunk-srch/dispatch/1485179844.151_27F72EA7-317D-46DE-934A-9C1C824073F5/2/_temporary/1/task_1484675915702_1256_m_001792 is not found.
Can someone help troubleshoot?


rdagan_splunk
Splunk Employee

Have you tried this setting?
vix.splunk.search.mixedmode.maxstream: the maximum number of bytes to stream during mixed mode (default 10 GB). A value of 0 means there is no stream limit. Streaming stops after the first split that pushes the total over the limit.
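To make the limit concrete, here is a minimal Python sketch that models the rule as described in this reply; it is not Splunk's implementation. It assumes split sizes are known in bytes, that "over the limit" means strictly greater than, and that the 10 GB default is 10 × 1024³ bytes. The function name `splits_streamed` is hypothetical.

```python
from typing import List

# Assumption: "10GB" is taken to mean 10 * 1024**3 bytes.
DEFAULT_MAXSTREAM = 10 * 1024**3


def splits_streamed(split_sizes: List[int], maxstream: int = DEFAULT_MAXSTREAM) -> int:
    """Return how many splits are streamed before streaming stops.

    Models the described rule: splits are streamed in order while a running
    byte total is kept; the split that takes the total over maxstream is still
    streamed, and streaming stops after it. maxstream == 0 means no limit.
    """
    if maxstream == 0:
        return len(split_sizes)  # no stream limit: every split is streamed
    total = 0
    for i, size in enumerate(split_sizes):
        total += size
        if total > maxstream:
            return i + 1  # this split took the total over the limit
    return len(split_sizes)  # limit never exceeded


if __name__ == "__main__":
    # Four 4-byte splits against a 10-byte limit: totals are 4, 8, 12,
    # so the third split crosses the limit and streaming stops after it.
    print(splits_streamed([4, 4, 4, 4], maxstream=10))
```

Under this model, a long time range produces more splits, so a search that streams entirely over a short range can exceed the limit over a longer one and hand the remaining splits to the reporting (MapReduce) phase.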

rdagan_splunk
Splunk Employee

In that case, I suspect you have two options:
1) Create a provider and fix the MapReduce settings (YARN servers, etc.).
2) Create two providers: one for streaming only (vix.mode = stream), and one for mixed mode (streaming first, then reporting; this is the default).


rdagan_splunk
Splunk Employee

If the goal is to avoid running MapReduce jobs, you will need to change these settings:
vix.mode = stream
vix.splunk.search.mixedmode = 0
There are two possible vix.mode values: stream and report. Stream does not push computation to the external system; report does.
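As a quick sanity check on a provider's settings, here is a minimal Python sketch that reads Splunk-style .conf text with the standard `configparser` module and reports whether a provider has the stream-only combination above. It assumes providers are defined in stanzas named `[provider:<name>]` (as in indexes.conf), that every setting sits inside a stanza, and that a missing `vix.splunk.search.mixedmode` means mixed mode is on (the default mentioned in this thread). The helper `is_stream_only` and the provider names `hdp-stream` and `hdp-mixed` are hypothetical.

```python
import configparser


def is_stream_only(conf_text: str, provider: str) -> bool:
    """Return True if the provider has vix.mode = stream and mixed mode off.

    Assumptions: a missing vix.splunk.search.mixedmode means mixed mode is
    enabled (the default described in the thread); a missing vix.mode is
    treated as not stream-only.
    """
    # Only '=' separates keys from values, so values such as hdfs:// URIs that
    # contain ':' stay intact; interpolation is off so '%' is read literally.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    stanza = parser[f"provider:{provider}"]  # raises KeyError if the stanza is absent
    mode = stanza.get("vix.mode", "").strip()
    mixed = stanza.get("vix.splunk.search.mixedmode", "1").strip()
    return mode == "stream" and mixed == "0"


# Hypothetical indexes.conf excerpt: one stream-only provider and one
# provider left in report mode with mixed mode at its default.
EXAMPLE_CONF = """\
[provider:hdp-stream]
vix.mode = stream
vix.splunk.search.mixedmode = 0

[provider:hdp-mixed]
vix.mode = report
"""

if __name__ == "__main__":
    for name in ("hdp-stream", "hdp-mixed"):
        print(name, is_stream_only(EXAMPLE_CONF, name))
```

This mirrors the two-provider option: searches that must never launch MapReduce jobs would point at the stream-only provider, while searches that need reporting use the other one.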


suarezry
Builder

This particular search doesn't need MapReduce, but there are other searches that do.


suarezry
Builder

Thank you, that seemed to help. The response is much quicker now, but I am still getting the file-not-found problem:

[sheridan] Error while running external process, return_code=255. See search.log for more info
[sheridan] Exception - com.splunk.mr.JobStartException: Failed to start MapReduce job. Please consult search.log for more information. Message: File /user/splunk/splunk-srch/dispatch/1485202657.66_27F72EA7-317D-46DE-934A-9C1C824073F5/0 does not exist.

This is my search.log:
http://pastebin.ca/3759606
