How to prevent large lookups from being replicated to Yarn / Hadoop from Hunk?

tsunamii
Path Finder

We have a user who has created a large CSV lookup file (600 MB). This file appears to be replicated to the /tmp directory of every Yarn server (/tmp/splunk/nodename/splunk/var/run/searchpeers), apparently each time a search is executed, and multiple copies of the bundle accumulate there. This fills the /tmp directory, which is a major problem. It also slows every search, because the file has to be copied before the search begins to execute.

We have specified the following in the Hunk server's distsearch.conf, to no effect:
[replicationBlacklist]
Everything = Servers.csv

How do we block this file from being copied with every search?
Can the file be moved into an HDFS directory instead?
Can it be cached so that it doesn't need to be replicated with every search?


woodcock
Esteemed Legend

Assuming that this file is being used as a lookup, you can force the lookup to run only on the Search Head by adding the local=t parameter, as in ... | lookup local=t myLookup .... Because the lookup then never runs remotely, the file does not need to be included in the replicated bundle. The downside is that if the output fields of the lookup are used to qualify the search at any point, you lose the benefit of having that work map-reduced on the Indexers; instead, it all happens on the Search Head.
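
As a rough sketch of the local=t suggestion above: the lookup name Servers.csv is taken from the question, while the index name my_virtual_index and the fields host and owner are hypothetical placeholders, so substitute your own.

```
index=my_virtual_index
| lookup local=true Servers.csv host OUTPUT owner
| stats count by owner
```

Everything from the lookup command onward runs on the Search Head here, so any filtering on owner happens only after the events have been returned from Hadoop.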


rdagan_splunk
Splunk Employee

To make sure the table is not copied on every search:
vix.splunk.setup.onsearch = 0 (default is 1)
Note, however, that with this setting nothing will be copied to the data nodes, so you may want to turn it off only after the first run.

To make sure you have lots of copies of the table so that bundle replication happens fast:
vix.splunk.setup.bundle.replication = 20 (default 3)
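
For reference, a minimal sketch of where these two settings might live, assuming they belong in the provider stanza of indexes.conf on the Hunk search head, as vix.* provider settings normally do. The provider name my_hadoop_provider is a hypothetical placeholder, and the values are simply the ones suggested above, not tuned recommendations.

```
# indexes.conf on the Hunk search head
# "my_hadoop_provider" is a hypothetical name; use your existing provider stanza.
[provider:my_hadoop_provider]

# Skip the setup/copy step at search time (default is 1).
# Change this only after a first search has already pushed the
# Splunk package and bundle out to the data nodes.
vix.splunk.setup.onsearch = 0

# Keep more copies of the bundle available so that nodes can
# fetch it faster (default is 3).
vix.splunk.setup.bundle.replication = 20
```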
