Getting Data In

How does archiving Splunk logs to HDFS work?

gaddyh
New Member

I know that Hunk issues the archivebuckets command, which will start the archiving process on each indexer.
What is the archiving process?

Do all indexer machines need access to Hadoop (HDFS)?
Does Hunk copy the files locally and then to HDFS?

Please explain exactly what happens in this process.

Ledion_Bitincka
Splunk Employee

Do all indexer machines need access to Hadoop (HDFS)?
Yes, they need network access, Java, and the Hadoop libraries (i.e., you should be able to run hadoop fs -copyFromLocal .... successfully from every indexer).

Does Hunk copy the files locally and then to HDFS?
No, the buckets are copied from the indexers directly to HDFS.
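
The access requirement above can be spot-checked on each indexer. The following is a minimal sketch, not part of Hunk: the function names are hypothetical, the local file and HDFS directory are placeholders you supply, and it assumes the java and hadoop executables are expected on the PATH of the user running Splunk.

```python
import shutil
import subprocess


def build_copy_command(local_path, hdfs_dir):
    """Return the argv for copying a local file into an HDFS directory."""
    return ["hadoop", "fs", "-copyFromLocal", local_path, hdfs_dir]


def check_prerequisites(which=shutil.which):
    """Report whether the java and hadoop executables are found on PATH."""
    return {tool: which(tool) is not None for tool in ("java", "hadoop")}


def can_copy_to_hdfs(local_path, hdfs_dir, run=subprocess.run):
    """Attempt the copy; return True only if the hadoop CLI exits with 0."""
    try:
        result = run(
            build_copy_command(local_path, hdfs_dir),
            capture_output=True,
            text=True,
            timeout=120,
        )
    except (OSError, subprocess.TimeoutExpired):
        # hadoop is missing, not executable, or the cluster did not answer.
        return False
    return result.returncode == 0
```

Running check_prerequisites() first and then can_copy_to_hdfs() with a small scratch file on every indexer gives a quick yes/no answer before archiving is enabled.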

rdagan_splunk
Splunk Employee

Archiving index data from Splunk to HDFS requires the bucket to be in the warm or cold stage. Therefore, to see data archived we can either wait a day or restart Hunk (a restart rolls hot buckets to warm).
The Splunk_Archiver app (new with Hunk 6.2.1) is distributed to all the indexers through bundle replication.
Every 60 minutes, the app on each indexer executes the search = | archivebuckets. This command triggers bucket copying from the indexer to HDFS (similar to hadoop fs -put BUCKET /HDFS).
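
To make the steps above concrete, here is a rough sketch of the selection logic: only warm or cold buckets qualify, and each qualifying bucket is pushed with something like hadoop fs -put. This is an illustration, not the Splunk_Archiver implementation; the Bucket type, the function names, and the example paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    path: str   # local bucket directory on the indexer (hypothetical layout)
    stage: str  # "hot", "warm", or "cold"


# Per the answer above, hot buckets are not archived.
ARCHIVABLE_STAGES = {"warm", "cold"}


def eligible_buckets(buckets):
    """Keep only the buckets whose stage allows archiving."""
    return [b for b in buckets if b.stage in ARCHIVABLE_STAGES]


def put_commands(buckets, hdfs_dir):
    """Build one 'hadoop fs -put' argv per eligible bucket."""
    return [
        ["hadoop", "fs", "-put", b.path, hdfs_dir]
        for b in eligible_buckets(buckets)
    ]


if __name__ == "__main__":
    sample = [
        Bucket("/data/idx/hot_bucket", "hot"),
        Bucket("/data/idx/warm_bucket", "warm"),
        Bucket("/data/idx/cold_bucket", "cold"),
    ]
    for argv in put_commands(sample, "/archive/idx"):
        print(" ".join(argv))
```

In this sketch the hot bucket is skipped, which mirrors why freshly indexed data does not appear in HDFS until it has rolled to warm.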

acharlieh
Influencer

The announcement blog indicates all indexers need Java and Hadoop clients so I would guess that they're talking directly to HDFS.

gaddyh
New Member

Which ports does Hunk use to connect to the Splunk indexers?

Which Splunk version needs to be installed on the indexer machines to support archiving indexes?
What else needs to be installed on the Splunk indexer machines?

Ledion_Bitincka
Splunk Employee

To connect to the indexers, Hunk needs the same ports as Splunk Enterprise (8089 by default). For Hadoop, the ports depend on the Hadoop distro: indexers need access to the Secondary/NameNode (usually port 8020) and to the DataNode transfer ports (usually port 50010). The Splunk version on the indexers doesn't matter, but you need Java and the Hadoop libraries on the indexers.
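
A quick way to confirm that these ports are reachable is a plain TCP connect test. The sketch below uses only the Python standard library; the port numbers are the defaults quoted above and may differ on your distro, and the names DEFAULT_PORTS and port_is_open are made up for this example. Note where to run it from: the 8089 check belongs on the Hunk search head (targeting each indexer), while the 8020 and 50010 checks belong on each indexer (targeting the Hadoop nodes).

```python
import socket

# Defaults mentioned in the answer above; adjust for your deployment.
DEFAULT_PORTS = {
    "splunkd management": 8089,
    "HDFS NameNode": 8020,
    "HDFS DataNode transfer": 50010,
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A successful connect only shows that the port is reachable through the network and firewalls; it does not prove that Hadoop authentication or permissions are set up correctly.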
