All Apps and Add-ons

Can you help with my export data issue using Splunk Hadoop connect?

Anmar0293
Path Finder

When I export data by building a scheduled export, what is the point of the compression level for the file? There is a scale from 0-9.

Also, what are the "Partition by: Date, Hour, Host, Sourcetype, Source" options for?

Is a scheduled export from an index the only way to export data? If there is another way, what are the differences between the two? I'm trying to grab all the data and push it to Hadoop.

What are the commands supposed to be?

Thanks a lot!


sduff_splunk
Splunk Employee

All your questions are covered in the docs: http://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/ExporttoHDFS

Compression Level
Use the slider bar to set the file compression level. 0 means no compression and 9 gives you the maximum possible compression. The higher the compression level, the more slowly files are written, and the smaller the exported files in HDFS will be. Also note that higher compression may mean slower retrieval of data.

How does partitioning work?

Partitioning is a process by which the export data is placed in a dynamic directory structure based on the values of certain event fields. You can choose how exported data is partitioned. It can be partitioned by any of the fields present in the event.

Hadoop Connect exposes the following out-of-the-box partitioning variables:

  • Date
  • Hour
  • Host
  • Source
  • Sourcetype

When you are creating an export job, you can select one or more of these partition variables in the user interface.

Splunk will create directories based on these partitions in HDFS. So if you select Date, Hour, and Sourcetype, Splunk will create a directory structure such as /2018/11/06/00/WinEventLog_Security/ to store the various files containing your exported data.
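For illustration (the dates, hours, and sourcetype names below are hypothetical), an export partitioned by Date, Hour, and Sourcetype that contains two sourcetypes across two hours could produce a layout like:

  /2018/11/06/00/WinEventLog_Security/
  /2018/11/06/00/WinEventLog_System/
  /2018/11/06/01/WinEventLog_Security/

Each leaf directory holds the exported files whose events match that combination of partition values.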

Creating your own partitions

In addition to the out-of-the-box partitioning variables, you can use any field to compute a custom partitioning path by adding the special field _dstpath to the events (results) to be exported. For example, to export your search results into the path <base-path>/<date>/<hour>/<app>, use a search string like the following:

search ….. | eval _dstpath=strftime(_time, "%Y%m%d/%H") + "/" + app_name

You can perform other types of preprocessing of the data in the Splunk platform (lookups, field extractions, evaluating other fields, and so on), or choose to export it in raw format.
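For example, here is a minimal sketch of an export search that enriches events before export and routes them by date, hour, and department; the index name, lookup table, and field names (web, user_info, uid, department) are hypothetical:

  index=web sourcetype=access_combined
  | lookup user_info uid OUTPUT department
  | eval _dstpath=strftime(_time, "%Y%m%d/%H") + "/" + coalesce(department, "unknown")

Events for which the lookup returns no department fall back into an "unknown" directory, so every event still gets a valid export path.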

Schedule Export
The Scheduled Export is the only way to export indexed data into HDFS.



Anmar0293
Path Finder

Thank you, @sduff
