When I export data by building an export and scheduling it, what is the purpose of the compression level for the file? There is a scale from 0 to 9.
Also, what are the "Partition by" options (Date, Hour, Host, Sourcetype, Source) for?
Is a scheduled export from an index the only way to export data? If there is another way, what are the differences between the two? I'm trying to grab all the data and push it to Hadoop.
What are the commands supposed to be?
Thanks a lot!
All your questions are covered in the docs, http://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/ExporttoHDFS
Compression Level
Use the slider bar to set the file compression level. 0 means no compression and 9 gives you the maximum possible compression. The higher the compression level, the more slowly files are written, but the smaller the exported files in HDFS will be. Also note that higher compression may mean slower retrieval of the data.
How does partitioning work?
Partitioning is the process by which exported data is placed in a dynamic directory structure based on the values of certain event fields. You can choose how the exported data is partitioned; it can be partitioned by any of the fields present in the event.
Hadoop Connect exposes the following out-of-the-box partitioning variables: Date, Hour, Host, Sourcetype, and Source.
When you are creating an export job, you can select one or more of these partition variables in the user interface.
Splunk creates directories in HDFS based on these partitions. So if you select Date, Hour and Sourcetype, Splunk will create a directory structure like /2018/11/06/00/WinEventLog_Security/ to store the various files containing your exported data.
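For instance, assuming a hypothetical export base path of /splunk/export (the base path and file names below are illustrative, not actual Hadoop Connect output names), a Date + Hour + Sourcetype export could end up laid out in HDFS roughly like this:

/splunk/export/2018/11/06/00/WinEventLog_Security/<exported-file>
/splunk/export/2018/11/06/01/WinEventLog_Security/<exported-file>
/splunk/export/2018/11/06/01/access_combined/<exported-file>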
Creating your own partitions
In addition to the out-of-the-box partitioning variables, you can use any field to compute a partitioning path for the events (results) to be exported. Use the special field _dstpath. For example, to export your search results into the path <base-path>/<date>/<hour>/<app>, use the following search string:

search ….. | eval _dstpath=strftime(_time, "%Y%m%d/%H") + "/" + app_name
You can perform other types of preprocessing of the data in the Splunk platform (lookups, field extractions, evaluating other fields, and so on) or choose to export it in raw format.
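As a minimal sketch of that idea (the index, sourcetype, lookup name, and lookup fields below are made up for illustration and would need to exist in your environment), an export search that enriches events with a lookup and then partitions the output by day and host could look like:

search index=web sourcetype=access_combined | lookup asset_owner host OUTPUT owner | eval _dstpath=strftime(_time, "%Y/%m/%d") + "/" + host

Whatever _dstpath evaluates to for an event determines the subdirectory, under the export's base path, that the event is written to.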
Schedule Export
Schedule Export is the only way to export indexed data into HDFS.
Thank you, @sduff