Loading BOTSV1 JSON into developer Splunk environment

FCTaylor
Explorer

I am new to Splunk and need some serious practice to learn all the cool things Splunk can do. I am trying to load the BOTSV1 JSON dataset into my lab environment so I can start learning the basics of SPL. According to the comments in GitHub this dataset is 120GB uncompressed. This brings up the following two issues.

1) The Splunk web file importer will only load files up to 500MB. How am I supposed to load a 120GB file?

2) The Splunk development license that I received is limited to 10GB, so how am I supposed to load this 120GB file once question #1 is resolved?

I am sure I am not the only one encountering this issue, so forgive me for asking a question that has probably already been answered numerous times.

1 Solution

FCTaylor
Explorer

Not only am I new to Splunk, but I am also a bit of a novice at Linux. It turns out I created my Linux environment using LVM, which seems to have used only 100 GB of the 300 GB of disk space I allocated. While attempting to install the Botsv1_Data_Set app using the web interface, I never saw any notice that I was out of disk space, so the install would never complete.

When I ran the install manually from the terminal, I finally saw an error message indicating that the disk was out of space. Once I resolved my LVM disk space issues, the app installed correctly and I was able to run the "index=botsv1 earliest=0" search and get events displayed.
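
For anyone who wants to rule out the same problem before installing, here is a minimal Python sketch that reports the free space on the filesystem holding a given directory. The function names are made up for this example, and the path and size in the usage comment are assumptions for illustration, not figures from the BOTS documentation.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room(path: str, required_gib: float) -> bool:
    """Return True if at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib


# Example call (path and threshold are assumptions; adjust to your setup):
# has_room("/opt/splunk/etc/apps", 20)
```

Because this asks about the mounted filesystem rather than the underlying disk, it reports what the logical volume actually offers, which is the number that mattered in my case.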

Thank you, Stefanie, for responding to my posts. I hope this helps some other newbie to Splunk out there.

 

Stefanie
Builder

The BOTs v1 dataset is 6.1GB compressed and the smaller version is only 135MB compressed. 

Where did you get the BOTs v1 data? Have you looked at https://github.com/splunk/botsv1 ?

 

You can upload your data set to your Splunk server through FTP and install it through the command line or you can try to increase the web upload limit using web.conf.

You would add a stanza like so: 

[settings]
max_upload_size = 1024

The value is in megabytes, so 1024 MB = 1 GB.
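
If you would rather script the change than edit the file by hand, here is a minimal Python sketch. It assumes the setting belongs in $SPLUNK_HOME/etc/system/local/web.conf and that Splunk lives under /opt/splunk; both are assumptions about your setup, and the function name is made up for this example. It is only meant for simple conf files, since configparser drops comments when it rewrites a file.

```python
import configparser
from pathlib import Path


def set_max_upload_size(web_conf: Path, size_mb: int) -> None:
    """Set max_upload_size (in MB) in the [settings] stanza of web_conf.

    Other stanzas and options already in the file are kept, but comments
    are lost because configparser does not preserve them.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    if web_conf.exists():
        parser.read(web_conf)
    if not parser.has_section("settings"):
        parser.add_section("settings")
    parser.set("settings", "max_upload_size", str(size_mb))
    web_conf.parent.mkdir(parents=True, exist_ok=True)
    with web_conf.open("w") as handle:
        parser.write(handle)


# Example call (path is an assumption; adjust to your install):
# set_max_upload_size(Path("/opt/splunk/etc/system/local/web.conf"), 1024)
```

A Splunk restart is normally needed before a web.conf change takes effect.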


kirk_in_porto
Explorer

Stefanie,

You state that after the FTP transfer, you can install botsv1 from the command line. I have downloaded the dataset via wget and moved it to the $SPLUNK_HOME/etc/apps directory, where I see ALL of the other Splunk apps.

I then expand with tar -xvzf <filename> and the 'botsv1_data_set' folder is created and populated with an entire folder structure of files/data.

So the app now resides in the correct folder. When I try to find the app in Splunk with Manage Apps, it does not appear in the list. When I try to upload or find the app, the browse window opens on my Windows VM host and not on the Linux server where Splunk is installed.

A search using index=botsv1 finds nothing, as does a search using index=botsv1_data_set.


FCTaylor
Explorer

Does it matter what version of Splunk I am running? I currently have version 8.2, and the GitHub page specifically calls out version 6.5.2. I am asking because when I try to install the Botsv1_Data_Set app, the server seems to hang and the application never finishes installing.

I can unzip the file to $SPLUNK_HOME/etc/apps, and after doing that I see the application listed in the application manager, but the "index=botsv1 earliest=0" search returns no results.

 
