I have about 200 GB of compressed data to add to Splunk from a different server.
I managed to copy it over to the Splunk server, where it now resides.
It consists of thousands of files and subfolders.
Could you please let me know how I can add these files to Splunk in one go?
I wouldn't mind doing it in a few sets either, if I can break it down into, let's say, 500 GB each.
If I go to Add Data in Splunk, it only allows me to select a single file at a time.
I tried to zip the full 200 GB but it failed, and I figured that's not an intelligent way to do this anyway.
Please shed some light on this; any help is appreciated.
First, you need to consider the size of your Splunk license. The free license only allows you to index 500 MB per day. However, you could still potentially load all 200 GB, if your server can load it all in a day or two. If you exceed your 500 MB limit three times in 30 days, Splunk will lock its search function.
Second, you can ask Splunk to monitor a directory - which will load in all the files and subfolders. This will work better than uploading a single file. After everything is loaded, you can delete or disable the input in Splunk and remove the directory.
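If you prefer editing configuration directly, a minimal monitor stanza in inputs.conf would look roughly like this (the path and index name below are placeholders, not values from your environment):

```ini
# inputs.conf - monitor an entire directory tree (path is a placeholder)
[monitor:///data/archive]
index = main
disabled = false
```

Splunk recurses into subfolders of a monitored directory by default, so one stanza covers the whole tree.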
For a production environment, you should install the Splunk Universal Forwarder on the "different server." The Universal Forwarder would monitor the directory and forward the data to the Splunk server. This would allow you to continue to collect the data over time.
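As a sketch of that setup, the Universal Forwarder would carry two small config files; the directory path, indexer hostname, and port below are placeholders (9997 is just the conventional receiving port):

```ini
# inputs.conf on the Universal Forwarder (path is a placeholder)
[monitor:///data/archive]
index = main

# outputs.conf on the Universal Forwarder (host and port are placeholders)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = splunk-server.example.com:9997
```

The indexer would also need to be configured to receive forwarded data on that port.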
Compressing the files into a single zip will not help. Splunk must unzip the file in order to index it, so it doesn't save you anything.
If you ask Splunk to monitor the directory, it tracks which files are uploaded. While files are being indexed, Splunk tracks its current progress in each file, so that it will start where it left off in case of interruption. So you don't need to worry about overwrites or duplication.
There is one other alternative to monitor - and that is a sinkhole. You can't do this via the Splunk GUI, but you can tell Splunk that there is an upload directory. This is also called a batch input. When you move a file into this directory, Splunk indexes the file and then deletes it.
[batch://YOURPATHHERE]
move_policy = sinkhole
host = HHHH
followSymlink = false

YOURPATHHERE is the absolute path of the "sinkhole" directory. On Linux, this means that there will be three slashes in a row: two for the batch:// and one for the beginning of the path.
HHHH is the host name that you want to give the data in Splunk.
Another good thing about this technique is that you can upload a few files per day, in order to stay under your license limit. Each day, just move the files that you want to index into the directory, and Splunk will index and delete them.
I had a look at inputs.conf and it looks a bit complex, honestly. I would still like to go ahead with the monitor option. Since this is a large amount of data, I do not know where these files start and end. So if I start monitoring this, how do I know it has successfully completed indexing the full 200 GB?
Also, I have a separate index created to hold this archived data. Is it possible to send the monitored data to that index instead of the default "main"? If so, how?
I have been trying this all morning now. Even if I go to Add Data and create a new input to monitor the folder, it does not allow me to select the folder; it only lets me select a single file inside it.
I'm a bit confused about how to get this folder monitored and its data indexed using the GUI.
Go to Manager » Data inputs » Files & directories
Click the New button. Click Skip Preview and Continue
Select the first option: "Continuously index data from a file or directory this Splunk instance can access"
In the box beneath "Full path to your data", enter the full path, starting from the drive
Click More Settings
Under Index, use the dropdown to select the specific index where this data should go
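Under the hood, those GUI steps just write a monitor stanza into inputs.conf, so the result should look roughly like this (the path and index name below are placeholders for your own values):

```ini
# inputs.conf - written by the "Files & directories" GUI input
[monitor://C:\archive\logs]
index = archive_index
disabled = false
```

This also answers the earlier question: the index setting in the stanza (or the Index dropdown in the GUI) is what routes the data away from "main".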
Splunk can normally index about 100 GB per day, depending on your configuration. But you can always search the data that has already been indexed; that will give you some indication of how far along Splunk has gotten. Just do a search. You may have to include index=yourindexname in your search.
Also, there is a log file - splunkd.log - that will contain any error messages if Splunk has problems indexing.
About the sinkhole directory, you mentioned that on Linux there will be three slashes. I am using Windows; how should it look? For example, if my log files are in C:\Windows\Log, can I use batch=c:\Windows\Log as the sinkhole directory path? Is that correct?
Also, can I append the sinkhole stanza you mentioned to inputs.conf? I can see there are some other stanzas already in inputs.conf.
Also, after that do I have to do a full Splunk restart?