Getting Data In

How exactly does uploading a file for one-shot indexing work?

tanmaybalwa
Engager

I am clear on the steps needed for uploading a .tar file, but I have a question about how it works. Splunk eventually indexes the file and stores it in the database, which isn't easily human readable. The path to the indexes can be configured in Splunk settings. Knowing this, my queries are:

  1. When you upload a file of, say, 10 MB to the remote Splunk server, where is it stored? I checked $SPLUNK_HOME/var/spool/splunk immediately after uploading the file, and there was no file in it.
  2. Is there a way to configure where the uploaded file is stored?
  3. Does the file eventually get deleted on the remote server?
  4. If logs from a different date are uploaded later and still contain already-indexed data, is that repetition handled?

Thanks!


somesoni2
Revered Legend

Below are the answers to your queries:

  1. When you upload a file (of any size), the file is not actually copied to the Splunk server. Instead, data from the file is first transferred via the protocol determined by the configured inputs and saved in a temporary binary file (in the folder $SPLUNK_HOME/var/spool/splunk/), which is not human readable. Later, these binary files are parsed and the data is stored in indexes (see more here).
  2. Since your file is not literally uploaded, and you can't read the binary files anyway, this question becomes irrelevant.
  3. Splunk will not affect the actual data file. It should remain in its original place, unless some other program does something with it.
  4. Splunk identifies a file based on the first few characters of its content and keeps track of the data already indexed so far. If an updated file is placed, it will index only the new data. This is the default behavior of Splunk.
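
To make point 4 more concrete, here is a rough Python sketch of the general idea: identify a file by a checksum of its first bytes, remember how far it has been read, and pick up only what was appended. This is only an illustration and not Splunk's actual code. The names file_key, read_new_data and checkpoints are made up for the example, and the 256-byte prefix length is an assumption of the sketch.

```python
import os
import tempfile
import zlib

# Number of leading bytes used to identify a file (an assumption for this sketch).
CRC_LENGTH = 256


def file_key(path, length=CRC_LENGTH):
    """Return a CRC32 checksum of the first `length` bytes of the file."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(length))


def read_new_data(path, checkpoints):
    """Return only the bytes not read before and update `checkpoints` in place.

    `checkpoints` is a hypothetical dict mapping a file key to the byte
    offset that has already been consumed for that file.
    """
    key = file_key(path)
    offset = checkpoints.get(key, 0)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    checkpoints[key] = offset + len(data)
    return data


def demo():
    """Write a file, read it, append to it, and read it twice more."""
    checkpoints = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        # Keep the first CRC_LENGTH bytes stable so the file key does not change.
        header = b"x" * CRC_LENGTH
        with open(path, "wb") as f:
            f.write(header + b"event 1\n")
        first = read_new_data(path, checkpoints)
        with open(path, "ab") as f:
            f.write(b"event 2\n")
        second = read_new_data(path, checkpoints)
        third = read_new_data(path, checkpoints)
    return first, second, third
```

In this sketch, the second read returns only the appended event and the third read returns nothing, because the stored offset already covers the whole file. Note that a file whose first bytes change would get a different key and would be read again from the start.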

Hope this helps.

tanmaybalwa
Engager

Thanks for the reply! 🙂
